r - クラスタ化された標準誤差と texreg による信頼区間?

Question

クラスター化された標準誤差でモデルを実行したときに Stata が生成する 95% CI を再現しようとしています。例えば：

    regress api00 acs_k3 acs_46 full enroll, cluster(dnum)

    Regression with robust standard errors                 Number of obs =     395
                                                           F(  4,    36) =   31.18
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.3849
    Number of clusters (dnum) = 37                         Root MSE      =  112.20

    ------------------------------------------------------------------------------
             |               Robust
       api00 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      acs_k3 |   6.954381   6.901117      1.008   0.320      -7.041734     20.9505
      acs_46 |   5.966015   2.531075      2.357   0.024       .8327565    11.09927
        full |   4.668221   .7034641      6.636   0.000        3.24153    6.094913
      enroll |  -.1059909   .0429478     -2.468   0.018      -.1930931   -.0188888
       _cons |  -5.200407   121.7856     -0.043   0.966       -252.193    241.7922
    ------------------------------------------------------------------------------

係数と標準誤差を再現できます。

    library(readstata13)
    library(texreg)
    library(sandwich)
    library(lmtest)

    clustered.se <- function(model_result, data, cluster) {
      model_variables   <-
      intersect(colnames(data), c(colnames(model_result$model), cluster))
      model_rows <- rownames(model_result$model)
      data <- data[model_rows, model_variables]
      cl <- data[[cluster]]
      M <- length(unique(cl))
      N <- nrow(data)
      K <- model_result$rank
      dfc <- (M / (M - 1)) * ((N - 1) / (N - K))
      uj  <-
      apply(estfun(model_result), 2, function(x)
      tapply(x, cl, sum))
      vcovCL <- dfc * sandwich(model_result, meat = crossprod(uj) / N)
      standard.errors <- coeftest(model_result, vcov. = vcovCL)[, 2]
      p.values <- coeftest(model_result, vcov. = vcovCL)[, 4]
      clustered.se <-
      list(vcovCL = vcovCL,
      standard.errors = standard.errors,
      p.values = p.values)
      return(clustered.se)
      }

    elemapi2 <- read.dta13(file = 'elemapi2.dta')

    lm1 <-
      lm(formula = api00 ~ acs_k3 + acs_46 + full + enroll,
      data = elemapi2)

    clustered_se <-
      clustered.se(model_result = lm1,
      data = elemapi2,
      cluster = "dnum")

    htmlreg(
      lm1,
      override.se = clustered_se$standard.errors,
      override.p = clustered_se$p.value,
      star.symbol = "\\*",
      digits = 7
      )   

    =============================
                 Model 1         
    -----------------------------
    (Intercept)    -5.2004067    
                 (121.7855938)   
    acs_k3          6.9543811    
                   (6.9011174)   
    acs_46          5.9660147 *  
                   (2.5310751)   
    full            4.6682211 ***
                   (0.7034641)   
    enroll         -0.1059909 *  
                   (0.0429478)   
    -----------------------------
    R^2             0.3848830    
    Adj. R^2        0.3785741    
    Num. obs.     395            
    RMSE          112.1983218    
    =============================
    *** p < 0.001, ** p < 0.01, * p < 0.05

残念ながら、95% の信頼区間を再現できません。

    screenreg(
      lm1,
      override.se = clustered_se$standard.errors,
      override.p = clustered_se$p.value,
      digits = 7,
      ci.force = TRUE
      )     

    ========================================
                 Model 1                    
    ----------------------------------------
    (Intercept)     -5.2004067              
                 [-243.8957845; 233.4949710]
    acs_k3           6.9543811              
                 [  -6.5715605;  20.4803228]
    acs_46           5.9660147 *            
                 [   1.0051987;  10.9268307]
    full             4.6682211 *            
                 [   3.2894567;   6.0469855]
    enroll          -0.1059909 *            
                 [  -0.1901670;  -0.0218148]
    ----------------------------------------
    R^2              0.3848830              
    Adj. R^2         0.3785741              
    Num. obs.      395                      
    RMSE           112.1983218              
    ========================================
    * 0 outside the confidence interval

「手で」行うと、次の場合と同じ結果が得られtexregます。

    level <- 0.95
    a <- 1-(1 - level)/2
    coeff <- lm1$coefficients
    se <- clustered_se$standard.errors
    lb <- coeff - qnorm(a)*se
    ub <- coeff + qnorm(a)*se

    > lb
    (Intercept)      acs_k3      acs_46        full      enroll 
    -243.895784   -6.571560    1.005199    3.289457   -0.190167 

    > ub
     (Intercept)       acs_k3       acs_46         full       enroll 
    233.49497100  20.48032276  10.92683074   6.04698550  -0.02181481

Stata は何をしていて、どうすれば R で再現できますか?

PS: これはフォローアップの質問です。PS2: Stata データはこちらから入手できます。

score 2 · Accepted Answer

Stata は、Z (通常のエラー) ではなく t(36) に基づく信頼区間を使用しているようです。

Stata 出力からの値の取得

coef=6.954381; rse=  6.901117 ; lwr= -7.041734; upr= 20.9505
(upr-coef)/rse
## [1] 2.028095
(lwr-coef)/rse
## [1] -2.028094

t(36) のテール値の計算/クロスチェック:

pt(2.028094,36) 
## [1] 0.975
qt(0.975,36)
## [1] 2.028094

信頼区間を texreg に渡す方法がわかりません。あなたは再現可能な例を与えていないので（私は持っていませんelemapi2.dta）、どのようにdfを取得するか正確には言えませんが、あなたが望むように見えますtdf <- length(unique(elemapi2$dnum))-1

level <- 0.95
a <- 1- (1 - level)/2
bounds <- coef(lm1) + c(-1,1)*clustered_se*qt(a,tdf)

score 0 · Accepted Answer

実際、Stata は正規分布ではなく t 分布を使用しています。CRAN からインストールできるパッケージtexregを使用lm_robustして、Stata に一致する信頼区間を取得するための非常に簡単なソリューションが提供されました。estimatrinstall.packages(estimatr)

> library(estimatr)
> lmro <- lm_robust(mpg ~ hp, data = mtcars, clusters = cyl, se_type = "stata")
> screenreg(lmro)

===========================
             Model 1       
---------------------------
(Intercept)   30.10 *      
             [13.48; 46.72]
hp            -0.07        
             [-0.15;  0.01]
---------------------------
R^2            0.60        
Adj. R^2       0.59        
Num. obs.     32           
RMSE           3.86        
===========================
* 0 outside the confidence interval

r - クラスタ化された標準誤差と texreg による信頼区間?

2 に答える 2

Related

Reference