クラスター化された標準誤差でモデルを実行したときに Stata が生成する 95% CI を再現しようとしています。例えば:
regress api00 acs_k3 acs_46 full enroll, cluster(dnum)
Regression with robust standard errors Number of obs = 395
F( 4, 36) = 31.18
Prob > F = 0.0000
R-squared = 0.3849
Number of clusters (dnum) = 37 Root MSE = 112.20
------------------------------------------------------------------------------
| Robust
api00 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
acs_k3 | 6.954381 6.901117 1.008 0.320 -7.041734 20.9505
acs_46 | 5.966015 2.531075 2.357 0.024 .8327565 11.09927
full | 4.668221 .7034641 6.636 0.000 3.24153 6.094913
enroll | -.1059909 .0429478 -2.468 0.018 -.1930931 -.0188888
_cons | -5.200407 121.7856 -0.043 0.966 -252.193 241.7922
------------------------------------------------------------------------------
係数と標準誤差を再現できます。
library(readstata13)
library(texreg)
library(sandwich)
library(lmtest)
clustered.se <- function(model_result, data, cluster) {
model_variables <-
intersect(colnames(data), c(colnames(model_result$model), cluster))
model_rows <- rownames(model_result$model)
data <- data[model_rows, model_variables]
cl <- data[[cluster]]
M <- length(unique(cl))
N <- nrow(data)
K <- model_result$rank
dfc <- (M / (M - 1)) * ((N - 1) / (N - K))
uj <-
apply(estfun(model_result), 2, function(x)
tapply(x, cl, sum))
vcovCL <- dfc * sandwich(model_result, meat = crossprod(uj) / N)
standard.errors <- coeftest(model_result, vcov. = vcovCL)[, 2]
p.values <- coeftest(model_result, vcov. = vcovCL)[, 4]
clustered.se <-
list(vcovCL = vcovCL,
standard.errors = standard.errors,
p.values = p.values)
return(clustered.se)
}
elemapi2 <- read.dta13(file = 'elemapi2.dta')
lm1 <-
lm(formula = api00 ~ acs_k3 + acs_46 + full + enroll,
data = elemapi2)
clustered_se <-
clustered.se(model_result = lm1,
data = elemapi2,
cluster = "dnum")
htmlreg(
lm1,
override.se = clustered_se$standard.errors,
override.p = clustered_se$p.value,
star.symbol = "\\*",
digits = 7
)
=============================
Model 1
-----------------------------
(Intercept) -5.2004067
(121.7855938)
acs_k3 6.9543811
(6.9011174)
acs_46 5.9660147 *
(2.5310751)
full 4.6682211 ***
(0.7034641)
enroll -0.1059909 *
(0.0429478)
-----------------------------
R^2 0.3848830
Adj. R^2 0.3785741
Num. obs. 395
RMSE 112.1983218
=============================
*** p < 0.001, ** p < 0.01, * p < 0.05
残念ながら、95% の信頼区間を再現できません。
screenreg(
lm1,
override.se = clustered_se$standard.errors,
override.p = clustered_se$p.value,
digits = 7,
ci.force = TRUE
)
========================================
Model 1
----------------------------------------
(Intercept) -5.2004067
[-243.8957845; 233.4949710]
acs_k3 6.9543811
[ -6.5715605; 20.4803228]
acs_46 5.9660147 *
[ 1.0051987; 10.9268307]
full 4.6682211 *
[ 3.2894567; 6.0469855]
enroll -0.1059909 *
[ -0.1901670; -0.0218148]
----------------------------------------
R^2 0.3848830
Adj. R^2 0.3785741
Num. obs. 395
RMSE 112.1983218
========================================
* 0 outside the confidence interval
「手で」行うと、次の場合と同じ結果が得られtexreg
ます。
level <- 0.95
a <- 1-(1 - level)/2
coeff <- lm1$coefficients
se <- clustered_se$standard.errors
lb <- coeff - qnorm(a)*se
ub <- coeff + qnorm(a)*se
> lb
(Intercept) acs_k3 acs_46 full enroll
-243.895784 -6.571560 1.005199 3.289457 -0.190167
> ub
(Intercept) acs_k3 acs_46 full enroll
233.49497100 20.48032276 10.92683074 6.04698550 -0.02181481
Stata は何をしていて、どうすれば R で再現できますか?
PS: これはフォローアップの質問です。PS2: Stata データはこちらから入手できます。