r - Using ldply to handle outputs of different lengths

Question

A quick question about how to handle outputs of different lengths with ldply from the plyr package. Here is a simplified version of the code I am using and the error I am getting:

# function to collect the coefficients from the regression models:
> SecreatWeapon <- dlply(merged1,~country.x, function(df) {
+     lm(log(child_mortality) ~ log(IHME_usd_gdppc)+ hiv_prev,data=df)
+ })
> 
# functions to extract the output of interest
> extract.coefs <- function(mod) c(extract.coefs = summary(mod)$coefficients[,1])
> extract.se.coefs <- function(mod) c(extract.se.coefs = summary(mod)$coefficients[,2])
> 
# function to combine the extracted output
> res <- ldply(SecreatWeapon, extract.coefs)
Error in list_to_dataframe(res, attr(.data, "split_labels")) : 
 Results do not have equal lengths

The error arises because some of the models contain NA coefficients, for example:

> SecreatWeapon[[1]]

Call:
lm(formula = log(child_mortality) ~ log(IHME_usd_gdppc) + hiv_prev, 
    data = df)

Coefficients:
       (Intercept)  log(IHME_usd_gdppc)             hiv_prev  
           -4.6811               0.5195                   NA

As a result, the following outputs do not all have the same length. For example:

> summary(SecreatWeapon[[1]])$coefficients
                  Estimate Std. Error   t value     Pr(>|t|)
(Intercept)         -4.6811000  0.6954918 -6.730633 6.494799e-08
log(IHME_usd_gdppc)  0.5194643  0.1224292  4.242977 1.417349e-04

whereas for other models I get:

> summary(SecreatWeapon[[10]])$coefficients
                   Estimate  Std. Error    t value     Pr(>|t|)
(Intercept)           18.612698   1.7505236  10.632646 1.176347e-12
log(IHME_usd_gdppc)   -2.256465   0.1773498 -12.723244 6.919009e-15
hiv_prev            -272.558951 160.3704493  -1.699558 9.784053e-02

Is there an easy fix? Thank you very much,

Antonio Pedro.

score 2 · Accepted Answer

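The matrix accessed with summary.lm( . )$coefficients gives different output from the coef function applied to the lm object whenever the fit has NA "coefficients": summary drops those terms, while coef keeps them as NA. Would you be satisfied with something like this: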

coef.se <- function(mod) {
  extract.coefs <- function(mod) coef(mod)  # lengths all the same
  extract.se.coefs <- function(mod) summary(mod)$coefficients[, 2]
  # all=TRUE keeps terms that appear in coef() but not in the summary
  merge(extract.coefs(mod), extract.se.coefs(mod), by = 'row.names', all = TRUE)
}

With Roland's example, this gives:

> coef.se(fit)
    Row.names          x         y
1 (Intercept) -0.3606557 0.1602034
2          x1  2.2131148 0.1419714
3          x2         NA        NA

You could rename x to coef and y to se.coef.
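As a minimal sketch of that renaming, the variant below builds the data frame directly with columns named coef and se.coef instead of merging and renaming afterwards. The function name coef_se_named, the term column, and the toy data are illustrative assumptions, not part of the code above.

```r
# Hypothetical variant of coef.se() that names its columns directly.
coef_se_named <- function(mod) {
  est <- coef(mod)                  # one entry per term, NA if aliased
  sm  <- summary(mod)$coefficients  # aliased terms are dropped here
  se  <- setNames(sm[, "Std. Error"], rownames(sm))
  data.frame(term    = names(est),
             coef    = unname(est),
             se.coef = unname(se[names(est)]),  # NA where summary has no row
             stringsAsFactors = FALSE)
}

# Toy data (assumed for illustration): x2 is constant, so its coefficient is NA.
y  <- c(1, 2, 3)
x1 <- c(0.6, 1.1, 1.5)
x2 <- c(1, 1, 1)
fit <- lm(y ~ x1 + x2)

res <- coef_se_named(fit)
print(res)
```

Because every model then returns a data frame with the same three columns, a call such as ldply(SecreatWeapon, coef_se_named) should stack the per-country results in long format without the length error.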

score 1 · Accepted Answer

y <- c(1,2,3)
x1 <- c(0.6,1.1,1.5)
x2 <- c(1,1,1)
fit <- lm(y~x1+x2)

summary(fit)$coef
#              Estimate Std. Error   t value   Pr(>|t|)
#(Intercept) -0.3606557  0.1602034 -2.251236 0.26612016
#x1           2.2131148  0.1419714 15.588457 0.04078329

#function for full matrix, adjusted from getAnywhere(print.summary.lm)
full_coeffs <- function (fit) {
     fit_sum <- summary(fit)    
     cn <- names(fit_sum$aliased)
     coefs <- matrix(NA, length(fit_sum$aliased), 4, 
                     dimnames = list(cn, colnames(fit_sum$coefficients)))
     coefs[!fit_sum$aliased, ] <- fit_sum$coefficients
     coefs
}

full_coeffs(fit)
#              Estimate Std. Error   t value   Pr(>|t|)
#(Intercept) -0.3606557  0.1602034 -2.251236 0.26612016
#x1           2.2131148  0.1419714 15.588457 0.04078329
#x2                  NA         NA        NA         NA
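
To tie this back to the original problem, here is a rough sketch of applying full_coeffs to a list of models so that every model contributes a vector of the same length. The toy data frame d, its grouping column g, and the list name models are assumptions made for illustration; base R is used so the sketch runs without plyr.

```r
# full_coeffs() as defined above, repeated so the sketch runs on its own.
full_coeffs <- function(fit) {
  fit_sum <- summary(fit)
  cn <- names(fit_sum$aliased)
  coefs <- matrix(NA, length(fit_sum$aliased), 4,
                  dimnames = list(cn, colnames(fit_sum$coefficients)))
  coefs[!fit_sum$aliased, ] <- fit_sum$coefficients
  coefs
}

# Two toy groups (assumed data): x2 is constant in group "a" only.
d <- data.frame(
  g  = rep(c("a", "b"), each = 4),
  y  = c(1, 2, 3, 5, 2, 1, 4, 3),
  x1 = c(0.6, 1.1, 1.5, 2.0, 1, 2, 3, 5),
  x2 = c(1, 1, 1, 1, 0.5, 1.5, 1.0, 3.0)
)
models <- lapply(split(d, d$g), function(df) lm(y ~ x1 + x2, data = df))

# Every model now yields a length-3 vector, so the results bind cleanly.
est <- t(sapply(models, function(m) full_coeffs(m)[, "Estimate"]))
print(est)
```

With plyr loaded, ldply(SecreatWeapon, function(m) full_coeffs(m)[, "Estimate"]) should likewise return one row per country, with NA where a term was aliased.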
