r - Extracting coefficients from a nested list of models via plyr

Question

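I have a nested list of models from which I want to extract the coefficients, and then build a data frame in which each row also contains the names of the list elements that the model is stored under. I was wondering whether there is a plyr function that already handles nested lists, or, more generally, a cleaner way to accomplish the task.

For example: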

### Create nested list of models

iris.models <- list()
for (species in unique(iris$Species)) {

  iris.models[[species]] <- list()

  for (m in c("Sepal.Length", "Sepal.Width", "Petal.Length")) {

    iris.formula <- formula(paste("Petal.Width ~ ", m))
    iris.models[[species]][[m]] <- lm(iris.formula,
                                      data = iris,
                                      subset = Species == species)

  } # for m

} # for species

### Create data frame of variable coefficients (excluding intercept)

library(plyr)  # provides ldply()

irisCoefs <- ldply(names(iris.models), function(sp) {
  ldply(iris.models[[sp]],
        function(meas) data.frame(Species = sp, Coef = coef(meas)[-1]))
})
colnames(irisCoefs)[colnames(irisCoefs) == ".id"] <- "Measure"
irisCoefs

This code produces the following data frame:

  Measure      Species    Coef
1 Sepal.Length setosa     0.08314444
2 Sepal.Width  setosa     0.06470856
3 Petal.Length setosa     0.20124509
4 Sepal.Length versicolor 0.20935719
5 Sepal.Width  versicolor 0.41844560
6 Petal.Length versicolor 0.33105360
7 Sepal.Length virginica  0.12141646
8 Sepal.Width  virginica  0.45794906
9 Petal.Length virginica  0.16029696

My code works, but the way I ended up doing it seems a little inelegant. I'm wondering whether it can be simplified further (or generalized to other cases).

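My issues are:

Working with nested lists seems a bit awkward. In the outer ldply call I had to iterate over the names of the list items, whereas the inner call added the .id column "for free". I couldn't find an easy way to access the name of a list element from inside the called function.

I also couldn't rename the ".id" column from within the second ldply call itself, so I ended up adding the colnames statement afterwards.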
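On the first point, one common way to reach element names inside the function is to walk the elements and their names in parallel. The following is only a minimal base R sketch of that idea, not a plyr feature; it uses a single species for brevity, and the names `measures`, `setosa`, `models`, `rows`, and `setosaCoefs` are illustrative rather than taken from the question.

```r
# One level of the nested list: models for a single species
measures <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
setosa <- subset(iris, Species == "setosa")
models <- sapply(measures,
                 function(m) lm(reformulate(m, "Petal.Width"), data = setosa),
                 simplify = FALSE)

# Map() iterates over the models and their names together, so the
# function receives the element name as an ordinary argument.
rows <- Map(function(model, name) {
  data.frame(Measure = name, Coef = unname(coef(model)[-1]))
}, models, names(models))

# Stack the one-row data frames and drop the row names rbind() adds
setosaCoefs <- do.call(rbind, rows)
rownames(setosaCoefs) <- NULL
setosaCoefs
```

The same pattern nests: an outer Map() over species could pass the species name down to the inner call.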

Is there a plyr way to make this code simpler?

I'm not sure whether this helps clarify my intent, but I imagined the code would look something like this:

ldply(iris.models, .id.col="Species", function(sp) ldply(sp, .id.col="Measure", function(x) data.frame(coef(x)[-1])))
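For comparison, the imagined call above is close to what newer plyr releases appear to allow: to my knowledge, ldply() accepts an `.id` argument (as of plyr 1.8.1) that names the index column. The sketch below assumes such a version is installed; the list is rebuilt with lapply()/sapply() only to keep the example self-contained, and `measures` is an illustrative name.

```r
library(plyr)  # assumes plyr >= 1.8.1, where ldply() takes `.id`

# Rebuild the nested list of models: species -> measure -> lm
measures <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
iris.models <- lapply(split(iris, iris$Species), function(d) {
  sapply(measures,
         function(m) lm(reformulate(m, "Petal.Width"), data = d),
         simplify = FALSE)
})

# Name the index column at each level directly instead of renaming ".id"
irisCoefs <- ldply(iris.models, .id = "Species", function(sp) {
  ldply(sp, .id = "Measure", function(x) data.frame(Coef = coef(x)[-1]))
})
irisCoefs
```

With this form neither the outer loop over names() nor the trailing colnames() fix is needed, since both levels are named lists.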

Thanks.

score 0 · Accepted Answer

It's not exactly the format you asked for, but this works with base functions:

m <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
# For each species, fit one model per predictor and keep the slope (row 2)
do.call(rbind,
  by(iris, iris$Species,
    function(x) sapply(m,
      function(y) coef(lm(paste("Petal.Width ~", y), data = x)))[2, ]
  )
)

           Sepal.Length Sepal.Width Petal.Length
setosa       0.08314444  0.06470856    0.2012451
versicolor   0.20935719  0.41844560    0.3310536
virginica    0.12141646  0.45794906    0.1602970
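If the long, one-row-per-model layout from the question is needed, the matrix above can be reshaped with base R as well. This is a hedged sketch: `slopes` and `long` are illustrative names, and the as.table() route is just one of several ways to do the conversion.

```r
m <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# Same computation as above: a species-by-predictor matrix of slopes
slopes <- do.call(rbind,
  by(iris, iris$Species,
    function(x) sapply(m,
      function(y) coef(lm(paste("Petal.Width ~", y), data = x)))[2, ]
  )
)

# Treat the matrix as a table so as.data.frame() emits one row per cell
long <- as.data.frame(as.table(slopes), stringsAsFactors = FALSE)
names(long) <- c("Species", "Measure", "Coef")
long
```

The rows come out grouped by measure rather than by species, because the matrix is unrolled column by column.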

score 0 · Accepted Answer

A plyr approach:

library(plyr)      # dlply(), ldply()
library(reshape2)  # melt()

# Melt the predictor variables (column 4, Petal.Width, is dropped)
iris_m <- melt(iris[, -4], id.vars = "Species")
# Insert the dependent variable, repeated once per melted predictor
iris_m$Petal.Width <- rep(iris$Petal.Width, 3)

# Fit one model per combination of species and variable
models <- dlply(iris_m, .(Species, variable),
                function(x) lm(Petal.Width ~ value, data = x))
# Get the coefficients as a nice data.frame
ldply(models, function(x) coef(x)[-1])

     Species     variable      value
1     setosa Sepal.Length 0.08314444
2     setosa  Sepal.Width 0.06470856
3     setosa Petal.Length 0.20124509
4 versicolor Sepal.Length 0.20935719
5 versicolor  Sepal.Width 0.41844560
6 versicolor Petal.Length 0.33105360
7  virginica Sepal.Length 0.12141646
8  virginica  Sepal.Width 0.45794906
9  virginica Petal.Length 0.16029696
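A small variation avoids the rep() step, which relies on melt() stacking the predictors in blocks in the original row order: keeping Petal.Width as an id variable carries it along with every melted row. This is a sketch assuming melt() comes from the reshape2 package; `coefs` is an illustrative name.

```r
library(plyr)
library(reshape2)  # assumed source of melt()

# Keep the response as an id variable so it stays aligned with each row
iris_m <- melt(iris, id.vars = c("Species", "Petal.Width"))

# One model per species and predictor, as in the answer above
models <- dlply(iris_m, .(Species, variable),
                function(x) lm(Petal.Width ~ value, data = x))

# Slopes only, with the split labels as leading columns
coefs <- ldply(models, function(x) coef(x)[-1])
coefs
```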
