r - Linear regression and group by in R

Question

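I want to run a linear regression in R using the lm() function. My data is an annual time series with one field for year (22 years) and another for state (50 states). I want to fit a regression for each state so that at the end I have a vector of lm responses. I can imagine running a for loop over the states, fitting the regression inside the loop, and appending the result of each regression to a vector. That does not seem very R-like, however. In SAS I would use a "by" statement, and in SQL I would use a "group by". What is the R way of doing this?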

score 65 · Accepted Answer

Here is an approach using the plyr package:

d <- data.frame(
  state = rep(c('NY', 'CA'), 10),
  year = rep(1:10, 2),
  response= rnorm(20)
)

library(plyr)
# Break up d by state, then fit the specified model to each piece and
# return a list
models <- dlply(d, "state", function(df) 
  lm(response ~ year, data = df))

# Apply coef to each model and return a data frame
ldply(models, coef)

# Print the summary of each model
l_ply(models, summary, .print = TRUE)
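
For readers who would rather not depend on plyr, the same split, fit, combine steps can be sketched in base R. This is only an illustration, not part of the answer above: the seed is arbitrary, and the fake data are laid out with `each = 10` so that every state covers years 1 to 10.

```r
# Base R sketch of split-apply-combine (an alternative to dlply/ldply)
set.seed(1)  # arbitrary seed, only to make the fake data reproducible
d <- data.frame(
  state = rep(c("NY", "CA"), each = 10),
  year = rep(1:10, 2),
  response = rnorm(20)
)

# Split d by state, then fit the same model to each piece (returns a named list)
models <- lapply(split(d, d$state), function(df) lm(response ~ year, data = df))

# Stack the coefficient vectors: one row per state
coefs <- do.call(rbind, lapply(models, coef))
coefs
```

`models` stays a plain named list, so `summary(models$NY)` or `lapply(models, summary)` work as usual.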

score 58 · Accepted Answer

Here is one way using the lme4 package:

 library(lme4)
 library(lattice)  # xyplot() below comes from lattice, not lme4
 d <- data.frame(state=rep(c('NY', 'CA'), c(10, 10)),
                 year=rep(1:10, 2),
                 response=c(rnorm(10), rnorm(10)))

 xyplot(response ~ year, groups=state, data=d, type='l')

 fits <- lmList(response ~ year | state, data=d)
 fits
#------------
Call: lmList(formula = response ~ year | state, data = d)
Coefficients:
   (Intercept)        year
CA -1.34420990  0.17139963
NY  0.00196176 -0.01852429

Degrees of freedom: 20 total; 16 residual
Residual standard error: 0.8201316
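
To get the per-state coefficients as an object rather than printed output, `coef()` can be applied to the fitted list. A minimal sketch, assuming lme4 is installed; the seed is arbitrary and `state` is made a factor explicitly here as a precaution.

```r
library(lme4)

set.seed(1)  # arbitrary seed for reproducible fake data
d <- data.frame(state = factor(rep(c("NY", "CA"), c(10, 10))),
                year = rep(1:10, 2),
                response = rnorm(20))

# One lm() per state
fits <- lmList(response ~ year | state, data = d)

# Coefficients as a table: one row per state, one column per term
cf <- coef(fits)
cf

# Slope for a single state
cf["NY", "year"]
```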

score 24 · Accepted Answer

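In my opinion, a linear mixed model is a better approach for this kind of data. In the code below, the fixed effects give the overall trend. The random effects indicate how the trend of each individual state differs from the global trend. The correlation structure takes the temporal autocorrelation into account. Have a look at Pinheiro & Bates (Mixed Effects Models in S and S-Plus).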

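library(nlme)
# d: a data frame with columns state, year and response, as in the answers above
lme(response ~ year, random = ~ year | state,
    correlation = corAR1(form = ~ year | state), data = d)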
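
To see how the fixed and random effects relate, here is a small sketch on the `Orthodont` data set that ships with nlme. That data set is a stand-in and not the asker's data: `Subject` plays the role of state and `age` the role of year. The AR(1) correlation term is left out to keep the example small.

```r
library(nlme)

# Random intercept and slope per Subject (stand-in for "per state")
fm <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)

# Overall (population-level) intercept and slope
fixef(fm)

# How each subject's intercept and slope deviate from the overall trend
head(ranef(fm))

# Per-subject coefficients: fixed effects plus the subject's deviations
head(coef(fm))
```

Unlike separate lm() fits, the per-group estimates from `coef(fm)` are shrunk towards the overall trend.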

score 14 · Accepted Answer

I think it is worthwhile to add the purrr::map approach to this problem.

library(tidyverse)

d <- data.frame(state=rep(c('NY', 'CA'), c(10, 10)),
                                 year=rep(1:10, 2),
                                 response=c(rnorm(10), rnorm(10)))

d %>% 
  group_by(state) %>% 
  nest() %>% 
  mutate(model = map(data, ~lm(response ~ year, data = .)))

See @Paul Hiemstra's answer for further ideas on using the broom package with these results.
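
As a rough sketch of what that could look like (this is not @Paul Hiemstra's code), `broom::tidy()` can be mapped over the nested models and the result unnested into one row per state and term. The seed and the name `tidy_fits` are made up for this example.

```r
library(tidyverse)
library(broom)

set.seed(1)  # arbitrary seed for reproducible fake data
d <- data.frame(state = rep(c("NY", "CA"), c(10, 10)),
                year = rep(1:10, 2),
                response = c(rnorm(10), rnorm(10)))

tidy_fits <- d %>%
  group_by(state) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(response ~ year, data = .)),
         tidied = map(model, tidy)) %>%  # one small coefficient table per model
  select(state, tidied) %>%
  unnest(tidied) %>%
  ungroup()

tidy_fits
```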

score 9 · Accepted Answer

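My answer comes a bit late, but I was looking for similar functionality. It seems that R's built-in function "by" can also do the grouping easily.

?by contains the following example, which fits a model per group and extracts the coefficients with sapply: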

require(stats)
## now suppose we want to extract the coefficients by group 
tmp <- with(warpbreaks,
            by(warpbreaks, tension,
               function(x) lm(breaks ~ wool, data = x)))
sapply(tmp, coef)
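
Applied to data shaped like the question's (state, year, response), the same pattern might look like this. The data are fake and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed for reproducible fake data
d <- data.frame(state = rep(c("NY", "CA"), each = 10),
                year = rep(1:10, 2),
                response = rnorm(20))

# Fit one model per state; each call receives only that state's rows
fits <- by(d, d$state, function(df) lm(response ~ year, data = df))

# Matrix of coefficients: rows are terms, columns are states
coefs <- sapply(fits, coef)

# Transpose to get one row per state
t(coefs)
```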

score 8 · Accepted Answer

## make fake data
 ngroups <- 2
 group <- 1:ngroups
 nobs <- 100
 dta <- data.frame(group=rep(group,each=nobs),y=rnorm(nobs*ngroups),x=runif(nobs*ngroups))
 head(dta)
#--------------------
  group          y         x
1     1  0.6482007 0.5429575
2     1 -0.4637118 0.7052843
3     1 -0.5129840 0.7312955
4     1 -0.6612649 0.9028034
5     1 -0.5197448 0.1661308
6     1  0.4240346 0.8944253
#------------ 
## function to extract the results of one model
 foo <- function(z) {
   ## coef and se in a data frame
   mr <- data.frame(coef(summary(lm(y~x,data=z))))
   ## put row names (predictors/indep variables)
   mr$predictor <- rownames(mr)
   mr
 }
 ## see that it works
 foo(subset(dta,group==1))
#=========
              Estimate Std..Error   t.value  Pr...t..   predictor
(Intercept)  0.2176477  0.1919140  1.134090 0.2595235 (Intercept)
x           -0.3669890  0.3321875 -1.104765 0.2719666           x
#----------
## one option: use command by
 res <- by(dta,dta$group,foo)
 res
#=========
dta$group: 1
              Estimate Std..Error   t.value  Pr...t..   predictor
(Intercept)  0.2176477  0.1919140  1.134090 0.2595235 (Intercept)
x           -0.3669890  0.3321875 -1.104765 0.2719666           x
------------------------------------------------------------ 
dta$group: 2
               Estimate Std..Error    t.value  Pr...t..   predictor
(Intercept) -0.04039422  0.1682335 -0.2401081 0.8107480 (Intercept)
x            0.06286456  0.3020321  0.2081387 0.8355526           x

## using package plyr is better
 library(plyr)
 res <- ddply(dta,"group",foo)
 res
#----------
  group    Estimate Std..Error    t.value  Pr...t..   predictor
1     1  0.21764767  0.1919140  1.1340897 0.2595235 (Intercept)
2     1 -0.36698898  0.3321875 -1.1047647 0.2719666           x
3     2 -0.04039422  0.1682335 -0.2401081 0.8107480 (Intercept)
4     2  0.06286456  0.3020321  0.2081387 0.8355526           x

score 6 · Accepted Answer

The lm() calls above are simple examples. By the way, I imagine your database has columns in the following form:

year state var1 var2 y .. ..

In my view, you can use the following code:

# data = your data frame
# state is your label for the states column
# Pass each per-state subset to lm() via data=, so every fit uses only that state's rows
modell <- by(data, data$state, function(df) lm(y ~ I(1/var1) + I(1/var2), data = df))
lapply(modell, summary)
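
A quick way to check that each fit really uses only its own state's rows is to try the pattern on made-up data. In this sketch the column names follow the layout above, while the values, the seed and the name `dat` are invented. Each subset is handed to lm() through `data =`. Without that argument, lm() would look the variables up outside the function and every group would get the same fit.

```r
set.seed(1)  # arbitrary seed for reproducible fake data
dat <- data.frame(state = rep(c("NY", "CA"), each = 10),
                  var1 = runif(20, 1, 5),
                  var2 = runif(20, 1, 5),
                  y = rnorm(20))

# One fit per state, each on its own subset
modell <- by(dat, dat$state, function(df) lm(y ~ I(1/var1) + I(1/var2), data = df))

# Coefficients differ between states
coefs <- sapply(modell, coef)
coefs

# Number of observations used by each fit
n_used <- sapply(modell, nobs)
n_used
```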
