r - Apply LR models to another data frame

Question

I searched SO but could not find code that fits my question. It is similar to this question: Linear Regression calculation several times in one dataframe

Following Andrie's code, I obtained a data frame of LR coefficients:

Cddply <- ddply(test, .(sumtest), function(test)coef(lm(Area~Conc, data=test))) 

sumtest  (Intercept)     Conc
1        -108589.2726    846.0713372
2        -49653.18701    811.3982918
3        -102598.6252    832.6419926
4        -72607.4017     727.0765558
5        54224.28878     391.256075
6        -42357.45407    357.0845661
7        -34171.92228    367.3962888
8        -9332.569856    289.8631555
9        -7376.448899    335.7047756
10       -37704.92277    359.1457617
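
For readers without the original `test` data, here is a minimal sketch of what the `ddply` call above does: it fits `Area ~ Conc` separately within each `sumtest` group and collects the intercept and slope per group. The data below are made up (two exact straight lines), and the base R `split`/`lapply` form is an assumed equivalent of the plyr call, not the code used in the question.

```r
# Made-up calibration data: two groups, each lying exactly on a straight line
test <- data.frame(
  sumtest = rep(1:2, each = 4),
  Conc    = rep(c(100, 500, 1000, 2000), times = 2)
)
test$Area <- ifelse(test$sumtest == 1,
                    -1000 + 800 * test$Conc,   # group 1: intercept -1000, slope 800
                    500 + 400 * test$Conc)     # group 2: intercept 500, slope 400

# Fit one linear model per sumtest group and keep only the coefficients
fits <- lapply(split(test, test$sumtest),
               function(d) coef(lm(Area ~ Conc, data = d)))

# Stack the per-group coefficient vectors into one table (one row per group)
coef_tab <- do.call(rbind, fits)
coef_tab
```

Each row of `coef_tab` plays the same role as one row of the coefficient table above.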

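My question is how to apply each of these LR models (1-10) to specific row intervals of another data frame, so that I get the independent variable x in a third column. For example, I would like to apply sumtest1 to samples 6:29, sumtest2 to samples 35:50, sumtest3 to samples 56:79, and so on, in intervals of 24 and 16 samples. The sample numbers repeat after 200, so sumtest9 applies to samples 6:29 again.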

Sample  Area
6   236211
7   724919
8   1259814
9   1574722
10  268836
11  863818
12  1261768
13  1591845
14  220322
15  608396
16  980182
17  1415859
18  276276
19  724532
20  1130024
21  1147840
22  252051
23  544870
24  832512
25  899457
26  285093
27  4291007
28  825922
29  865491
35  246707
36  538092
37  767269
38  852410
39  269152
40  971471
41  1573989
42  1897208
43  261321
44  481486
45  598617
46  769240
47  229695
48  782691
49  1380597
50  1725419

The resulting data frame should look like this:

Sample  Area    Calc
6   236211  407.5312917
7   724919  985.1525288
8   1259814 1617.363812
9   1574722 1989.564693
10  268836  446.0919309
...
35  246707  365.2452551
36  538092  724.3591324
37  767269  1006.805521
38  852410  1111.736505
39  269152  392.9073207
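
The `Calc` values above appear to come from inverting each fitted line, i.e. Calc = (Area - intercept) / slope. The short check below is a sketch under that assumption: it uses only numbers copied from the tables in the question, and the names `chk`, `listed` and `diff` are made up for illustration.

```r
# Coefficients of models 1 and 2, copied from the coefficient table
intercept <- c(-108589.2726, -49653.18701)
slope     <- c(846.0713372, 811.3982918)

# First two rows of each interval, with the Calc values listed above
chk <- data.frame(
  Sample = c(6, 7, 35, 36),
  Area   = c(236211, 724919, 246707, 538092),
  model  = c(1, 1, 2, 2),
  listed = c(407.5312917, 985.1525288, 365.2452551, 724.3591324)
)

# Invert Area = intercept + slope * Conc to recover the concentration
chk$Calc <- (chk$Area - intercept[chk$model]) / slope[chk$model]

# Difference between the recomputed and the listed values
chk$diff <- chk$Calc - chk$listed
chk
```

The recomputed values agree with the listed ones to within about 0.01; the small remaining differences presumably come from rounding in the printed coefficients.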

Thank you for your help.

score 0 · Accepted Answer

Is this what you are looking for? I created a slightly larger dummy data set for "area" so that you can easily see how the code works when you try it.

# create 400 rows of area data
set.seed(123)
df <- data.frame(area = round(rnorm(400, mean = 1000000, sd = 100000)))

# "sample numbers repeats after 200" -> add a sample nr 1-200, 1-200
df$sample_nr <- 1:200

# create a factor which cuts the vector of sample_nr into pieces of length 16, 24, 16, 24...
# repeated until the pieces add up to a total length of 200,
# i.e. 5 repeats of (16, 24)
grp <- cut(df$sample_nr, breaks = c(-Inf, cumsum(rep(c(16, 24), 5))))

# add a numeric version of the chunks to data frame
# this number indicates the model from which coefficients will be used
# row 1-16 (16 rows): model 1; row 17-40 (24 rows): model 2;
# row 41-56 (16 rows): model 3; and so on. 
df$mod <- as.numeric(grp)

# read coefficients
coefs <- read.table(text = "intercept beta_conc
1   -108589.2726    846.0713372
2   -49653.18701    811.3982918
3   -102598.6252    832.6419926
4   -72607.4017 727.0765558
5   54224.28878 391.256075
6   -42357.45407    357.0845661
7   -34171.92228    367.3962888
8   -9332.569856    289.8631555
9   -7376.448899    335.7047756
10  -37704.92277    359.1457617", header = TRUE)

# add model number as a numeric key, so it has the same type as df$mod
coefs$mod <- as.numeric(rownames(coefs))

head(df)
head(coefs)

# join area data and coefficients by model number
# (use 'join' instead of merge to avoid sorting)
library(plyr)
df2 <- join(df, coefs, by = "mod")

# calculate conc from area and model coefficients
# area = intercept + beta_conc * conc
# conc = (area - intercept) / beta_conc
df2$conc <- (df2$area - df2$intercept) / df2$beta_conc
head(df2, 41)
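
The dummy data above use contiguous chunks of 16 and 24 rows, whereas the question lists intervals 6:29, 35:50, 56:79 with gaps in between. The sketch below shows one way to derive the model number for that layout. It assumes the pattern continues as 85:100, 106:129 and so on (the question only says "etc."), and that rows are in acquisition order so a drop in the sample number marks a new pass. `starts`, `ends` and `block_of` are hypothetical names, not part of the original code.

```r
# Assumed interval layout within one pass of samples 1-200:
# 6:29, 35:50, 56:79, 85:100, 106:129, 135:150, 156:179, 185:200
offsets <- rep(seq(0, 150, by = 50), each = 2)
starts  <- c(6, 35) + offsets
ends    <- c(29, 50) + offsets

# Position (1-8) of the interval containing each sample number, NA if in a gap
block_of <- function(sample) {
  idx <- findInterval(sample, starts)          # last start <= sample, 0 if none
  idx[idx == 0] <- NA
  inside <- !is.na(idx) & sample <= ends[idx]  # sample must also be <= interval end
  idx[!inside] <- NA
  idx
}

# Example sample numbers; the sequence restarts at 6 for the second pass
Sample <- c(6, 29, 30, 35, 50, 56, 200, 6, 35)

# Pass counter: increases by one each time the sample number drops
pass <- cumsum(c(0, diff(Sample) < 0))

# Model number: 8 intervals per pass under the assumed layout
mod <- pass * 8 + block_of(Sample)
mod
```

Under this layout eight intervals fit into one pass of 200 samples, which would be consistent with the question's remark that sumtest9 applies to samples 6:29 again. The resulting `mod` column can then be used as the join key in the same way as `df$mod` above.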
