r - In a matrix, find the mean of the column 4 values associated with the 20th to 30th percentile values of column 1

Question

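Basically, I want to create a spider plot for a sensitivity analysis. I want to split the data into 10 tranches and find the mean of the result (column 4) for each tranche. The tranches should be chosen based on the 10th, 20th, 30th, 40th, etc. percentiles of the data in each variable column. I have got this working, but I figure there must be an easier way.

My code: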

##Make some data and put it into a matrix.

c <- 1000
v1 <- rnorm (c, 100, 15)
v2 <- rnorm (c, 80, 10)
v3 <- rnorm (c, 50, 5)
r1 <- ((v1*v2^2)/v3)
data <- cbind (v1,v2)
data <- cbind (data, v3)
data <- cbind (data, r1)

##Sort matrix by first column.
data <- as.matrix(data[order(data[,1]),])

##Find mean of column 4 values corresponding to the smallest 10% (and 20%, and 30%, etc.) of column 1 values.
a1 <- mean (data[1:(c/10),4])
a2 <- mean (data[(c/10):(2*c/10),4])
a3 <- mean (data[(2*c/10):(3*c/10),4])
a4 <- mean (data[(3*c/10):(4*c/10),4])
a5 <- mean (data[(4*c/10):(5*c/10),4])
a6 <- mean (data[(5*c/10):(6*c/10),4])
a7 <- mean (data[(6*c/10):(7*c/10),4])
a8 <- mean (data[(7*c/10):(8*c/10),4])
a9 <- mean (data[(8*c/10):(9*c/10),4])
a10 <- mean (data[(9*c/10):c,4])

##Combine into a vector.
a <- as.vector(c(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10))

##Repeat for data sorted by columns 2 and 3 respectively.
data <- as.matrix(data[order(data[,2]),])

a1 <- mean (data[1:(c/10),4])
a2 <- mean (data[(c/10):(2*c/10),4])
a3 <- mean (data[(2*c/10):(3*c/10),4])
a4 <- mean (data[(3*c/10):(4*c/10),4])
a5 <- mean (data[(4*c/10):(5*c/10),4])
a6 <- mean (data[(5*c/10):(6*c/10),4])
a7 <- mean (data[(6*c/10):(7*c/10),4])
a8 <- mean (data[(7*c/10):(8*c/10),4])
a9 <- mean (data[(8*c/10):(9*c/10),4])
a10 <- mean (data[(9*c/10):c,4])

b <- as.vector(c(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10))

data <- as.matrix(data[order(data[,3]),])

a1 <- mean (data[1:(c/10),4])
a2 <- mean (data[(c/10):(2*c/10),4])
a3 <- mean (data[(2*c/10):(3*c/10),4])
a4 <- mean (data[(3*c/10):(4*c/10),4])
a5 <- mean (data[(4*c/10):(5*c/10),4])
a6 <- mean (data[(5*c/10):(6*c/10),4])
a7 <- mean (data[(6*c/10):(7*c/10),4])
a8 <- mean (data[(7*c/10):(8*c/10),4])
a9 <- mean (data[(8*c/10):(9*c/10),4])
a10 <- mean (data[(9*c/10):c,4])

d <- as.vector(c(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10))

##Make a pretty chart
plot (a, type = "o", col = "red") 
lines (b, type = "o", col = "blue") 
lines (d, type = "o", col = "green")

score 2 · Accepted Answer

Here is code that does the same thing, but more compactly and in more idiomatic R.

n <- 1000
# changed from c to n since you use c again later as something else
v1 <- rnorm (n, 100, 15)
v2 <- rnorm (n, 80, 10)
v3 <- rnorm (n, 50, 5)
r1 <- ((v1*v2^2)/v3)

DF <- data.frame(v1, v2, v3, r1)
# A data.frame seems like it would be a better fit for this

library("Hmisc")
# The Hmisc package has a function which splits in to quantiles, so use it
DF <- transform(DF, 
                v1.decile = cut2(v1, g=10),
                v2.decile = cut2(v2, g=10),
                v3.decile = cut2(v3, g=10))
# add three new variables to the data frame which indicate which decile each
# value belongs to, for each of v1, v2, and v3
a <- aggregate(DF$r1, list(DF$v1.decile), mean)$x
# why add the new variables? because aggregate can perform an operation on
# groups of one variable defined by the value of another variable
b <- aggregate(DF$r1, list(DF$v2.decile), mean)$x
d <- aggregate(DF$r1, list(DF$v3.decile), mean)$x
# named d rather than c so it matches the question's plotting code

You can then make the plot the same way as before.

Edit:
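If Hmisc is not installed, the decile grouping can be approximated in base R. The following is a minimal sketch, not the answer's own method: it assumes that cutting at the 0%, 10%, ..., 100% quantiles is an acceptable stand-in for cut2(g=10), and decile_means is a hypothetical helper name introduced only for illustration.

```r
# Sketch: decile means of r1 using base R only (no Hmisc).
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible
n  <- 1000
v1 <- rnorm(n, 100, 15)
v2 <- rnorm(n, 80, 10)
v3 <- rnorm(n, 50, 5)
r1 <- (v1 * v2^2) / v3

# decile_means() is a hypothetical helper, not part of any package.
# It bins x at its quantiles and returns the mean of y within each bin.
decile_means <- function(x, y, groups = 10) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = groups + 1))
  bins   <- cut(x, breaks = breaks, include.lowest = TRUE)
  as.vector(tapply(y, bins, mean))
}

a <- decile_means(v1, r1)
b <- decile_means(v2, r1)
d <- decile_means(v3, r1)
```

The three vectors each have 10 elements and can be passed to plot() and lines() exactly as in the question.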

Ananda Mahto's answer pointed out the formula version of aggregate, which I had forgotten about. You can write the aggregate lines more clearly as follows:

a <- aggregate(r1 ~ v1.decile, DF, mean)$r1
b <- aggregate(r1 ~ v2.decile, DF, mean)$r1
d <- aggregate(r1 ~ v3.decile, DF, mean)$r1
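The three nearly identical aggregate lines can also be generated in a loop. This is only a sketch under some assumptions: the decile columns are built with base R cut() and quantile() instead of Hmisc's cut2 so that the example runs without extra packages, and the names vars and means are made up for illustration.

```r
# Sketch: one formula-based aggregate() call per variable, collected by sapply().
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible
n  <- 1000
DF <- data.frame(v1 = rnorm(n, 100, 15),
                 v2 = rnorm(n, 80, 10),
                 v3 = rnorm(n, 50, 5))
DF$r1 <- (DF$v1 * DF$v2^2) / DF$v3

vars <- c("v1", "v2", "v3")

# Base R stand-in for the decile columns (the answer itself uses Hmisc::cut2).
for (v in vars) {
  breaks <- quantile(DF[[v]], probs = seq(0, 1, 0.1))
  DF[[paste0(v, ".decile")]] <- cut(DF[[v]], breaks, include.lowest = TRUE)
}

# reformulate() builds r1 ~ v1.decile, r1 ~ v2.decile, and so on.
means <- sapply(vars, function(v) {
  f <- reformulate(paste0(v, ".decile"), response = "r1")
  aggregate(f, data = DF, FUN = mean)$r1
})
```

Here means is a 10 x 3 matrix with one column per input variable, so means[, "v1"] plays the role of a above.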

score 1 · Accepted Answer

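This is conceptually very similar to Brian Diggs's answer, but it does not depend on your input being a data.frame or on any packages being loaded. It also introduces matplot, which gets you the plot without having to plot each column one at a time.

Here's your data: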

set.seed(1) # make it reproducible 
n <- 1000
v1 <- rnorm (n, 100, 15)
v2 <- rnorm (n, 80, 10)
v3 <- rnorm (n, 50, 5)
r1 <- ((v1*v2^2)/v3)
data <- cbind (v1, v2, v3, r1)
rm(v1, v2, v3, r1) # Cleanup

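head(data)
# Note: the values printed below match set.seed(1) with only 10 draws per
# variable, so a run with 1000 rows will print different numbers.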
#             v1       v2       v3        r1
# [1,]  90.60319 95.11781 54.59489 15014.651
# [2,] 102.75465 83.89843 53.91068 13416.349
# [3,]  87.46557 73.78759 50.37282  9453.824
# [4,] 123.92921 57.85300 40.05324 10355.899
# [5,] 104.94262 91.24931 53.09913 16455.977
# [6,]  87.69297 79.55066 49.71936 11161.612

Use sapply to perform the aggregation. This gives you a matrix that is easy to plot.

myAggVars <- c("v1", "v2", "v3")
temp <- sapply(myAggVars, function(x) {
  aggregate(r1 ~ cut(get(x), quantile(get(x), probs = seq(0, 1, .1)), 
                     include.lowest = TRUE), data, mean)[[2]]
})
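temp
# Note: the output below is from a 10-row run of the data, so each decile
# mean is a single r1 value.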
#              v1        v2        v3
#  [1,]  9453.824 10355.899 10355.899
#  [2,] 11161.612  9453.824 20834.485
#  [3,] 15014.651 11161.612 17755.902
#  [4,] 13528.961 13896.830 13896.830
#  [5,] 13416.349 13416.349 11161.612
#  [6,] 16455.977 13528.961  9453.824
#  [7,] 13896.830 17755.902 13528.961
#  [8,] 17755.902 20834.485 16455.977
#  [9,] 20834.485 16455.977 13416.349
# [10,] 10355.899 15014.651 15014.651

The plotting step is as follows:

matplot(temp, type = "o", pch = 1)
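For a chart that is easier to read, axis labels and a legend can be added. The sketch below is one possible variant rather than the answer's exact code: it rebuilds temp with tapply() so that it is self-contained, and the colours, axis labels, and legend position are assumptions chosen to mirror the question's red, blue, and green lines.

```r
# Sketch: labelled matplot() of decile means with a legend.
set.seed(1)
n    <- 1000
data <- cbind(v1 = rnorm(n, 100, 15),
              v2 = rnorm(n, 80, 10),
              v3 = rnorm(n, 50, 5))
data <- cbind(data, r1 = data[, "v1"] * data[, "v2"]^2 / data[, "v3"])

vars <- c("v1", "v2", "v3")

# Mean of r1 within each decile of every input variable (10 x 3 matrix).
temp <- sapply(vars, function(x) {
  bins <- cut(data[, x], quantile(data[, x], probs = seq(0, 1, 0.1)),
              include.lowest = TRUE)
  as.vector(tapply(data[, "r1"], bins, mean))
})

cols <- c("red", "blue", "green")
matplot(temp, type = "o", pch = 1, lty = 1, col = cols,
        xlab = "Decile of input variable", ylab = "Mean of r1")
legend("top", legend = colnames(temp), col = cols, lty = 1, pch = 1)
```

Because r1 rises with v1 and v2 and falls with v3, the lines for v1 and v2 should slope upward and the line for v3 downward.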

And the result:

[Figure: matplot output, image not available]
