r - Subsetting and summing a data frame

Question

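My goal is the following: given a data frame of dichotomous responses (e.g., 0s and 1s), how can I produce a summary matrix that has 1) two columns (one for answering the first question incorrectly and one for answering it correctly), and 2) rows giving the number of individuals who obtained a particular total score?

For example, say I have 50 respondents and 5 questions. This means there are 6 possible total scores (all incorrect, i.e. all 0s; then 1, 2, 3, or 4 correct; and finally all correct, i.e. all 1s). I would like the resulting matrix object to look like this: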

... INCORRECT ..... CORRECT   <-- a 0 or a 1 on the first item, respectively

[1]... 10 ............ 0      <-- people who responded 0 on the first question and 0 on every other question (5 zeroes)
[2]... 8  ............ 2      <-- the 10 people who got 1 correct (8 got the first question incorrect, 2 got the first question correct)
[3]... 4 ............. 8      <-- the 12 people who got 2 correct (4 got the first question incorrect but 2 of the other questions correct, 8 got the first question and 1 other correct)
[4]... 6 ............. 3      <-- the 9 people who got 3 correct
[5]... 3 ............. 4      <-- the 7 people who got 4 correct
[6]... 0 ............. 8      <-- the 8 people who answered all 5 questions correctly (so they necessarily got the first question correct)

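My thinking is that I need to split the data frame by performance on the first question (working one column at a time), find the total score for each row (participant), and tabulate those into the first column. Then do the same for the second?

This is going to be built into a package, so I'm trying to figure out how to do it using only base functions.

Here is an example of a dataset similar to the one I'll be working with: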

n <- 50
z <- c(0, 1)
samp.fun <- function(x, n){
    sample(x, n, replace = TRUE)
}

data <- data.frame(0)
for (i in 1:5){
    data[1:n, i] <- samp.fun(z, n)
}
names(data)[1:5] <- c("x1", "x2", "x3", "x4", "x5")

Any thoughts would be much appreciated!

score 4 · Accepted Answer

Using @alexwhan's data (which already includes the total column), here is a data.table solution:

require(data.table)
dt <- data.table(data)

dt[, list(x1.incorrect=sum(x1==0), x1.correct=sum(x1==1)), keyby=total]
#    total x1.incorrect x1.correct
# 1:     0            2          0
# 2:     1            7          1
# 3:     2            9          8
# 4:     3            7          6
# 5:     4            0          7
# 6:     5            0          3

Similarly, if you don't mind setting the column names afterwards, you can use table and as.list to get the result even more directly:

dt[, as.list(table(factor(x1, levels=c(0,1)))), keyby=total]
#    total 0 1
# 1:     0 2 0
# 2:     1 7 1
# 3:     2 9 8
# 4:     3 7 6
# 5:     4 0 7
# 6:     5 0 3

Note: you can wrap as.list(.) in setNames() like this:

dt[, setNames(as.list(table(factor(x1, levels=c(0,1)))), 
           c("x1.incorrect", "x1.correct")), keyby = total]

to set the column names in the same step.

score 3 · Accepted Answer

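Since you didn't use set.seed when creating your data, I can't check this solution against your example, but I think it's what you're after. I'm using functions from reshape2 and plyr to get the summaries of the data.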

library(reshape2)
library(plyr)
#create data
set.seed(1234)
n <- 50
z <- c(0, 1)
samp.fun <- function(x, n){
  sample(x, n, replace = TRUE)
}

data <- data.frame(0)
for (i in 1:5){
  data[1:n, i] <- samp.fun(z, n)
}
names(data)[1:5] <- c("x1", "x2", "x3", "x4", "x5")
data$id <- 1:50

#First get the long form to make summaries on
data.m <- melt(data, id.vars="id")

#Get summary to find total correct answers
data.sum <- ddply(data.m, .(id), summarise,
                  total = sum(value))

#merge back with original data to associate with id
data <- merge(data, data.sum)
data$total <- factor(data$total)

#summarise again to get difference between patterns
data.sum2 <- ddply(data, .(total), summarise,
               x1.incorrect = length(total) - sum(x1),
               x1.correct = sum(x1))
data.sum2
#   total x1.incorrect x1.correct
# 1     0            2          0
# 2     1            7          1
# 3     2            9          8
# 4     3            7          6
# 5     4            0          7
# 6     5            0          3
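
Since the question asks for base functions only, the melt/ddply/merge steps above could also be sketched in base R with rowSums and tapply. This is a rough sketch rather than a drop-in replacement: it builds its own simulated data with replicate (so the counts need not match the output shown above), and it assumes the item columns are named x1 to x5 as in the question.

```r
# Simulated data in the same shape as the question's example
set.seed(1234)
n <- 50
data <- data.frame(replicate(5, sample(c(0, 1), n, replace = TRUE)))
names(data) <- paste0("x", 1:5)

# Total score per respondent (stands in for the melt/ddply/merge steps)
total <- rowSums(data[, paste0("x", 1:5)])

# For each observed total score, count respondents who missed or got x1
data.sum2 <- data.frame(
  total        = sort(unique(total)),
  x1.incorrect = as.vector(tapply(data$x1 == 0, total, sum)),
  x1.correct   = as.vector(tapply(data$x1 == 1, total, sum))
)
data.sum2
```

Note that only total scores that actually occur in the data get a row here; a score nobody achieved is simply absent.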

score -1 · Accepted Answer

Nice puzzle - if I understand it correctly, this should do it too:

table(rowSums(data),data[,1])
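
If a fixed 6-by-2 shape with labelled columns is needed, the one-liner above could be extended along these lines. This is only a sketch under some assumptions: the helper name score_by_first_item is hypothetical, and the data frame passed in must contain only the 0/1 item columns (no id or total column). Fixing the factor levels keeps rows for scores that nobody achieved.

```r
# Hypothetical helper: cross-tabulate total score against the first item.
# Assumes `items` holds only 0/1 item columns.
score_by_first_item <- function(items) {
  n_items <- ncol(items)
  # Fix the levels so every total (0..n_items) and both responses appear,
  # even when a combination has no respondents.
  total <- factor(rowSums(items), levels = 0:n_items)
  first <- factor(items[[1]], levels = c(0, 1),
                  labels = c("incorrect", "correct"))
  tab <- table(total = total, first = first)
  # Drop the table class to get a plain integer matrix.
  unclass(tab)
}

# Simulated data in the same shape as the question's example
set.seed(1234)
items <- as.data.frame(matrix(sample(c(0, 1), 50 * 5, replace = TRUE),
                              ncol = 5,
                              dimnames = list(NULL, paste0("x", 1:5))))
res <- score_by_first_item(items)
dim(res)  # 6 rows (totals 0 to 5) by 2 columns
```

By construction, the "correct" cell of the total-0 row and the "incorrect" cell of the total-5 row are always zero, which matches the pattern described in the question.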
