r - Recoding sequentially named variables based on the values of the answers

Question

I'm struggling to recode values with lapply and save the results.

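Suppose I have 10 survey questions, each with 4 possible answers, and each question always has exactly one right answer (the others are wrong). The questions are labeled q_1 through q_10, and my data frame is called df. I'd like to create new variables with the same sequential labels that simply code each question as "right" (1) or "wrong" (0).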

If I make a list of the right answers, it looks like this:

right_answers<-c(1,2,3,4,2,3,4,1,2,4)

Next, I'm trying to write a function that simply recodes every variable into a new variable while keeping the same sequential identifiers, like this:

lapply(1:10, function(fx) {
  df$know_[fx]<-ifelse(df$q_[fx]==right_answers[fx],1,0)
})

In a hypothetical world where this code were even remotely correct, I would get a result like this:

id   q_1    know_1   q_2   know_2
1    1      1        2     1
2    4      0        3     0
3    3      0        2     1
4    4      0        1     0

Thanks so much for your help!

score 1 · Accepted Answer

For the same matrix output as the other answers, I'd suggest:

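# Column names q_1, ..., q_10, in the same order as right_answers
q_names <- paste0("q_", seq_along(right_answers))
answers <- df[q_names]
# Compare each column with its right answer: a logical matrix, one column per question
correct <- mapply(`==`, answers, right_answers)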
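`mapply()` here returns a logical matrix with one column per question, named after the `q_` columns. A minimal sketch of turning it into the 1/0 `know_` columns asked for, assuming only the two questions shown in the question's table (so `right_answers` is cut down to their two right answers for this example):

```r
# Two-question example data taken from the question's table (assumed subset)
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2)  # right answers for q_1 and q_2 only

q_names <- paste0("q_", seq_along(right_answers))
answers <- df[q_names]

# Logical matrix (rows = respondents, columns = questions):
# TRUE where the given answer equals that question's right answer
correct <- mapply(`==`, answers, right_answers)

# Turn TRUE/FALSE into 1/0 and rename q_k -> know_k
know <- correct * 1
colnames(know) <- sub("^q_", "know_", colnames(know))

# Append the know_ columns to the original data frame
df <- cbind(df, know)
df
```

The new columns are appended after the `q_` columns rather than interleaved as in the question's mock-up; reorder with `df[c(...)]` if that layout matters.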

score 0 · Accepted Answer

The problem is probably this part of your code: df$q_[fx]. You can build the column name with paste instead, like this:

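df = read.table(text = "
id   q_1   q_2
1    1     2
2    4     3
3    3     2
4    4     1", header = TRUE)

right_answers = c(1,2,3,4,2,3,4,1,2,4)

# Build each column name ("q_1", "q_2") with paste, then compare it with that question's right answer
dat2 = sapply(1:2, function(fx) {
  ifelse(df[[paste("q", fx, sep = "_")]] == right_answers[fx], 1, 0)
})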

This doesn't add columns to the data.frame; instead it creates a new matrix, like @SenorO's answer. You can name the matrix columns and then add them to the original data.frame like this:

colnames(dat2) = paste("know", 1:2, sep = "_")

data.frame(df, dat2)
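As for why the original `lapply()` attempt changes nothing: `df$q_[fx]` asks for a column called `q_` (which does not exist) and then subsets that, rather than building the name `q_1`, and an assignment made inside the anonymous function only modifies a local copy of `df`. A minimal sketch that avoids both problems with a plain `for` loop and `[[`, assuming the two-question data from the question's table (for the full survey, loop over `seq_along(right_answers)` instead of `1:2`):

```r
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2, 3, 4, 2, 3, 4, 1, 2, 4)

# Loop over the question numbers present in this example (1 and 2)
for (i in 1:2) {
  q_col    <- paste0("q_", i)     # e.g. "q_1"
  know_col <- paste0("know_", i)  # e.g. "know_1"
  # [[ ]] with a character name reads or creates the column with that name;
  # as.numeric() turns TRUE/FALSE into 1/0
  df[[know_col]] <- as.numeric(df[[q_col]] == right_answers[i])
}
df
```

Because the loop runs at top level, each assignment modifies `df` itself, so no separate binding step is needed afterwards.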

score 0 · Accepted Answer

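I'd like to suggest a different approach to your problem using the reshape2 package. In my opinion it has these advantages: 1) it is more idiomatic R (for what that's worth), 2) the code is more readable, and 3) it is less error-prone, especially if you add more analyses later. In this approach everything happens inside data frames, which I think is preferable whenever possible: it makes it easier to keep all the values for a single record (here, an id) together and to take advantage of R's tools.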

# Creating a dataframe with the form you describe
df <- data.frame(id = c('1','2','3','4'), q_1 = c(1,4,3,4), q_2 = c(2,3,2,1), q_3 = rep(1, 4), q_4 = rep(2, 4), q_5 = rep(3, 4),
                 q_6 = rep(4, 4), q_7 = c(1,4,3,4), q_8 = c(2,3,2,1), q_9 = rep(1, 4), q_10 = rep(2, 4))

right_answers<-c(1,2,3,4,2,3,4,1,2,4)

# Associating the right answers explicitly with the corresponding question labels in a data frame
answer_df <- data.frame(questions=paste('q', 1:10, sep='_'), right_answers)

library(reshape2)

# "Melting" the dataframe from "wide" to "long" form -- now questions labels are in variable values rather than in column names
melt_df <- melt(df) # melt function is from reshape2 package

# Now merging the correct answers into the data frame containing the observed answers
merge_df <- merge(melt_df, answer_df, by.x='variable', by.y='questions')

# At this point comparing the observed to correct answers is trivial (using as.numeric to convert from logical to 0/1 as you request, though keeping as TRUE/FALSE may be clearer)
merge_df$correct <- as.numeric(merge_df$value==merge_df$right_answers)

# If desirable (not sure it is), put back into "wide" dataframe form
cast_obs_df <- dcast(merge_df, id ~ variable, value.var='value') # dcast function is from reshape2 package
cast_cor_df <- dcast(merge_df, id ~ variable, value.var='correct')
names(cast_cor_df) <- gsub('q_', 'know_', names(cast_cor_df))
final_df <- merge(cast_obs_df, cast_cor_df)

The newer tidyr package is probably even better than reshape2 for this.
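A minimal sketch of the same melt, merge, cast idea with tidyr, assuming tidyr 1.0 or later (for `pivot_longer()` and `pivot_wider()`) and, for brevity, only the two questions from the question's table. Base `merge()` is used for the join so that dplyr is not required:

```r
library(tidyr)

# Two-question example data from the question's table
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2, 3, 4, 2, 3, 4, 1, 2, 4)

# Lookup table pairing each question label with its right answer
answer_df <- data.frame(question = paste0("q_", seq_along(right_answers)),
                        right_answer = right_answers)

# Wide -> long: one row per (id, question)
long <- pivot_longer(df, cols = starts_with("q_"),
                     names_to = "question", values_to = "answer")

# Attach each question's right answer (inner join keeps only q_1 and q_2)
long <- merge(long, answer_df, by = "question")
long$know <- as.numeric(long$answer == long$right_answer)

# Long -> wide again, naming the new columns know_1, know_2, ...
long$question <- sub("^q_", "know_", long$question)
know_wide <- pivot_wider(long[c("id", "question", "know")],
                         names_from = question, values_from = know)

# Put the know_ columns next to the original answers
final_df <- merge(df, know_wide, by = "id")
final_df
```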
