r - How can I merge several variables to create a new factor variable in R?

Question

I have data from a survey. It comes from a question that looks like this:

Did you do any of the following activities during your PhD?

                             Yes, paid by my school. Yes, paid by me.  No. 

Attended an international conference?
Bought textbooks?

The data is automatically saved in a spreadsheet like this:

id conf.1 conf.2 conf.3 text.1 text.2 text.3

1    1                           1
2           1             1
3                  1      1
4                  1                    1
5

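This means that participant 1 attended a conference paid for by the university, participant 2 attended a conference he paid for himself, and participant 3 did not attend one.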

I want to merge conf.1, conf.2, conf.3 and text.1, text.2, text.3 into one variable per question:

id new.conf new.text

1   1        2
2   2        1
3   3        1
4   3        3

where the number now represents the category of the survey question.

Thanks for your help

score 2 · Accepted Answer

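You don't say whether each set of questions can have multiple answers. If it can, this approach may not work for you, and I'd recommend making your question more reproducible before going further. With that caveat out of the way, give this a whirl: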

library(reshape2)
#recreate your data
dat <- data.frame(id = 1:5,
                  conf.1 = c(1,rep(NA,4)),
                  conf.2 = c(NA,1, rep(NA,3)),
                  conf.3 = c(NA,NA,1,1, NA),
                  text.1 = c(NA,1,1,NA,NA),
                  text.2 = c(1, rep(NA,4)),
                  text.3 = c(rep(NA,3),1, NA))

#melt into long format
dat.m <- melt(dat, id.vars = "id")
#Split on the "."
dat.m[, c("variable", "val")] <- with(dat.m, colsplit(variable, "\\.", c("variable", "val")))
#Subset out only the complete cases
dat.m <- dat.m[complete.cases(dat.m),]
#Cast back into wide format
dcast(id ~ variable, value.var = "val", data = dat.m)
#-----
  id conf text
1  1    1    2
2  2    2    1
3  3    3    1
4  4    3    3
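
The caveat at the top of this answer can be checked before reshaping. This is only a minimal sketch, assuming the `dat` data frame recreated above; the helper name `n_answers` is made up for illustration and simply counts the non-missing columns per respondent within one question:

```r
# Recreate the example data (same values as above)
dat <- data.frame(id = 1:5,
                  conf.1 = c(1, rep(NA, 4)),
                  conf.2 = c(NA, 1, rep(NA, 3)),
                  conf.3 = c(NA, NA, 1, 1, NA),
                  text.1 = c(NA, 1, 1, NA, NA),
                  text.2 = c(1, rep(NA, 4)),
                  text.3 = c(rep(NA, 3), 1, NA))

# n_answers is a hypothetical helper: number of ticked options per row
# among the columns whose names start with "<prefix>."
n_answers <- function(data, prefix) {
  cols <- grep(paste0("^", prefix, "\\."), names(data), value = TRUE)
  rowSums(!is.na(data[cols]))
}

conf_counts <- n_answers(dat, "conf")
text_counts <- n_answers(dat, "text")

# TRUE means no respondent ticked more than one option for any question
all(conf_counts <= 1, text_counts <= 1)
```

Any row with a count above 1 would need separate handling before the melt/cast steps are applied.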

score 0 · Accepted Answer

Here's a base R approach that deals with the missing values:

confvars <- c("conf.1","conf.2","conf.3")
textvars <- c("text.1","text.2","text.3")

# For each row, return the position of the non-missing column, or NA if none
which.sub <- function(x) {
  maxsub <- apply(dat[x], 1, which.max)
  # All-NA rows give a zero-length result; replace those with NA
  maxsub[(lapply(maxsub, length) == 0)] <- NA
  return(unlist(maxsub))
}

data.frame(
  id = dat$id,
  conf = which.sub(confvars),
  text = which.sub(textvars)
)

Result:

  id conf text
1  1    1    2
2  2    2    1
3  3    3    1
4  4    3    3
5  5   NA   NA
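
The `length == 0` step in `which.sub` is there because `which.max` returns a zero-length result when every value is `NA`, which is what happens for respondent 5. A small illustration, independent of the data above (the vector names are made up for this example):

```r
# One answered row and one entirely missing row
answered  <- c(NA, 1, NA)
all_blank <- rep(NA_real_, 3)

which.max(answered)            # position of the single non-NA value
length(which.max(all_blank))   # zero-length result for an all-NA row
```

Because the per-row results then differ in length, `apply` hands back a list rather than a plain vector, which is why the function ends with `unlist`.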

score 0 · Accepted Answer

The following solution is very simple, and I use it a lot. Let's use the same data frame as Chase did above.

dat <- data.frame(id = 1:5,
                  conf.1 = c(1,rep(NA,4)),
                  conf.2 = c(NA,1, rep(NA,3)),
                  conf.3 = c(NA,NA,1,1, NA),
                  text.1 = c(NA,1,1,NA,NA),
                  text.2 = c(1, rep(NA,4)),
                  text.3 = c(rep(NA,3),1, NA))

Now I start by replacing the NAs with zeros.

dat[is.na(dat)] <- 0

Multiplying each column by a different number makes it easy to compute the new variables.

dat <- transform(dat, conf=conf.1 + 2*conf.2 + 3*conf.3,
                      text=text.1 + 2*text.2 + 3*text.3)
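
The weighting works because at most one column per question is 1, so the sum equals the position of that column. The sketch below uses made-up rows (`toy` is not from the survey data) to show the arithmetic as a matrix product, and also that a respondent who ticked two options would get an ambiguous code:

```r
# Hypothetical responses with NAs already replaced by 0
toy <- data.frame(conf.1 = c(1, 0, 0, 1),
                  conf.2 = c(0, 1, 0, 1),
                  conf.3 = c(0, 0, 1, 0))

# Same weighting as above (1, 2, 3), written as a matrix product
codes <- as.vector(as.matrix(toy) %*% 1:3)
codes
# Rows 3 and 4 both get code 3, although row 4 ticked options 1 and 2
```

So this approach relies on each respondent giving at most one answer per question.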

Let's finish by recoding the zeros in the new variables (or, here, in the whole data set) back to NA.

dat[dat == 0] <- NA 

> dat
  id conf.1 conf.2 conf.3 text.1 text.2 text.3 conf text
1  1      1     NA     NA     NA      1     NA    1    2
2  2     NA      1     NA      1     NA     NA    2    1
3  3     NA     NA      1      1     NA     NA    3    1
4  4     NA     NA      1     NA     NA      1    3    3
5  5     NA     NA     NA     NA     NA     NA   NA   NA
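
Since the question asks for factor variables, the numeric codes can be turned into labelled factors afterwards. This is a minimal sketch, assuming codes 1 to 3 map to the three answer options in the order given in the question; the data frame `res` just retypes the `id`, `conf` and `text` columns from the output above, and `answer_labels` is a name chosen for illustration:

```r
# Merged codes, copied from the result above
res <- data.frame(id = 1:5,
                  conf = c(1, 2, 3, 3, NA),
                  text = c(2, 1, 1, 3, NA))

# Labels taken from the survey question, in column order
answer_labels <- c("Yes, paid by my school", "Yes, paid by me", "No")

res$conf <- factor(res$conf, levels = 1:3, labels = answer_labels)
res$text <- factor(res$text, levels = 1:3, labels = answer_labels)

str(res)
```

Respondents with no answer stay `NA` in the factor.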
