r - How can I make sure every possible variable is represented when melting data?

Question

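Edit 2: Updated with the starting data.

Edit 1: I would like to know how to change the data, or the parameters of the melt function (which I may have overlooked or misunderstood), before melting into long format.

I started with the following data.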

             type1 type2 type3 type4 
A            43   0     1       0 
B             6   0     1       0 
C            16   0     3       1 
D            17   0     2       2

Once melted, it looks like this.

    Sample variable count proportion
1  A       type1    43 0.97727273
2  A       type2     0 0.00000000
3  A       type3     1 0.02272727
4  A       type4     0 0.00000000
5  B       type1     6 0.85714286
6  B       type2     0 0.00000000
7  B       type3     1 0.14285714
8  B       type4     0 0.00000000
9  C       type1    16 0.80000000
10 C       type2     0 0.00000000
11 C       type3     3 0.15000000
12 C       type4     1 0.05000000

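However, the full set of possible variables is type1 through type5. Because the data did not contain type5, it is not part of the melted data. I would like every sample in the table to carry all of the variables. So for type5, which is absent from the data, I need an entry of the form Sample type5 0 0 rather than no entry at all. I looked at the melt and cast APIs but could not find an answer to this.

Any ideas? Thanks!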

score 2 · Accepted Answer

Update

data.tables are well suited to this kind of problem. It may take some practice to really understand how they work, but in return you get very compact and readable code.

# Raw data
dat <- read.table(con <- textConnection("type1 type2 type3 type4 
A            43   0     1       0 
B             6   0     1       0 
C            16   0     3       1 
D            17   0     2       2"), header=TRUE)
close(con)
dat$Sample <- rownames(dat)

# Aggregate
library("reshape2")
library("data.table") ## 1.9.2+
# Convert to a data.table first so that the melt.data.table method is used
dt.dat <- melt(as.data.table(dat), id.vars="Sample", value.name="count")
# Per-sample proportions
dt.dat[, list(variable, count, proportion=prop.table(count)), by=Sample]
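The update above computes per-sample proportions but still contains only the types present in the data. As a minimal sketch of one way to add the missing type5 rows in data.table, assuming the full set of types is known in advance (all_types, grid and full are hypothetical names, not from the original answer), every Sample can be cross joined with every type and the melted counts joined onto that grid:

```r
library(data.table)

# Starting data from the question, with Sample as a regular column
dat <- data.table(
  Sample = c("A", "B", "C", "D"),
  type1  = c(43L, 6L, 16L, 17L),
  type2  = c(0L, 0L, 0L, 0L),
  type3  = c(1L, 1L, 3L, 2L),
  type4  = c(0L, 0L, 1L, 2L)
)

# Assumption: the complete list of types is known up front
all_types <- paste0("type", 1:5)

# Long format; keep 'variable' as character so it joins cleanly
long <- melt(dat, id.vars = "Sample", variable.name = "variable",
             value.name = "count", variable.factor = FALSE)

# Every Sample x type combination, then look up the observed counts
grid <- CJ(Sample = unique(long$Sample), variable = all_types)
full <- long[grid, on = c("Sample", "variable")]

# Combinations absent from the data come back as NA; turn them into 0
full[is.na(count), count := 0L]
full[, proportion := count / sum(count), by = Sample]
```

The join keeps one row per grid entry, so full should have one row for each Sample and type pair, with zeros for type5.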

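Original answer

You can use expand.grid to create a frame with every combination of the index variables that should be present in the final result, and then use merge to copy the values into that frame.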

# Read in the data in your question
dat <- read.table(con <- textConnection("Sample variable count proportion
A      type1    15 0.93750000
A      type2     0 0.00000000
A      type3     1 0.06250000
A      type4     0 0.00000000
B      type1    13 0.86666667
B      type2     0 0.00000000
B      type3     2 0.13333333
B      type4     0 0.00000000"), header=TRUE)
close(con)

# Create all the records that should be present in the final results
entries <- expand.grid(Sample=c("A", "B"), variable=sprintf("type%i", 1:5))

# Voilà!
(dat <- merge(entries, dat, by=c("Sample", "variable"), all.x=TRUE))

   Sample variable count proportion
1       A    type1    15  0.9375000
2       A    type2     0  0.0000000
3       A    type3     1  0.0625000
4       A    type4     0  0.0000000
5       A    type5    NA         NA
6       B    type1    13  0.8666667
7       B    type2     0  0.0000000
8       B    type3     2  0.1333333
9       B    type4     0  0.0000000
10      B    type5    NA         NA

If you need 0 instead of NA, you can change it like this:

dat[3:4] <- lapply(dat[3:4], function(x) ifelse(is.na(x), 0, x))
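The steps above hard-code the sample names and the column positions. As a rough sketch of the same expand.grid and merge idea with the samples taken from the data and the value columns addressed by name (all_types, value_cols and full are hypothetical names, and the list of types is assumed to be known in advance):

```r
# Melted data as in the answer above, built directly as a data frame
dat <- data.frame(
  Sample     = rep(c("A", "B"), each = 4),
  variable   = rep(sprintf("type%i", 1:4), times = 2),
  count      = c(15, 0, 1, 0, 13, 0, 2, 0),
  proportion = c(15/16, 0, 1/16, 0, 13/15, 0, 2/15, 0),
  stringsAsFactors = FALSE
)

# Assumption: the complete list of types is known up front
all_types <- sprintf("type%i", 1:5)

# One row per Sample x type combination, with samples read from the data
entries <- expand.grid(Sample = unique(dat$Sample), variable = all_types,
                       stringsAsFactors = FALSE)

# Left join: combinations missing from 'dat' get NA in the value columns
full <- merge(entries, dat, by = c("Sample", "variable"), all.x = TRUE)

# Replace NA with 0 in the value columns, selected by name
value_cols <- c("count", "proportion")
full[value_cols] <- lapply(full[value_cols],
                           function(x) replace(x, is.na(x), 0))
```

Selecting the columns by name rather than by position (3:4) keeps the replacement correct if the column order changes.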

score 0 · Accepted Answer

I assume the new dataset has a column named Sample.

Data

dat <-structure(list(Sample = structure(1:4, .Label = c("A", "B", "C", 
"D"), class = "factor"), type1 = c(43L, 6L, 16L, 17L), type2 = c(0L, 
0L, 0L, 0L), type3 = c(1L, 1L, 3L, 2L), type4 = c(0L, 0L, 1L, 
2L)), .Names = c("Sample", "type1", "type2", "type3", "type4"
), class = "data.frame", row.names = c(NA, -4L))

dat[setdiff(paste0("type", 1:5), colnames(dat)[-1])] <- 0
library(reshape2)

datM <- melt(dat, id.vars="Sample")
datM1 <- within(datM, {proportion <-ave(value, Sample, FUN=function(x) x/sum(x))})[order(datM$Sample),]
row.names(datM1) <- 1:nrow(datM1)

datM1
#   Sample variable value proportion
#1       A    type1    43 0.97727273
#2       A    type2     0 0.00000000
#3       A    type3     1 0.02272727
#4       A    type4     0 0.00000000
#5       A    type5     0 0.00000000
#6       B    type1     6 0.85714286
#7       B    type2     0 0.00000000
#8       B    type3     1 0.14285714
#9       B    type4     0 0.00000000
#10      B    type5     0 0.00000000
#11      C    type1    16 0.80000000
#12      C    type2     0 0.00000000
#13      C    type3     3 0.15000000
#14      C    type4     1 0.05000000
#15      C    type5     0 0.00000000
#16      D    type1    17 0.80952381
#17      D    type2     0 0.00000000
#18      D    type3     2 0.09523810
#19      D    type4     2 0.09523810
#20      D    type5     0 0.00000000
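The key step in this answer is adding the missing columns before melting. As a sketch of the same idea without reshape2, assuming the complete list of types is known in advance (all_types, missing_types and long are hypothetical names), the zero-filled wide table can be stacked into long form with base R only:

```r
# Starting data with Sample as a regular column
dat <- data.frame(
  Sample = c("A", "B", "C", "D"),
  type1  = c(43L, 6L, 16L, 17L),
  type2  = c(0L, 0L, 0L, 0L),
  type3  = c(1L, 1L, 3L, 2L),
  type4  = c(0L, 0L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Assumption: the complete list of types is known up front
all_types <- paste0("type", 1:5)

# Add a zero-filled column for every type the data lacks (here: type5)
missing_types <- setdiff(all_types, names(dat))
dat[missing_types] <- 0

# Stack the type columns: values are concatenated column by column,
# so Sample repeats as a block and each type name repeats nrow(dat) times
long <- data.frame(
  Sample   = rep(dat$Sample, times = length(all_types)),
  variable = rep(all_types, each = nrow(dat)),
  value    = unlist(dat[all_types], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Per-sample proportions, then sort to match the output above
long$proportion <- ave(long$value, long$Sample,
                       FUN = function(x) x / sum(x))
long <- long[order(long$Sample, long$variable), ]
rownames(long) <- NULL
```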
