r - Complex long-to-wide data reshaping (with a time-varying variable)

Question

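I am currently working on a multistate analysis dataset in "long" form (one row per observation of each individual; each individual is measured repeatedly, up to 5 times).

The idea is that each individual can move repeatedly between the levels of the time-varying state variable s = 1, 2, 3, 4. All the other variables I have (here, cohort) are fixed within any given id.

After some analyses, I need to reshape the dataset into "wide" form, according to the specific sequence of visited states. Here is an example of the initial long data: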

    dat <- read.table(text = "
        id    cohort    s
        1       1       2
        1       1       2
        1       1       1
        1       1       4
        2       3       1
        2       3       1
        2       3       3
        3       2       1
        3       2       2
        3       2       3
        3       2       3
        3       2       4",
        header = TRUE)

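The final "wide" dataset needs to take into account the specific individual sequence of visited states, recorded in the newly created variables s1, s2, s3, s4 and s5.

According to the example above, the wide dataset would look like this: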

    id    cohort    s1    s2    s3    s4    s5    
    1       1       2      2     1     4     0
    2       3       1      1     3     0     0
    3       2       1      2     3     3     4

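I tried using reshape(), and I also focused on transposing s, but I did not get the intended result. My knowledge of R functions is actually rather limited. Thanks.

EDIT: obtaining a different kind of wide dataset

Thank you for your help. I have a related question, if I may. Especially when each individual is observed for a long time and there are few transitions between states, it would be very useful to reshape the initial sample dat in the following alternative way: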

    id    cohort    s1    s2    s3    s4    s5    dur1  dur2  dur3  dur4  dur5 
    1       1       2      1     4     0     0      2     1     1     0     0  
    2       3       1      3     0     0     0      2     1     0     0     0
    3       2       1      2     3     4     0      1     1     2     1     0

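Here, s1-s5 are the distinct visited states, and dur1-dur5 are the time spent in each respective distinct visited state.

Could you give me a hand in reaching this data structure? I believe all the dur- and s-variables have to be created in an intermediate sample before using reshape(). Otherwise, would it be possible to use reshape2 directly?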

score 5 · Accepted Answer

dat <- read.table(text = "
        id    cohort    s    
        1       1       2
        1       1       2
        1       1       1
        1       1       4
        2       3       1
        2       3       1
        2       3       3
        3       2       1
        3       2       2
        3       2       3
        3       2       3
        3       2       4", 
    header=TRUE)     

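# Add a within-id visit counter (assumes the rows of each id are contiguous)
df <- data.frame(
    dat,
    period = sequence(rle(dat$id)$lengths)
)

# One row per id/cohort, one s.<period> column per visit
wide <- reshape(df, v.names = "s", idvar = c("id", "cohort"),
                timevar = "period", direction = "wide")

# Visits that were not observed come out as NA; recode them as 0
wide[is.na(wide)] <- 0
wide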

This gives:

  id cohort s.1 s.2 s.3 s.4 s.5
1  1      1   2   2   1   4   0
5  2      3   1   1   3   0   0
8  3      2   1   2   3   3   4
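In case the period line is not obvious, here is a minimal sketch of what it computes, using only the id column of the example data. It assumes, as in the example, that all rows of one individual sit next to each other; runs is just an illustrative name.

```r
# id column of the example data from the question
id <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3)

runs <- rle(id)                   # run-length encoding of consecutive equal ids
runs$lengths                      # 4 3 5: number of rows per individual

period <- sequence(runs$lengths)  # 1..4, then 1..3, then 1..5
period
```

So period is simply the visit number within each individual, which is what reshape() needs as timevar.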

Then rename the columns with the following line:

names(wide) <- c('id', 'cohort', paste0('s', 1:5))

#   id cohort s1 s2 s3 s4 s5
# 1  1      1  2  2  1  4  0
# 5  2      3  1  1  3  0  0
# 8  3      2  1  2  3  3  4

If you use sep='' in the wide statement (the reshape() call), you do not need to rename the variables:

wide <- reshape(df, v.names = "s", idvar = c("id", "cohort"),
                timevar = "period", direction = "wide", sep='')

I suspect there are ways to avoid creating the period variable and to replace the NAs directly in the wide statement, but I have not figured them out yet.
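One possible way around both points, as a rough base R sketch rather than anything reshape() offers: split s by id and pad each sequence with zeros. The names max_visits, pad, s_wide and wide2 are made up for this illustration, and max_visits = 5 is taken from the "up to 5 times" in the question; pad() will fail if an individual has more rows than that.

```r
dat <- read.table(text = "
id cohort s
1 1 2
1 1 2
1 1 1
1 1 4
2 3 1
2 3 1
2 3 3
3 2 1
3 2 2
3 2 3
3 2 3
3 2 4", header = TRUE)

max_visits <- 5  # assumption: at most 5 measurements per individual

# Pad one individual's state sequence with zeros up to max_visits
pad <- function(x) c(x, rep(0L, max_visits - length(x)))

# split() orders the groups by id, so the result has one row per sorted id
s_wide <- t(sapply(split(dat$s, dat$id), pad))
colnames(s_wide) <- paste0("s", seq_len(max_visits))

# cohort is fixed within id, so one id/cohort row per individual is enough;
# sort by id so the rows line up with s_wide
ids <- unique(dat[c("id", "cohort")])
ids <- ids[order(ids$id), ]

wide2 <- cbind(ids, s_wide)
rownames(wide2) <- NULL
wide2
```

No period column is created and no NA ever appears, because the padding with 0 happens before the pieces are put together.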

score 3 · Accepted Answer

OK, got it...

library(plyr)
library(reshape2)

dat2 <- ddply(dat,.(id,cohort), function(x) 
       data.frame(s=x$s,name=paste0("s",seq_along(x$s))))


dat2 <- ddply(dat2,.(id,cohort), function(x) 
       dcast(x, id + cohort ~ name, value.var= "s" ,fill= 0)
       )

dat2[is.na(dat2)] <- 0

dat2

#    id cohort s1 s2 s3 s4 s5
#    1  1      1  2  2  1  4  0
#    2  2      3  1  1  3  0  0
#    3  3      2  1  2  3  3  4

Does this look right? The first ddply is hardly elegant.

score 3 · Accepted Answer

Try this:

library(reshape2)

dat$seq <- ave(dat$id, dat$id, FUN = function(x) paste0("s", seq_along(x)))
dat.s <- dcast(dat, id + cohort ~ seq, value.var = "s", fill = 0)

This gives:

> dat.s
  id cohort s1 s2 s3 s4 s5
1  1      1  2  2  1  4  0
2  2      3  1  1  3  0  0
3  3      2  1  2  3  3  4

If you don't mind using just 1, 2, ..., 5 as column names, the ave line can be shortened to:

dat$seq <- ave(dat$id, dat$id, FUN = seq_along)
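A side note on ave, with a small made-up id vector (not from the question): because ave groups by value rather than by position, the counter still comes out right when the rows of one individual are not adjacent, whereas a counter built from rle restarts at every change of id. The names visit_ave and visit_rle are just for illustration.

```r
# Hypothetical ids whose rows are NOT grouped together
id <- c(1, 2, 1, 3, 2, 1)

# Counter within each id, following row order
visit_ave <- ave(id, id, FUN = seq_along)
visit_ave   # 1 1 2 1 2 3

# rle only sees consecutive runs, so every row starts a new run here
visit_rle <- sequence(rle(id)$lengths)
visit_rle   # 1 1 1 1 1 1
```

With the example data, which is already sorted by id, both approaches give the same counter.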

For the second question, which was added later, try this:

library(plyr)
# For one individual: lengths of the runs of consecutive identical states
dur.fn <- function(x) {
  r <- rle(x$s)$lengths
  data.frame(id = x$id[1], dur.value = r, dur.seq = paste0("dur", seq_along(r)))
}
dat.dur.long <- ddply(dat, .(id), dur.fn)
dat.dur <- dcast(dat.dur.long, id ~ dur.seq, c, value.var = "dur.value", fill = 0)
cbind(dat.s, dat.dur[-1])

which gives:

  id cohort s1 s2 s3 s4 s5 dur1 dur2 dur3 dur4
1  1      1  2  2  1  4  0    2    1    1    0
2  2      3  1  1  3  0  0    2    1    0    0
3  3      2  1  2  3  3  4    1    1    2    1
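Note that this output still lists every visit in s1-s5 and stops at dur4, while the layout in the edit has only the distinct consecutive states in s1-s5 and goes up to dur5. If that exact layout is needed, something along these lines might do, in base R only. It is a sketch under assumptions: max_states = 5 comes from the question, "time spent" is counted as the number of consecutive observations in a state, and pad, one_id and wide_dur are names invented here.

```r
dat <- read.table(text = "
id cohort s
1 1 2
1 1 2
1 1 1
1 1 4
2 3 1
2 3 1
2 3 3
3 2 1
3 2 2
3 2 3
3 2 3
3 2 4", header = TRUE)

max_states <- 5  # assumption: at most 5 observations, hence at most 5 runs

# Pad a vector with zeros up to max_states
pad <- function(x) c(x, rep(0L, max_states - length(x)))

# For one individual: collapse consecutive repeats of s into runs,
# keeping the state of each run (values) and its duration (lengths)
one_id <- function(x) {
  r <- rle(x$s)
  out <- c(x$id[1], x$cohort[1], pad(r$values), pad(r$lengths))
  names(out) <- c("id", "cohort",
                  paste0("s", seq_len(max_states)),
                  paste0("dur", seq_len(max_states)))
  out
}

# One named vector per id, stacked into a data frame
wide_dur <- as.data.frame(do.call(rbind, lapply(split(dat, dat$id), one_id)))
rownames(wide_dur) <- NULL
wide_dur
```

For id 1, for example, the states 2, 2, 1, 4 collapse to s1-s3 = 2, 1, 4 with dur1-dur3 = 2, 1, 1, which matches the first row of the table in the edit.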
