r - Extract the last row for each subject from a data frame

Question

I have a data frame in R like the one below. I would like to extract the last visit for each subject.

SUBJID VISIT

   40161       3  
   40161       4  
   40161       5  
   40161       6  
   40161       9  
   40201       3  
   40202       6  
   40202       8  
   40241       3  
   40241       4

The desired output is:

SUBJID VISIT

   40161     9  
   40201     3  
   40202     8  
   40241     4

How should I do this in R? Thank you very much.

score 6 · Accepted Answer

agstudy is right, but here is another way, using the aggregate function from the stats package.

df <- read.table(text="SUBJID VISIT
40161 3
40161 4
40161 5
40161 6
40161 9
40201 3
40202 6
40202 8
40241 3
40241 4", header=TRUE)


aggregate(VISIT ~ SUBJID, df, max)

  SUBJID VISIT
1  40161     9
2  40201     3
3  40202     8
4  40241     4
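One caveat worth spelling out: max returns the largest VISIT, which matches the last row only when visits are in ascending order within each subject. The sketch below uses a hypothetical data frame named unsorted (not from the question) to show where the two can differ, and how passing a function based on tail to aggregate picks the value from the last row instead.

```r
# Hypothetical data: subject 1's last recorded row is not its largest VISIT
unsorted <- data.frame(SUBJID = c(1, 1, 1, 2, 2),
                       VISIT  = c(3, 9, 5, 8, 6))

# Largest VISIT per subject
by_max <- aggregate(VISIT ~ SUBJID, unsorted, max)

# VISIT taken from the last row of each subject, in row order
by_last <- aggregate(VISIT ~ SUBJID, unsorted, function(v) tail(v, 1))

by_max$VISIT   # 9 8
by_last$VISIT  # 5 6
```

For the data in the question the two approaches agree, because visits are already ascending within each subject.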

score 4 · Accepted Answer

Just to show another alternative: you can also use data.table, which I like for the simplicity of its syntax. Assuming your data.frame is called "df":

library(data.table)
# data.table 1.8.7  For help type: help("data.table")
DT <- data.table(df, key = "SUBJID")
DT[, list(VISIT = max(VISIT)), by = key(DT)]
#    SUBJID VISIT
# 1:  40161     9
# 2:  40201     3
# 3:  40202     8
# 4:  40241     4
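If the goal is the last row itself rather than the maximum VISIT, data.table's special symbols .SD (the subset of data for the current group) and .N (the number of rows in that group) can be combined. This is a minimal sketch that rebuilds the question's data inline so it runs on its own; the name last_dt is only illustrative.

```r
library(data.table)

df <- data.frame(
  SUBJID = c(40161, 40161, 40161, 40161, 40161, 40201, 40202, 40202, 40241, 40241),
  VISIT  = c(3, 4, 5, 6, 9, 3, 6, 8, 3, 4)
)
DT <- as.data.table(df)

# .N is the row count of the current group, so .SD[.N] is that group's last row
last_dt <- DT[, .SD[.N], by = SUBJID]
last_dt
```

Because .SD holds every non-grouping column, this keeps any additional columns the table might have.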

Also, since we are sharing the many ways to do this in R: if you are comfortable with SQL syntax, you can use sqldf like this:

library(sqldf)
sqldf("select SUBJID, max(VISIT) `VISIT` from df group by SUBJID")
  SUBJID VISIT
1  40161     9
2  40201     3
3  40202     8
4  40241     4

score 3 · Accepted Answer

Another base R option, since we can:

 do.call(rbind,
         lapply(split(dat, dat$SUBJID), 
                function(x) tail(x$VISIT, 1) ) )
#      [,1]
#40161    9
#40201    3
#40202    8
#40241    4

Edit

As @BenBolker suggests:

 do.call(rbind,
             lapply(split(dat, dat$SUBJID), 
                    function(x) tail(x, 1) ) )

This should work for all columns, if you have more of them.
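To illustrate the point about extra columns, here is a small sketch. The data frame dat2 and its DOSE column are hypothetical, added only for demonstration.

```r
# Hypothetical data with an extra DOSE column, not part of the original question
dat2 <- data.frame(SUBJID = c(40161, 40161, 40201, 40202, 40202),
                   VISIT  = c(3, 9, 3, 6, 8),
                   DOSE   = c(10, 20, 10, 15, 30))

# split() makes one data frame per subject; tail(x, 1) keeps its last row
last_full <- do.call(rbind,
                     lapply(split(dat2, dat2$SUBJID),
                            function(x) tail(x, 1)))
last_full
```

Each returned row carries SUBJID, VISIT, and DOSE together, so nothing has to be merged back in afterwards.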

score 1 · Accepted Answer

Here is a simple solution with diff:

dat[c(diff(dat$SUBJID) != 0, TRUE), ]

   SUBJID VISIT
5   40161     9
6   40201     3
8   40202     8
10  40241     4
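Note that the diff approach relies on SUBJID being numeric and on each subject's rows sitting next to each other. A hedged sketch of how one might handle interleaved rows, assuming "last" means the highest VISIT, is to sort first. The data frame shuffled is hypothetical.

```r
# Hypothetical data: rows for each subject are interleaved
shuffled <- data.frame(SUBJID = c(1, 2, 1, 2, 1),
                       VISIT  = c(3, 6, 5, 8, 9))

# Sort by subject, then by visit, so each subject forms one contiguous run
sorted <- shuffled[order(shuffled$SUBJID, shuffled$VISIT), ]

# A row is the last of its subject when the next SUBJID differs,
# and the final row is always the last of its subject
last_sorted <- sorted[c(diff(sorted$SUBJID) != 0, TRUE), ]
last_sorted
```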

It is also possible with by:

do.call(rbind, by(dat, dat$SUBJID, tail, 1))

      SUBJID VISIT
40161  40161     9
40201  40201     3
40202  40202     8
40241  40241     4
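For comparison, a vectorized base R sketch that avoids splitting the data uses duplicated with fromLast = TRUE. It keeps the last occurrence of each SUBJID in row order and does not require the rows of a subject to be adjacent. The data is rebuilt inline here so the snippet runs on its own.

```r
dat <- data.frame(
  SUBJID = c(40161, 40161, 40161, 40161, 40161, 40201, 40202, 40202, 40241, 40241),
  VISIT  = c(3, 4, 5, 6, 9, 3, 6, 8, 3, 4)
)

# duplicated(..., fromLast = TRUE) flags every row except the last one per SUBJID
last_rows <- dat[!duplicated(dat$SUBJID, fromLast = TRUE), ]
last_rows
```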

score 1 · Accepted Answer

Using the plyr package, for example:

 library(plyr)
 ddply(dat, .(SUBJID), summarise, VISIT = tail(VISIT, 1))
  SUBJID VISIT
1  40161     9
2  40201     3
3  40202     8
4  40241     4

where dat is:

dat <- read.table(text ='SUBJID VISIT
40161 3
40161 4
40161 5
40161 6
40161 9
40201 3
40202 6
40202 8
40241 3
40241 4', header = TRUE)

score 1 · Accepted Answer

You can also use the sqldf package:

library(sqldf)
sqldf("SELECT SUBJID, MAX(VISIT) AS VISIT FROM df GROUP BY SUBJID")

  SUBJID VISIT
1  40161     9
2  40201     3
3  40202     8
4  40241     4
