r - Deleting rows based on specific criteria

Question

I have data set up like this:

date     ID   weight    
Apr 4    1    21
Apr 5    1    22
Apr 6    1    23
Apr 4    2    30
Apr 5    2    31
Apr 6    2    32
Apr 7    2    12
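For reproducibility, the table above can be entered as an R data frame roughly like this. Storing date as plain character strings is an assumption (the question doesn't say what type that column has). The code in the thread only relies on the rows being in date order within each ID.

```r
# Reconstruct the example table from the question.
# date is kept as character strings (assumption: the original type is unknown).
df <- data.frame(
  date   = c("Apr 4", "Apr 5", "Apr 6", "Apr 4", "Apr 5", "Apr 6", "Apr 7"),
  ID     = c(1, 1, 1, 2, 2, 2, 2),
  weight = c(21, 22, 23, 30, 31, 32, 12),
  stringsAsFactors = FALSE
)
str(df)
```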

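I want to find cases where the last recorded weight is not the maximum weight for that ID. In the example above, the last row is the latest date for ID=2, but it is not the maximum weight for that ID.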

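Basically, I can set up a for loop that produces a data frame holding, for each ID, the weight at the latest date and the maximum weight, and then compute a difference score. For any ID whose difference score is greater than 0, I need the row with the last date removed.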

# One summary row per ID to hold the max and last weights
subs <- levels(as.factor(df$ID))
newdf <- data.frame(ID = subs, max = NA, last = NA)

for (i in subs){
  subdata <- subset(df, ID == i)
  # weight on the final row (assumes rows are sorted by date within each ID)
  lastweight <- subdata$weight[nrow(subdata)]
  maxweight <- max(subdata$weight)
  newdf$max[newdf$ID == i] <- maxweight
  newdf$last[newdf$ID == i] <- lastweight
}

# diff > 0 means the last weight is below that ID's maximum
newdf$diff <- as.numeric(newdf$max) - as.numeric(newdf$last)

What I'm struggling with now is coming up with a way to take the IDs where diff > 0 back to the original data frame and remove the last date for those IDs.

I tried which and subset, but neither gives me what I want.
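A minimal base-R sketch of that remaining step, assuming df is sorted by date within each ID. It repeats the loop (with the column names made consistent) and then uses duplicated(..., fromLast = TRUE) to flag the last row of each ID. The names drop_ids, is_last and result are illustrative and do not come from the original post.

```r
df <- data.frame(
  date   = c("Apr 4", "Apr 5", "Apr 6", "Apr 4", "Apr 5", "Apr 6", "Apr 7"),
  ID     = c(1, 1, 1, 2, 2, 2, 2),
  weight = c(21, 22, 23, 30, 31, 32, 12),
  stringsAsFactors = FALSE
)

# Per-ID summary, as in the question's loop
subs <- levels(as.factor(df$ID))
newdf <- data.frame(ID = subs, max = NA, last = NA, stringsAsFactors = FALSE)
for (i in subs) {
  subdata <- subset(df, ID == i)
  newdf$max[newdf$ID == i]  <- max(subdata$weight)
  newdf$last[newdf$ID == i] <- subdata$weight[nrow(subdata)]
}
newdf$diff <- newdf$max - newdf$last

# IDs whose last weight is below their maximum
drop_ids <- newdf$ID[newdf$diff > 0]

# TRUE only on the final occurrence of each ID in df's row order
is_last <- !duplicated(df$ID, fromLast = TRUE)

# Drop rows that are the last row of their ID AND belong to a flagged ID
result <- df[!(is_last & as.character(df$ID) %in% drop_ids), ]
result
```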

score 1 · Accepted Answer

I like to approach these problems in two steps. First, write a function that does what you need on a single group (assuming the data are sorted by date):

# a single group (ID 2) to test the function on
df2 <- df[df$ID == 2,]

myfun <- function(x) {
  # if the maximum weight value isn't found on the last row,
  if (which.max(x$weight) != nrow(x)) { 
    # return the data.frame without the last row:
    return (x[-nrow(x), ])
  } else {
    # otherwise, return the whole thing:
    return (x) 
  }
}

myfun(df2)  # drops the Apr 7 row, since 12 is not ID 2's maximum

Then you can use that function with any of a number of "split-apply-combine" packages:

plyr

library(plyr)
ddply(df, .(ID), myfun)

data.table

library(data.table)
DT <- data.table(df)
DT[, myfun(.SD), by=ID]
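Base R can do the same split-apply-combine without extra packages. This sketch, which redefines the example data and myfun so it runs on its own, splits df by ID, applies myfun to each piece with lapply, and binds the pieces back together with do.call(rbind, ...).

```r
df <- data.frame(
  date   = c("Apr 4", "Apr 5", "Apr 6", "Apr 4", "Apr 5", "Apr 6", "Apr 7"),
  ID     = c(1, 1, 1, 2, 2, 2, 2),
  weight = c(21, 22, 23, 30, 31, 32, 12),
  stringsAsFactors = FALSE
)

myfun <- function(x) {
  # drop the last row when the maximum weight is not on it
  if (which.max(x$weight) != nrow(x)) {
    return(x[-nrow(x), ])
  }
  x
}

# split by ID, apply myfun to each group, combine the results
pieces <- lapply(split(df, df$ID), myfun)
result <- do.call(rbind, pieces)
rownames(result) <- NULL  # rbind otherwise yields names like "1.1", "2.4"
result
```

Because which.max returns the position of the first maximum, a group whose last weight ties an earlier maximum would also lose its last row; testing x$weight[nrow(x)] < max(x$weight) instead would keep such rows.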
