r - Subset of data with criteria of two columns

Question

I would like to create a subset of data that consists of Units that have a higher score in QTR 4 than QTR 1 (upward trend). Doesn't matter if QTR 2 or 3 are present.

Unit QTR    Score 
5     4     34  
1     1     22  
5     3     67  
2     4     78  
3     2     39   
5     2     34  
1     2     34  
5     1     67  
1     3     70  
1     4     89
3     4     19

Subset would be:

Unit  QTR   Score
1     1     22   
1     2     34  
1     3     70 
1     4     89

I've tried variants of something like this:

upward_subset <- subset(mydata,Unit if QTR=4~Score > QTR=1~Score)
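subset() keeps or drops each row based on a logical vector with one entry per row. The QTR 4 versus QTR 1 comparison involves two different rows of the same Unit, so it has to be turned into a per-row value first. One way to do that is to work out which Units trend upward and then subset on Unit membership. Below is a minimal base R sketch. It assumes the data frame is called mydata, as in the attempt, and that each Unit has at most one row per quarter. The names q1, q4, both and up_units are illustrative only.

```r
# Sample data from the question
mydata <- data.frame(
  Unit  = c(5, 1, 5, 2, 3, 5, 1, 5, 1, 1, 3),
  QTR   = c(4, 1, 3, 4, 2, 2, 2, 1, 3, 4, 4),
  Score = c(34, 22, 67, 78, 39, 34, 34, 67, 70, 89, 19)
)

# One row per Unit holding its QTR 1 and QTR 4 scores; the inner merge
# drops Units that lack either quarter
q1 <- mydata[mydata$QTR == 1, c("Unit", "Score")]
q4 <- mydata[mydata$QTR == 4, c("Unit", "Score")]
both <- merge(q1, q4, by = "Unit", suffixes = c(".q1", ".q4"))

# Units whose score went up from QTR 1 to QTR 4
up_units <- both$Unit[both$Score.q4 > both$Score.q1]

# subset() can now use an ordinary per-row condition
upward_subset <- subset(mydata, Unit %in% up_units)
upward_subset
```

Because the condition passed to subset() only asks whether each row's Unit is in up_units, it no longer needs to look at other rows.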

Thank you for your time

score 3 · Accepted Answer

If your data frame is named d, this works on your test set:

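# Score change from QTR 1 to QTR 4 for each Unit (NA if either quarter is missing)
diffs <- sapply(split(d, d$Unit), function(dd) {
  q4 <- dd[dd$QTR == 4, "Score"]
  q1 <- dd[dd$QTR == 1, "Score"]
  if (length(q4) == 1 && length(q1) == 1) q4 - q1 else NA
})
# Keep every row of the Units whose score went up
d[d$Unit %in% names(diffs)[which(diffs > 0)], ]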
 #-------------
   Unit QTR Score
2     1   1    22
7     1   2    34
9     1   3    70
10    1   4    89
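On the question's data, Units 2 and 3 have a QTR 4 row but no QTR 1 row. For them, the per-Unit difference computed after split() is an empty vector rather than a number. The small base R sketch below rebuilds the data as d and inspects that intermediate step. The names raw and diffs are illustrative only.

```r
d <- data.frame(
  Unit  = c(5, 1, 5, 2, 3, 5, 1, 5, 1, 1, 3),
  QTR   = c(4, 1, 3, 4, 2, 2, 2, 1, 3, 4, 4),
  Score = c(34, 22, 67, 78, 39, 34, 34, 67, 70, 89, 19)
)

# QTR 4 score minus QTR 1 score, computed separately for each Unit
raw <- lapply(split(d, d$Unit), function(dd)
  dd[dd$QTR == 4, "Score"] - dd[dd$QTR == 1, "Score"])

# Number of values per Unit: 0 means the Unit lacks QTR 1 or QTR 4
lengths(raw)

# Replace empty results with NA so there is exactly one value per Unit
diffs <- vapply(raw, function(x) if (length(x) == 1) x else NA_real_, numeric(1))
diffs
```

With one value per Unit, named by Unit, the > 0 comparison can be matched back to the Unit column through names(diffs).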

score 2 · Accepted Answer

A two-step alternative (the data frame here is named test):

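# Step 1: for each Unit, is the QTR 4 score higher than the QTR 1 score?
# Units lacking either quarter give logical(0), which unlist() drops
result <- unlist(
  by(test, test$Unit,
     function(x) x$Score[x$QTR == 4] > x$Score[x$QTR == 1])
)

# Step 2: keep all rows of the Units flagged TRUE
test[test$Unit %in% names(result)[result], ]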

   Unit QTR Score
2     1   1    22
7     1   2    34
9     1   3    70
10    1   4    89

score 2 · Accepted Answer

A solution using data.table (there is probably a better version than what I have now).

Note: this assumes that QTR values are unique within a given Unit.
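The uniqueness assumption can be checked before running any of the solutions. Below is a minimal base R sketch that rebuilds the question's data as df. It relies only on table() and anyDuplicated(). The names counts and dup_row are illustrative.

```r
df <- data.frame(
  Unit  = c(5L, 1L, 5L, 2L, 3L, 5L, 1L, 5L, 1L, 1L, 3L),
  QTR   = c(4L, 1L, 3L, 4L, 2L, 2L, 2L, 1L, 3L, 4L, 4L),
  Score = c(34L, 22L, 67L, 78L, 39L, 34L, 34L, 67L, 70L, 89L, 19L)
)

# Count rows for every Unit/QTR combination; a count above 1 violates the assumption
counts <- table(df$Unit, df$QTR)
any(counts > 1)

# anyDuplicated() on the two key columns returns 0 when every pair is unique,
# otherwise the row index of the first duplicated pair
dup_row <- anyDuplicated(df[c("Unit", "QTR")])
dup_row
```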

Data:

df <- structure(list(Unit = c(5L, 1L, 5L, 2L, 3L, 5L, 1L, 5L, 1L, 1L, 
      3L), QTR = c(4L, 1L, 3L, 4L, 2L, 2L, 2L, 1L, 3L, 4L, 4L), Score = c(34L, 
      22L, 67L, 78L, 39L, 34L, 34L, 67L, 70L, 89L, 19L)), .Names = c("Unit", 
      "QTR", "Score"), class = "data.frame", row.names = c(NA, -11L
      ))

Solution:

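library(data.table)
dt <- data.table(df, key = c("Unit", "QTR"))  # rows sorted by Unit, then QTR
# For each Unit, return all its Scores if the QTR 4 score beats the QTR 1 score
dt[, Score[Score[QTR == 4] > Score[QTR == 1]], by = Unit]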

   Unit V1
1:    1 22
2:    1 34
3:    1 70
4:    1 89
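The data.table solution returns only the Score column (as V1). One common data.table idiom keeps the QTR column as well: return the whole group through .SD when the condition holds, and NULL otherwise. data.table drops groups whose result is NULL. The sketch below uses the same uniqueness assumption and rebuilds the data as df. The name up is illustrative.

```r
library(data.table)

df <- data.frame(
  Unit  = c(5L, 1L, 5L, 2L, 3L, 5L, 1L, 5L, 1L, 1L, 3L),
  QTR   = c(4L, 1L, 3L, 4L, 2L, 2L, 2L, 1L, 3L, 4L, 4L),
  Score = c(34L, 22L, 67L, 78L, 39L, 34L, 34L, 67L, 70L, 89L, 19L)
)
dt <- data.table(df, key = c("Unit", "QTR"))

# Return the whole group (.SD) only when its QTR 4 score beats its QTR 1 score.
# isTRUE() turns a missing quarter (logical(0)) into FALSE, and groups that
# return NULL are left out of the result.
up <- dt[, if (isTRUE(Score[QTR == 4] > Score[QTR == 1])) .SD, by = Unit]
up
```

The result has the columns Unit, QTR and Score, so it matches the subset shown in the question.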
