r - Collapsing duplicate rows whose values differ in some columns, using R

Question

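My data frame has rows that share the same ID but have different values for test year and age. I would like to collapse the duplicate rows and put the differing values into separate columns.

I am new to R and have been struggling with this for a while.

This is the data frame: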

> df
   id project testyr1 testyr2 age1 age2
1 16S      AS    2008      NA   29   NA
2 32S      AS    2004      NA   30   NA
3 37S      AS      NA    2011   NA   36
4 50S      AS    2004      NA   23   NA
5 50S      AS    1998      NA   16   NA
6 55S      AS    2007      NA   28   NA

I need testyr1 to hold the earliest year and testyr2 the most recent year. Likewise, age1 should be the younger age and age2 the older age.

The output should look like this:

   id project testyr1 testyr2 age1 age2
1 16S      AS    2008      NA   29   NA
2 32S      AS    2004      NA   30   NA
3 37S      AS      NA    2011   NA   36
4 50S      AS    1998    2004   16   23
6 55S      AS    2007      NA   28   NA

I tried to write a loop, but I don't know how to finish it:

df.undup <- c()
for (i in 1:nrow(df)){   
  if i == i+1    
    df$testyr1 != NA {   

    testyr2 = max(testyr1)   
    testyr1 = min(testyr1)   
    nage2 = max(nage1)   
    nage1 = min(nage1)   
  }   
 else{   
    testyr2 = max(testyr2)   
    testyr1 = min(testyr2)   
    nage2 = max(nage2)   
    nage1 = min(nage2)   
  }   
}

Any help is greatly appreciated.

score 3 · Accepted Answer

library(plyr)

data <- read.csv(textConnection("id,project,testyr1,testyr2,age1,age2
16S,AS,2008,NA,29,NA
32S,AS,2004,NA,30,NA
37S,AS,NA,2011,NA,36
50S,AS,2004,NA,23,NA
50S,AS,1998,NA,16,NA
55S,AS,2007,NA,28,NA"))


new_data <- ddply(data, .(id), function(x) {
  return(data.frame(id = unique(x$id), project = unique(x$project), 
    testyr1 = min(x$testyr1), 
    testyr2 = max(x$testyr2), age1= min(x$age1), age2 = max(x$age2)))
    })

> new_data

   id project testyr1 testyr2 age1 age2
1 16S      AS    2008      NA   29   NA
2 32S      AS    2004      NA   30   NA
3 37S      AS      NA    2011   NA   36
4 50S      AS    1998      NA   16   NA
5 55S      AS    2007      NA   28   NA

# But your result example suggests you want the lowest 
# of testyr to be in testyr1 and the highest of the combined
# testyrs to be in testyr2. Same logic for ages.
# If so, the one below should work:

new_data <- ddply(data, .(id), function(x) {
  if (nrow(x) > 1) {
    # Several rows share this id: pool both columns and take the extremes.
    years <- c(x$testyr1, x$testyr2)
    ages <- c(x$age1, x$age2)
    return(data.frame(id = unique(x$id), project = unique(x$project),
      testyr1 = min(years, na.rm = TRUE), testyr2 = max(years, na.rm = TRUE),
      age1 = min(ages, na.rm = TRUE), age2 = max(ages, na.rm = TRUE)))
  } else {
    # Only one row for this id: keep it unchanged.
    return(data.frame(id = unique(x$id), project = unique(x$project),
      testyr1 = x$testyr1, testyr2 = x$testyr2,
      age1 = x$age1, age2 = x$age2))
  }
})

> new_data
   id project testyr1 testyr2 age1 age2
1 16S      AS    2008      NA   29   NA
2 32S      AS    2004      NA   30   NA
3 37S      AS      NA    2011   NA   36
4 50S      AS    1998    2004   16   23
5 55S      AS    2007      NA   28   NA
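
For anyone who would rather not load plyr, the same idea can be sketched in base R with split(), lapply() and rbind(). This is only a rough equivalent of the second ddply call above, not a tested replacement. The helper name collapse_group is made up for illustration, and the sketch assumes that every id with more than one row has at least one non-NA year and age (otherwise min() and max() return Inf/-Inf with a warning).

```r
# Same sample data as in the answer above.
data <- read.csv(text = "id,project,testyr1,testyr2,age1,age2
16S,AS,2008,NA,29,NA
32S,AS,2004,NA,30,NA
37S,AS,NA,2011,NA,36
50S,AS,2004,NA,23,NA
50S,AS,1998,NA,16,NA
55S,AS,2007,NA,28,NA", stringsAsFactors = FALSE)

# Hypothetical helper (not part of the original answer):
# collapses all rows belonging to one id into a single row.
collapse_group <- function(x) {
  if (nrow(x) == 1) return(x)        # single rows are kept unchanged
  years <- c(x$testyr1, x$testyr2)   # pool both year columns
  ages  <- c(x$age1, x$age2)         # pool both age columns
  data.frame(id = x$id[1], project = x$project[1],
             testyr1 = min(years, na.rm = TRUE),
             testyr2 = max(years, na.rm = TRUE),
             age1 = min(ages, na.rm = TRUE),
             age2 = max(ages, na.rm = TRUE))
}

# Split by id, collapse each piece, and stack the pieces again.
new_data <- do.call(rbind, lapply(split(data, data$id), collapse_group))
rownames(new_data) <- NULL           # drop the id-based row names
new_data
```

split() orders the groups by id, so the rows come back in the same order as with ddply here.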

score 0 · Accepted Answer

I really doubt this is the most efficient way to do this, but my brain isn't working right now.

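# ids that occur more than once
temp = names(which(table(df$id) > 1))
temp1 = vector("list", length(temp))
for (i in seq_along(temp)) {
  # all rows belonging to this duplicated id
  temp1[[i]] = df[df$id == temp[i], ]
  # id and project from the first row, min/max taken from testyr1 and age1
  temp1[[i]] = data.frame(temp1[[i]][1, 1:2], 
                     testyr1 = min(temp1[[i]]$testyr1), 
                     testyr2 = max(temp1[[i]]$testyr1), 
                     age1 = min(temp1[[i]]$age1), 
                     age2 = max(temp1[[i]]$age1))
}

# rows with a unique id first, then the collapsed rows
rbind(df[!(df$id %in% temp), ], do.call(rbind, temp1))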
#    id project testyr1 testyr2 age1 age2
# 1 16S      AS    2008      NA   29   NA
# 2 32S      AS    2004      NA   30   NA
# 3 37S      AS      NA    2011   NA   36
# 6 55S      AS    2007      NA   28   NA
# 4 50S      AS    1998    2004   16   23

### rm(i, temp, temp1) ### Cleanup the workspace
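
Note that the loop above takes min() and max() of testyr1 and age1 only, without na.rm, so it relies on every duplicated id having its values in those two columns. A quick way to check that before running it might look like the sketch below. The names dup_ids, dup_rows and assumption_holds are invented for this example, and df is rebuilt from the sample data in the question.

```r
# Sample data from the question, rebuilt so the check can run on its own.
df <- read.csv(text = "id,project,testyr1,testyr2,age1,age2
16S,AS,2008,NA,29,NA
32S,AS,2004,NA,30,NA
37S,AS,NA,2011,NA,36
50S,AS,2004,NA,23,NA
50S,AS,1998,NA,16,NA
55S,AS,2007,NA,28,NA", stringsAsFactors = FALSE)

# ids that appear on more than one row
dup_ids <- unique(df$id[duplicated(df$id)])

# all rows belonging to those ids
dup_rows <- df[df$id %in% dup_ids, ]

# TRUE when no duplicated row is missing testyr1 or age1
assumption_holds <- !any(is.na(dup_rows[, c("testyr1", "age1")]))
assumption_holds
```

If this comes back FALSE, pooling both columns with na.rm = TRUE, as in the other answer, is probably the safer route.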
