r - Subselecting and creating a new field in R

Question

Suppose I have this dataset:

household_id person_id age_group  
1            1         5  
1            2         3  
1            3         2  
2            1         3  
2            2         5
2            3         1
2            4         1

I want to create a new field that indicates whether the household contains a person with age_group = 1, like this:

household_id person_id age_group age_group1  
1            1         5         0  
1            2         3         0
1            3         2         0
2            1         3         1
2            2         5         1
2            3         1         1
2            4         1         1

Thanks for your help!

score 3 · Accepted Answer

A plyr solution:

require(plyr)
df <- structure(list(household_id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L), 
person_id = c(1L, 2L, 3L, 1L, 2L, 3L, 4L), age_group = c(5L, 
3L, 2L, 3L, 5L, 1L, 1L)), .Names = c("household_id", "person_id", 
"age_group"), class = "data.frame", row.names = c(NA, -7L))

ddply(df, .(household_id), transform, age_group1 = 0 + any(age_group == 1))

#   household_id person_id age_group age_group1
# 1            1         1         5          0
# 2            1         2         3          0
# 3            1         3         2          0
# 4            2         1         3          1
# 5            2         2         5          1
# 6            2         3         1          1
# 7            2         4         1          1

Edit: a data.table alternative:

require(data.table)
dt <- data.table(df, key="household_id")
dt[, age_group1 := 0 + any(age_group == 1), by=household_id]
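
Both calls above rely on 0 + any(...) to turn the single TRUE/FALSE computed for each household into a 0/1 value. A minimal base R sketch of that coercion, using the age groups from the question (the names household1, household2, flag1 and flag2 are only illustrative):

```r
# age_group values for each household, copied from the question
household1 <- c(5L, 3L, 2L)
household2 <- c(3L, 5L, 1L, 1L)

# any() collapses the element-wise comparison to one logical per household
any(household1 == 1)
any(household2 == 1)

# adding 0 coerces the logical to numeric;
# as.integer() is a more explicit alternative
flag1 <- 0 + any(household1 == 1)
flag2 <- as.integer(any(household2 == 1))
```

Within ddply or data.table's by, that single value is then recycled to every row of the group.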

score 3 · Accepted Answer

> ave(t$age_group, t$household_id, FUN=function(x) 1 %in% x)
[1] 0 0 0 1 1 1 1

> t$age_group1 <- with(t, ave(age_group, household_id, FUN=function(x) 1 %in% x))
> t
  household_id person_id age_group age_group1
1            1         1         5          0
2            1         2         3          0
3            1         3         2          0
4            2         1         3          1
5            2         2         5          1
6            2         3         1          1
7            2         4         1          1
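
This works because ave returns a vector of the same length as its first argument: the one value that FUN produces per household is recycled to every member of that household. A self-contained sketch on plain vectors (the name flag is illustrative, and the explicit as.integer() is an addition for clarity, not part of the answer above):

```r
# columns from the question, as plain vectors
age_group    <- c(5L, 3L, 2L, 3L, 5L, 1L, 1L)
household_id <- c(1L, 1L, 1L, 2L, 2L, 2L, 2L)

# FUN returns one value per household; ave() spreads it over the group's rows
flag <- ave(age_group, household_id,
            FUN = function(x) as.integer(1 %in% x))
```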

score 2 · Accepted Answer

After reading in the data:

dat <- read.table(text = 'household_id person_id age_group  
1            1         5  
1            2         3  
1            3         2  
2            1         3  
2            2         5
2            3         1
2            4         1', header = TRUE)

Use transform with ave (similar to @Mathhew's solution), but with a more concise syntax:

 transform(dat, age_group1  = ave(age_group, household_id, FUN=function(x) any(x==1)))

  household_id person_id age_group age_group1
1            1         1         5          0
2            1         2         3          0
3            1         3         2          0
4            2         1         3          1
5            2         2         5          1
6            2         3         1          1
7            2         4         1          1

score 1 · Accepted Answer

I like SQL for this kind of thing because lots of people already know it, it works across languages (sas has proc sql;), and it's awfully intuitive :)

# read your data into an object named `x`

# load the sqldf library
library(sqldf)

# create a new household-level table that contains just
# the household id and a 0/1 indicator of
# whether anyone within the household meets your requirement
households <- 
    sqldf( 'select household_id , max( age_group == 1 ) as age_group1 from x group by household_id' )

# merge the new column back on to the original table
x <- merge( x , households )

# view your result
x

score 1 · Accepted Answer

Here's another option that doesn't involve installing any packages ;)

# read your data frame into `x`
x <- read.table( text = "household_id person_id age_group  
1            1         5  
1            2         3  
1            3         2  
2            1         3  
2            2         5
2            3         1
2            4         1" , header = TRUE )


# determine the maximum of age_group == 1 within each household id
hhold <- aggregate( age_group == 1 ~ household_id , FUN = max , data = x )

# now just change the name of the second column
names( hhold )[ 2 ] <- 'age_group1'

# merge it back on and you're done
x <- merge( x , hhold )

# look at the result
x
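
One thing to keep in mind: merge sorts its result on the key column by default, so the original row order is not guaranteed in general. If row order matters, a possible alternative (a sketch only, not part of the answer above) is to compute one flag per household with tapply and look it up by name. The name flag_by_household is illustrative:

```r
# same data as in the question
x <- data.frame(
  household_id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  person_id    = c(1L, 2L, 3L, 1L, 2L, 3L, 4L),
  age_group    = c(5L, 3L, 2L, 3L, 5L, 1L, 1L)
)

# one logical per household, named by household id ("1", "2")
flag_by_household <- tapply(x$age_group == 1, x$household_id, any)

# look up each row's household by name; rows stay in their original order
x$age_group1 <- as.integer(flag_by_household[as.character(x$household_id)])
```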
