r - Identify groups containing an indicator

Question

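I want to identify groups that contain an indicator. In the example below, I want to identify the districts that contain county == 'other'. If county == 'other' occurs in a district, I want the indicator variable to be 1 for every row of that district; otherwise I want it to be 0. Below are several attempts to do this using split, lapply and any, none of which work. I could probably extract all rows with county == 'other', define the indicator as 1 for that subset and merge the subset back into the original data set, but I keep thinking there must be an easier way. Thank you for any advice.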

df.1 <- read.table(text = '

    state    district    county    apples
       AA          EC        AB       100
       AA          EC        BC        10
       AA          EC        DC       150
       AA           C        FG       200
       AA           C     other        20
       AA           C        HC       250
       AA          WC        RT       300
       AA          WC        TT        30
       AA          WC     other       350

', header=TRUE, stringsAsFactors = FALSE)

desired.result <- read.table(text = '

    state    district    county    apples  indicator
       AA          EC        AB       100          0
       AA          EC        BC        10          0
       AA          EC        DC       150          0
       AA           C        FG       200          1
       AA           C     other        20          1
       AA           C        HC       250          1
       AA          WC        RT       300          1
       AA          WC        TT        30          1
       AA          WC     other       350          1

', header=TRUE, stringsAsFactors = FALSE)

# various attempts that do not work

with(df.1, lapply(split(county, district), function(x) {any(x)=='county' <- 1} ))
with(df.1, lapply(split(county, district), function(x) {ifelse(any(x)=='other', 1, 0)} ))
with(df.1, lapply(split(county, district), function(x) {any(x)=='other'} ))
with(df.1, lapply(split(df.1  , district), function(x) {any(x$county)=='other'} ))
with(df.1, lapply(split(county, district), function(x) {x=='other'} ))
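A note on why several of these attempts fail: in `any(x)=='other'` the parenthesis closes too early, so `any()` is called on a character vector (an error) instead of on the comparison `x == 'other'`. The last attempt runs but returns one logical per row rather than one value per district. A minimal sketch of a working split-based version follows; it rebuilds the same data with `data.frame()` so the block runs on its own, and `has.other` is an illustrative name, not something from the original post.

```r
# Same data as df.1 above, rebuilt compactly so this block is self-contained
df.1 <- data.frame(
  state    = "AA",
  district = rep(c("EC", "C", "WC"), each = 3),
  county   = c("AB", "BC", "DC", "FG", "other", "HC", "RT", "TT", "other"),
  apples   = c(100, 10, 150, 200, 20, 250, 300, 30, 350),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per district: does any county in it equal 'other'?
# Note the comparison sits INSIDE any().
has.other <- sapply(split(df.1$county, df.1$district),
                    function(x) any(x == "other"))

# Look the flag up by district name for every row, then convert to 0/1
df.1$indicator <- as.integer(has.other[df.1$district])
df.1
```

The lookup by name in the last step is what spreads the per-district result back onto every row of that district.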

Edit

Here is the subset/merge approach mentioned above:

df.indicator <- df.1[df.1$county == 'other',]
df.indicator <- df.indicator[,1:2]
df.indicator$indicator = 1
merge(df.1, df.indicator, by=c('state', 'district'), all=TRUE)

I prefer to use base R.
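One caveat with the subset/merge approach above: with `all=TRUE`, districts that have no 'other' county end up with `NA` rather than 0, and a district containing more than one 'other' row would have its rows duplicated by the merge. A hedged sketch of one way to handle both, using only base R (the name `merged` is illustrative):

```r
# Same data as df.1 above, rebuilt compactly so this block is self-contained
df.1 <- data.frame(
  state    = "AA",
  district = rep(c("EC", "C", "WC"), each = 3),
  county   = c("AB", "BC", "DC", "FG", "other", "HC", "RT", "TT", "other"),
  apples   = c(100, 10, 150, 200, 20, 250, 300, 30, 350),
  stringsAsFactors = FALSE
)

# Keys of the districts that contain 'other'; unique() keeps one row per
# district so the merge cannot duplicate rows of df.1
df.indicator <- unique(df.1[df.1$county == "other", c("state", "district")])
df.indicator$indicator <- 1

# all.x = TRUE keeps every row of df.1; districts without a match get NA
merged <- merge(df.1, df.indicator, by = c("state", "district"), all.x = TRUE)

# Turn the NAs from unmatched districts into 0
merged$indicator[is.na(merged$indicator)] <- 0
merged
```

Note that `merge()` sorts the result by the key columns, so the row order differs from `df.1`.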

score 2 · Accepted Answer

library(data.table)

dt = data.table(df.1)
dt[, indicator := 1*any(county == 'other'), by = district]

dt
#   state district county apples indicator
#1:    AA       EC     AB    100         0
#2:    AA       EC     BC     10         0
#3:    AA       EC     DC    150         0
#4:    AA        C     FG    200         1
#5:    AA        C  other     20         1
#6:    AA        C     HC    250         1
#7:    AA       WC     RT    300         1
#8:    AA       WC     TT     30         1
#9:    AA       WC  other    350         1

Here is a base R solution. It is much slower and uglier, but if that is what the OP wants, oh well :)

df.1$indicator = as.numeric(ave(df.1$county, df.1$district,
                                FUN = function(x) {1*any(x == "other")}))

Or

df.1$indicator <- with(df.1, ave(county=='other', district, FUN=max))

Or

df.1$indicator <- with(df.1, ave(county=='other', district, FUN=any)+0L)
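For comparison, here is a sketch of a fully vectorized base R alternative that avoids per-group function calls by using `%in%`. It assumes district names are unique across states, as they are in the example data; if they were not, the membership test would need to be done on a combined state/district key. The names `flagged` and `via.ave` are illustrative.

```r
# Same data as df.1 above, rebuilt compactly so this block is self-contained
df.1 <- data.frame(
  state    = "AA",
  district = rep(c("EC", "C", "WC"), each = 3),
  county   = c("AB", "BC", "DC", "FG", "other", "HC", "RT", "TT", "other"),
  apples   = c(100, 10, 150, 200, 20, 250, 300, 30, 350),
  stringsAsFactors = FALSE
)

# Districts that have at least one 'other' county
flagged <- unique(df.1$district[df.1$county == "other"])

# Row-wise membership test, converted from TRUE/FALSE to 1/0
df.1$indicator <- as.integer(df.1$district %in% flagged)

# Cross-check against the ave() form shown above
via.ave <- with(df.1, ave(county == "other", district, FUN = any) + 0L)
stopifnot(identical(df.1$indicator, via.ave))
df.1
```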

score 0 · Accepted Answer

This is the best I have come up with so far using the apply family of functions.

df.1 <- read.table(text = '

    state    district    county    apples
       AA          EC        AB       100
       AA          EC        BC        10
       AA          EC        DC       150
       AA           C        FG       200
       AA           C     other        20
       AA           C        HC       250
       AA          WC        RT       300
       AA          WC        TT        30
       AA          WC     other       350

', header=TRUE, stringsAsFactors = FALSE)

z <- with(df.1, lapply(split( df.1, district), function(x) { merge(x, ifelse('other' %in% x$county, 1, 0), all=TRUE) } )) ; z
df.2 <- do.call(rbind, z)
rownames(df.2) = NULL
df.2

which gives:

  state district county apples y
1    AA        C     FG    200 1
2    AA        C  other     20 1
3    AA        C     HC    250 1
4    AA       EC     AB    100 0
5    AA       EC     BC     10 0
6    AA       EC     DC    150 0
7    AA       WC     RT    300 1
8    AA       WC     TT     30 1
9    AA       WC  other    350 1
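The output above is regrouped by district (C, EC, WC) and the new column is named `y`. If the original row order and an `indicator` column name matter, one possible variation is to add the column inside each piece and reassemble with `unsplit()`, which puts the rows back where they came from. This is a sketch under the same data, not a claim about what the answer's author intended; `pieces` is an illustrative name.

```r
# Same data as df.1 above, rebuilt compactly so this block is self-contained
df.1 <- data.frame(
  state    = "AA",
  district = rep(c("EC", "C", "WC"), each = 3),
  county   = c("AB", "BC", "DC", "FG", "other", "HC", "RT", "TT", "other"),
  apples   = c(100, 10, 150, 200, 20, 250, 300, 30, 350),
  stringsAsFactors = FALSE
)

# Split into one data frame per district and add the flag to each piece
pieces <- lapply(split(df.1, df.1$district), function(x) {
  x$indicator <- as.integer("other" %in% x$county)
  x
})

# unsplit() reverses split(), restoring the original row order
df.2 <- unsplit(pieces, df.1$district)
df.2
```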
