r - Identify unique observations that meet two conditions and remove them in R

Question

I have a df like this:

data
   names  fruit
7   john  apple
13  john orange
14  john  apple
2   mary orange
5   mary  apple
8   mary orange
10  mary  apple
12  mary  apple
1    tom  apple
6    tom  apple

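I want to do two things. First, count the number of unique individuals who have both an apple and an orange (i.e., 2: mary and john).

After that, I want to remove them from the data frame so that only the unique individuals who have only apples are left.

This is what I tried: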

toremove<-unique(data[data$fruit=='apple' & data$fruit=='orange',"names"])  ##this part doesn't work, if it had I would have used the below code to remove the names identified
data2<-data[!data$names %in% toremove,]

My actual data is a bit more complicated than fruit, so I wanted to use grepl. This is what I tried (after first converting to a data.table):

data1<-data.table(data1)
z<-data1[,ind := grepl('app.*? & orang.*?', fruit), by='names']  ## this works fine when i just use 'app.*?' but collapses when I try to add the & sign, so I'm making an error with the operator. In addition the by='names' doesn't work out for me, which is important. My plan here was to create an indicator (if an individual has an apple and an orange, then they get an indicator==1 and I would then filter them out on the basis of this indicator).

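So, to summarise, my problem is identifying the individuals who have both an apple and an orange. This seems like it should be simple, so feel free to point me to a resource that would teach me this.

Desired output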

names fruit
1   tom apple
6   tom apple

score 6 · Accepted Answer

If you are only looking for names that have only apples, here is a simple data.table approach:

setDT(data)[ , if(all(fruit == "apple")) .SD, by = names]
#    names fruit
# 1:   tom apple
# 2:   tom apple

For a count of the unique names that have both "apple" and "orange", you can do:

data[, any(fruit == "apple") & any(fruit == "orange"), by = names][, sum(V1)]
## [1] 2
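The attempt in the question fails because data$fruit=='apple' & data$fruit=='orange' is evaluated row by row, and a single row can never hold both values; the test has to be made per group with any(). Since the question mentions wanting grepl, the same per-group test can probably be written with patterns. The following is only a sketch under assumptions: the patterns "^app" and "^orang" stand in for whatever the real data needs, and the names both, has_both and n_both are made up for illustration.

```r
library(data.table)

# Rebuild the example data from the question
data <- data.table(
  names = c("john", "john", "john", "mary", "mary", "mary", "mary", "mary", "tom", "tom"),
  fruit = c("apple", "orange", "apple", "orange", "apple", "orange", "apple", "apple", "apple", "apple")
)

# One row per name: TRUE when the group matches each pattern at least once.
# The two patterns are assumptions; adjust them to the real data.
both <- data[, .(has_both = any(grepl("^app", fruit)) & any(grepl("^orang", fruit))), by = names]

# Number of names that hold both fruits
n_both <- both[, sum(has_both)]

# Names to drop, then every row belonging to them is removed
toremove <- both[has_both == TRUE, names]
data2 <- data[!names %in% toremove]
```

Here the two conditions are combined with & outside grepl, once each pattern has been reduced to a single TRUE/FALSE per name. Writing '&' inside the regular expression, as in the question, only searches for a literal ampersand.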

Finally, if you are just looking for names that have only one unique fruit, you could try conditioning on that using uniqueN from the devel version on GH (or length(unique())).

data[, if(uniqueN(fruit) < 2L) .SD, by = names]
#    names fruit
# 1:   tom apple
# 2:   tom apple
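If data.table is not available, the length(unique()) idea can also be sketched in base R. This is an illustrative alternative rather than part of the approach above; the name n_fruit is made up, and toremove and data2 are borrowed from the question.

```r
# Rebuild the example data from the question as a plain data.frame
data <- data.frame(
  names = c("john", "john", "john", "mary", "mary", "mary", "mary", "mary", "tom", "tom"),
  fruit = c("apple", "orange", "apple", "orange", "apple", "orange", "apple", "apple", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Number of distinct fruits held by each name
n_fruit <- tapply(data$fruit, data$names, function(x) length(unique(x)))

# Names holding more than one distinct fruit
toremove <- names(n_fruit)[n_fruit > 1]
length(toremove)  # how many names have more than one fruit

# Keep only the rows whose name is not in toremove
data2 <- data[!data$names %in% toremove, ]
```

Note that counting distinct values only matches the "both apple and orange" condition when those are the only two fruits in the data; with more categories, the explicit any() tests are the safer choice.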
