r - Indices of the rows of data frame A that are contained in data frame B

Question

I have two data frames, A and B.

a1 <- c(12, 12, 12, 23, 23, 23, 34, 34, 34)
a2 <- c(1, 2, 3, 2, 4, 5, 2, 3, 4)
A <- data.frame(a1, a2)

b1 <- c(12, 23, 34)
b2 <- c(1, 2, 2)
B <- data.frame(b1, b2)

> A
  a1 a2
1 12  1
2 12  2
3 12  3
4 23  2
5 23  4
6 23  5
7 34  2
8 34  3
9 34  4
> B
  b1 b2
1 12  1
2 23  2
3 34  2

Basically, B consists of rows of A: for each unique value of a1, it contains the row with the minimum value of a2.
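The relationship between A and B described above can be checked with a minimal base R sketch (the name B_check is hypothetical): it rebuilds B from A by taking the minimum a2 within each value of a1 using aggregate.

```r
# Example data from the question
A <- data.frame(a1 = c(12, 12, 12, 23, 23, 23, 34, 34, 34),
                a2 = c(1, 2, 3, 2, 4, 5, 2, 3, 4))

# Minimum a2 for each unique a1; aggregate returns the groups sorted by a1
B_check <- aggregate(a2 ~ a1, data = A, FUN = min)
names(B_check) <- c("b1", "b2")
B_check
#   b1 b2
# 1 12  1
# 2 23  2
# 3 34  2
```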

What I need to do is simple: find the row indices (or row numbers?) index.vector such that A[index.vector, ] equals B.
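One way to get index.vector without relying on any particular row order is to match on both columns at once. The sketch below (not from the original thread) builds a single key string per row with paste and then uses match. It assumes the separator "_" never occurs in the data and that the values are integer-like, so that their text form is exact; key_A and key_B are hypothetical names.

```r
A <- data.frame(a1 = c(12, 12, 12, 23, 23, 23, 34, 34, 34),
                a2 = c(1, 2, 3, 2, 4, 5, 2, 3, 4))
B <- data.frame(b1 = c(12, 23, 34), b2 = c(1, 2, 2))

# One key per row built from both columns (assumes "_" is not in the data)
key_A <- paste(A$a1, A$a2, sep = "_")
key_B <- paste(B$b1, B$b2, sep = "_")

# Position in A of the first row whose key equals each key of B (NA if none)
index.vector <- match(key_B, key_A)
index.vector
# [1] 1 4 7

# The selected rows reproduce B (column names differ, so compare values only)
all(as.matrix(A[index.vector, ]) == as.matrix(B))
# [1] TRUE
```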

In this particular problem there is only one solution, because within each unique value of a1, no value of a2 is repeated.
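The uniqueness condition above can be verified before relying on it. A minimal sketch (unique_pairs is a hypothetical name): anyDuplicated on the two columns returns 0 when no (a1, a2) pair occurs twice, which means each row of B matches at most one row of A.

```r
A <- data.frame(a1 = c(12, 12, 12, 23, 23, 23, 34, 34, 34),
                a2 = c(1, 2, 3, 2, 4, 5, 2, 3, 4))

# TRUE when no (a1, a2) pair is repeated, so the answer is unique
unique_pairs <- anyDuplicated(A[c("a1", "a2")]) == 0
unique_pairs
# [1] TRUE
```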

The faster the routine, the better: I need to apply this to data frames ranging from 500 rows to several million rows.

score 0 · Accepted Answer

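I would first make sure my data is ordered by a1 and then by a2 (in your example the data is already ordered correctly, but this may not always be the case), and then use match, which returns the position of the first match of its first argument in its second argument (or NA if there is no match). Because B holds the minimum a2 for each a1, after sorting the first row of each a1 group in A is exactly the row that appears in B, so matching on a1 alone is enough. Note that the returned positions refer to the sorted A.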

A <- A[ order( A$a1 , A$a2 ) , ]
A
#  a1 a2
#1 12  1
#2 12  2
#3 12  3
#4 23  2
#5 23  4
#6 23  5
#7 34  2
#8 34  3
#9 34  4

#  Get the position in A of the first row for each value of b1
match( B$b1 , A$a1 )
# [1] 1 4 7
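When A is not sorted to begin with, the positions returned by match refer to the sorted copy, not to the original rows. A minimal sketch, using a hypothetical shuffled version of A called A_raw: order() keeps the original row names, so they can be used to translate the positions back to row indices of the unsorted data.

```r
# Hypothetical unsorted version of A (same rows, shuffled)
A_raw <- data.frame(a1 = c(34, 12, 23, 12, 34, 23, 12, 23, 34),
                    a2 = c(3, 2, 5, 1, 2, 2, 3, 4, 4))
B <- data.frame(b1 = c(12, 23, 34), b2 = c(1, 2, 2))

# Sort by a1, then a2; the row names still record the original positions
A_sorted <- A_raw[order(A_raw$a1, A_raw$a2), ]

# After sorting, the first row of each a1 group holds that group's minimum a2
pos <- match(B$b1, A_sorted$a1)

# Translate positions in A_sorted back to row indices of A_raw
orig_idx <- as.integer(rownames(A_sorted))[pos]
orig_idx
# [1] 4 6 5
```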

And here is a data.table solution, which should be much faster on large tables:

library(data.table)
A <- data.table( A )
B <- data.table( B )

#  Use setkeyv to order the tables by the values in the first column, then the second column
setkeyv( A , c("a1","a2") )
setkeyv( B , c("b1","b2") )

#  Create a new column that is the row index of A
A[ , ID := .I ]

#  Join A and B on the key columns (this works because you have unique values in your second column for each grouping of the first), giving the relevant ID
A[B]
#   a1 a2 ID
#1: 12  1  1
#2: 23  2  4
#3: 34  2  7
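As an alternative sketch (not part of the original answer), data.table can return the matching row numbers directly with which = TRUE. Joining through the on argument instead of setting keys leaves A in its original order, so the result is the original row indices and no ID column is needed. This assumes a data.table version recent enough to support on.

```r
library(data.table)

A <- data.table(a1 = c(12, 12, 12, 23, 23, 23, 34, 34, 34),
                a2 = c(1, 2, 3, 2, 4, 5, 2, 3, 4))
B <- data.table(b1 = c(12, 23, 34), b2 = c(1, 2, 2))

# Join A's (a1, a2) to B's (b1, b2) without keys; which = TRUE returns
# the row numbers of A matched by each row of B (NA where there is no match)
index.vector <- A[B, on = c(a1 = "b1", a2 = "b2"), which = TRUE]
index.vector
# [1] 1 4 7
```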
