r - R - Index of matching values in two data.tables

Question

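This is my first post on StackOverflow. I am relatively new to programming and am trying to work with data.table in R because of its reputation for speed.

I have a very large data.table named "Actions" with 5 columns and potentially several million rows. The column names are k1, k2, i, l1, and l2. I have another data.table named "States" that holds the unique values of Actions in columns k1 and k2.

For every row in Actions, I would like to find the unique index in States that matches columns 4 and 5 (l1 and l2). Reproducible code follows: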

library(data.table)

S.disc <- c(2000,2000)
S.max  <- c(6200,2300)
S.min  <- c(700,100)

Traces.num <- 3
Class.str <- lapply(1:2,function(x) seq(S.min[x],S.max[x],S.disc[x]))
Class.inf <- seq_len(Traces.num)
Actions <- data.table(expand.grid(Class.inf, Class.str[[2]], Class.str[[1]], Class.str[[2]], Class.str[[1]])[,c(5,4,1,3,2)])
setnames(Actions,c("k1","k2","i","l1","l2"))
States <- unique(Actions[,list(k1,k2,i)])

So, if I were using a data.frame, the line would look something like this:

index  <- apply(Actions,1,function(x) {which((States[,1]==x[4]) & (States[,2]==x[5]))})
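As a point of reference, the per-row lookup that this line performs can be sketched on a tiny data.frame. The names States.df and Actions.df and their values are hypothetical stand-ins, not the data from the question; the sketch only shows the row-by-row comparison that becomes slow at millions of rows.

```r
# Hypothetical miniature data (assumption: not the question's real tables)
States.df  <- data.frame(k1 = c(700, 700, 2700), k2 = c(100, 2100, 100))
Actions.df <- data.frame(l1 = c(2700, 700),      l2 = c(100, 100))

# For each row of Actions.df, find the row(s) of States.df whose
# (k1, k2) pair equals that row's (l1, l2) pair
index <- lapply(seq_len(nrow(Actions.df)), function(r) {
  which(States.df$k1 == Actions.df$l1[r] &
        States.df$k2 == Actions.df$l2[r])
})
```

Each element of index holds the matching row numbers of States.df for one row of Actions.df, so the cost grows with the product of the two table sizes.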

How can I do the same thing efficiently with data.table?

score 3 · Accepted Answer

This is relatively simple once you get the hang of keys and the special symbols that can be used in the j expression of a data.table. Try this...

#  First make an ID for each row for use in the `dcast`
#  because you are going to have multiple rows with the
#  same key values and you need to know where they came from
Actions[ , ID := 1:.N ]

#  Set the keys to join on
setkeyv( Actions , c("l1" , "l2" ) )
setkeyv( States , c("k1" , "k2" ) )    

#  Join States to Actions, using '.I', which
#  is the row locations in States in which the
#  key of Actions are found and within each
#  group the row number ( 1:.N - a repeating 1,2,3)
New <- States[ J(Actions) , list( ID , Ind = .I , Row = 1:.N ) ]
#    k1  k2 ID Ind Row
#1: 700 100  1   1   1
#2: 700 100  1   2   2
#3: 700 100  1   3   3
#4: 700 100  2   1   1
#5: 700 100  2   2   2
#6: 700 100  2   3   3

#  reshape using 'dcast.data.table'
dcast.data.table( Row ~ ID , data = New , value.var = "Ind" )
#   Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27...
#1:   1 1 1 1 4 4 4 7 7 7 10 10 10 13 13 13 16 16 16  1  1  1  4  4  4  7  7  7...
#2:   2 2 2 2 5 5 5 8 8 8 11 11 11 14 14 14 17 17 17  2  2  2  5  5  5  8  8  8...
#3:   3 3 3 3 6 6 6 9 9 9 12 12 12 15 15 15 18 18 18  3  3  3  6  6  6  9  9  9...
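The same lookup can also be sketched without relying on .I being evaluated inside the join, which may be useful because more recent data.table releases only run j once per row of i when by = .EACHI is given. This is a minimal sketch under assumptions, not the answerer's method: it stores the row number of States in an explicit Ind column first, joins with on= instead of keys, and passes allow.cartesian = TRUE because each row of Actions matches several rows of States. The name Wide is hypothetical.

```r
library(data.table)

# Rebuild the sample data from the question
S.disc <- c(2000, 2000)
S.max  <- c(6200, 2300)
S.min  <- c(700, 100)
Class.str <- lapply(1:2, function(x) seq(S.min[x], S.max[x], S.disc[x]))
Class.inf <- seq_len(3)
Actions <- data.table(expand.grid(Class.inf, Class.str[[2]], Class.str[[1]],
                                  Class.str[[2]], Class.str[[1]])[, c(5, 4, 1, 3, 2)])
setnames(Actions, c("k1", "k2", "i", "l1", "l2"))
States <- unique(Actions[, list(k1, k2, i)])

# Record the original row numbers explicitly in both tables
Actions[, ID := .I]
States[, Ind := .I]

# For every row of Actions, look up the rows of States where
# States$k1 == Actions$l1 and States$k2 == Actions$l2
New <- States[Actions, on = c(k1 = "l1", k2 = "l2"),
              list(ID, Ind), allow.cartesian = TRUE]

# Number the matches within each Actions row, then reshape to wide form
New[, Row := seq_len(.N), by = ID]
Wide <- dcast(New, Row ~ ID, value.var = "Ind")
```

New holds one row per (Actions row, matching States row) pair, and Wide has one column per Actions row, in the same layout as the reshaped output above.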
