r - What is the most efficient way to split a combined factor column into two factor columns in an R data.table?

Question

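I have a data.table with two columns, fcombined and value. fcombined is a factor, but it is actually the result of interacting two factors. The question is: what is the most efficient way to split the one factor column back into two? I have already come up with a solution that works fine, but there may be a more straightforward way that I have missed. Here is a working example: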

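library(data.table)
library(stringr)

# Build 400 combined levels ("1_1", "2_1", ...) from two 20-level factors
f1=1:20
f2=1:20
g=expand.grid(f1,f2)
combinedfactor=as.factor(paste(g$Var1,g$Var2,sep="_"))
largedata=1:10^6
# combinedfactor (length 400) is recycled to match the 10^6 values
DT=data.table(fcombined=combinedfactor,value=largedata)


# Split the factor column `colname` of `res` at `splitby`.
# The number of columns retained is length(namesofnewcols).
splitfactorcol=function(res,colname,splitby="_",namesofnewcols){
  # lookup table: one row per level, with the level's integer code and its parts
  helptable=data.table(.factid=seq_along(levels(res[[colname]])) ,str_split_fixed(levels(res[[colname]]),splitby,length(namesofnewcols)))
  setnames(helptable,colnames(helptable),c(".factid",namesofnewcols))
  setkey(helptable,.factid)
  # attach the integer codes to the data, then merge on them
  res$.factid=as.integer(res[[colname]])
  setkey(res,.factid)
  m=merge(res,helptable)
  m$.factid=NULL
  m
}
splitfactorcol(DT,"fcombined",splitby="_",c("f1","f2"))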

score 3 · Accepted Answer

I think this does the trick and is about 5x faster:

setkey(DT, fcombined)
DT[DT[, data.table(fcombined = levels(fcombined),
                   do.call(rbind, strsplit(levels(fcombined), "_")))]]

I split the levels and simply merged the result back into the original data.table.
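
As a rough illustration of the same idea on a toy table, here is a minimal sketch. It is a variant, not the exact code above: it assumes data.table's tstrsplit() for splitting the levels and an update join (`:=` with `on=`) for attaching the parts, and the names lev, parts, lookup, f1 and f2 are made up for the example.

```r
library(data.table)

# Toy stand-in for the question's table: 3 x 2 = 6 combined levels, 12 rows
g  <- expand.grid(1:3, 1:2)
DT <- data.table(
  fcombined = factor(rep(paste(g$Var1, g$Var2, sep = "_"), times = 2)),
  value     = 1:12
)

# Split each distinct level once (6 strings here, not 12 rows)
lev   <- levels(DT$fcombined)
parts <- tstrsplit(lev, "_", fixed = TRUE)  # list: first parts, second parts

# Lookup table: one row per level, with the two parts stored as factors
lookup <- data.table(
  fcombined = factor(lev, levels = lev),
  f1 = factor(parts[[1]]),
  f2 = factor(parts[[2]])
)

# Update join: add f1 and f2 to DT by reference, matching on fcombined
DT[lookup, on = "fcombined", `:=`(f1 = i.f1, f2 = i.f2)]
```

The string splitting only touches the distinct levels, so its cost depends on the number of levels rather than the number of rows; the join then spreads the result over the full table.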

By the way, in my tests strsplit was about 2x faster (for this task) than the stringr function.
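
To check this kind of comparison on your own machine, a timing sketch along these lines may help. The helper names base_split and stringr_split and the repetition count of 200 are arbitrary choices for illustration, and the timings will vary with hardware and package versions.

```r
library(stringr)

# The same 400 level strings as in the question ("1_1", "2_1", ..., "20_20")
lev <- paste(rep(1:20, times = 20), rep(1:20, each = 20), sep = "_")

# Base R: split each string, then bind the pieces into a 400 x 2 matrix
base_split <- function(x) do.call(rbind, strsplit(x, "_"))

# stringr: returns a 400 x 2 character matrix directly
stringr_split <- function(x) str_split_fixed(x, "_", 2)

# Both approaches should yield the same pieces
stopifnot(all(base_split(lev) == stringr_split(lev)))

# Repeat each call so the elapsed time is large enough to compare
system.time(for (i in 1:200) base_split(lev))
system.time(for (i in 1:200) stringr_split(lev))
```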
