r - R, dplyr and snow: how to parallelize functions which use dplyr

Question

Suppose I want to apply myfunction to each row of myDataFrame in parallel. Suppose otherDataFrame is a data frame with two columns, COLUMN1_odf and COLUMN2_odf, which myfunction uses for some reason. I would like to write code using parApply like this:

clus <- makeCluster(4)
clusterExport(clus, list("myfunction","%>%"))

myfunction <- function(fst, snd) {
 #otherFunction and aGlobalDataFrame are defined in the global env
 otherFunction(aGlobalDataFrame)

 # some code to create otherDataFrame **INTERNALLY** to this function
 otherDataFrame %>% filter(COLUMN1_odf==fst & COLUMN2_odf==snd)
 return(otherDataFrame)
}
do.call(bind_rows,parApply(clus,myDataFrame,1,function(r) { myfunction(r[1],r[2]) }))

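The problem is that R does not recognize COLUMN1_odf and COLUMN2_odf, even if I add them to clusterExport. How can I solve this? Is there a way to "export" all the objects that snow needs, so that I don't have to enumerate each of them?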

Edit 1: I've added a comment (in the code above) to specify that otherDataFrame is created internally to myfunction.

Edit 2: I've added some pseudo-code to generalize myfunction: it now uses a global data frame (aGlobalDataFrame) and another function (otherFunction).

score 0 · Accepted Answer

Now that I'm not looking at this on my phone, I can see a few issues.

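First, you are not actually creating otherDataFrame in your function. You are trying to pipe an existing otherDataFrame into filter, and if otherDataFrame doesn't exist in the environment, the function will fail.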

Second, unless you have already loaded the dplyr package in your cluster environments, you will be calling the wrong filter function.

Lastly, when you call parApply, you haven't specified what fst and snd are supposed to be. Give the following a try:

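# Define the function before exporting it to the workers
myfunction <- function(otherDataFrame, fst, snd) {
 dplyr::filter(otherDataFrame, COLUMN1_odf==fst & COLUMN2_odf==snd)
}

clus <- makeCluster(4)
clusterEvalQ(clus, {library(dplyr); library(magrittr)})
# Assumes otherDataFrame exists on the master so that it can be exported
clusterExport(clus, c("myfunction", "otherDataFrame"))

# "[fst]" and "[snd]" are placeholders for the myDataFrame columns to use
do.call(bind_rows, parApply(clus, myDataFrame, 1,
 function(r, fst, snd) { myfunction(otherDataFrame, r[fst], r[snd]) },
 fst = "[fst]", snd = "[snd]"))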
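For reference, here is a minimal self-contained sketch of the same idea. It assumes dplyr is installed and uses the base parallel package, which provides the same makeCluster/clusterExport/parApply interface as snow. The example data and the column names fst and snd in myDataFrame are hypothetical, made up purely for illustration.

```r
library(parallel)  # makeCluster, clusterEvalQ, clusterExport, parApply
library(dplyr)

# Hypothetical example data (not from the original question)
otherDataFrame <- data.frame(
  COLUMN1_odf = c(1, 1, 2, 3),
  COLUMN2_odf = c(10, 20, 20, 30),
  value       = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
myDataFrame <- data.frame(fst = c(1, 2), snd = c(10, 20))

# Define the function first so that it exists when it is exported
myfunction <- function(otherDataFrame, fst, snd) {
  dplyr::filter(otherDataFrame, COLUMN1_odf == fst & COLUMN2_odf == snd)
}

clus <- makeCluster(2)
# Load dplyr on every worker
clusterEvalQ(clus, library(dplyr))
# Copy the function and the data it needs into each worker's global environment
clusterExport(clus, c("myfunction", "otherDataFrame"))

# Each row arrives as a named vector; pick the two values by column name
pieces <- parApply(clus, myDataFrame, 1, function(r) {
  myfunction(otherDataFrame, r[["fst"]], r[["snd"]])
})
stopCluster(clus)

# Combine the per-row results into one data frame
result <- bind_rows(pieces)
print(result)
```

Note that parApply converts the data frame to a matrix, so rows with mixed column types are coerced to a common type (often character) before they reach the function. As for exporting everything at once, one option that should work is passing ls(globalenv()) as the variable list to clusterExport, at the cost of copying every global object to each worker.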
