Questions related to "snow" - Stack Overflow Japanese site

0 votes

2 answers

1181 views

r - Random forest bootstrap training and forest generation

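I have a huge training data set for a random forest (dim: 47600811*9). I want to take multiple (say 1000) bootstrap samples of dimension 10000*9 (each run drawing 9000 negative-class and 1000 positive-class data points), grow trees on each of them iteratively, and then combine all of those trees into one forest. A rough idea of the code I need is given below. Can someone show me how to generate random samples with replacement from my actual trainData and grow those trees iteratively in an optimal way? It would be a great help. Thanks.

I am not sure whether this is the best way to repeatedly (1000 times) generate a subset from the actual trainData.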
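The stratified draw with replacement described above could be sketched as follows. This is only a minimal illustration in base R: the small synthetic `trainData`, the `class` column name, the helper `draw_bootstrap`, and the reduced sizes (900 negative and 100 positive rows, 5 runs) are assumptions chosen to keep the example quick, not values from the question.

```r
# Hypothetical stand-in for trainData: 8 numeric columns plus a class label.
set.seed(1)
trainData <- data.frame(
  matrix(rnorm(5000 * 8), ncol = 8),
  class = factor(sample(c("neg", "pos"), 5000, replace = TRUE,
                        prob = c(0.9, 0.1)))
)

# Row indices of each class, computed once and reused for every run.
neg_idx <- which(trainData$class == "neg")
pos_idx <- which(trainData$class == "pos")

# Draw one stratified bootstrap sample (with replacement) as row indices.
# Indexing through sample.int() avoids the sample() surprise when a class
# happens to contain a single row.
draw_bootstrap <- function(n_neg, n_pos) {
  c(neg_idx[sample.int(length(neg_idx), n_neg, replace = TRUE)],
    pos_idx[sample.int(length(pos_idx), n_pos, replace = TRUE)])
}

# Store only the indices; materialise each subset when it is needed.
n_runs <- 5
boot_idx <- lapply(seq_len(n_runs), function(i) draw_bootstrap(900, 100))

# One subset: 1000 rows by 9 columns.
sample1 <- trainData[boot_idx[[1]], ]
```

Keeping only index vectors avoids holding 1000 copies of the data in memory. A forest could then be grown on each `trainData[boot_idx[[i]], ]` and the pieces merged afterwards, for example with `randomForest::combine` if the randomForest package is the one in use.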

2016-09-14T15:12:03.983

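0 votes

0 answers

58 views

r - Initializing the RcppZiggurat RNG generator on a snow cluster

I am trying to use the RNG from the RcppZiggurat package on a snow cluster. To seed the RNG on each node, I use L'Ecuyer's algorithm as built into clusterSetupRNG. However, the sequence of random numbers turns out to be the same on every node. Even when I run the code without the clusterSetupRNG command, the results stay exactly the same.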

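Here is what I tried:

Does anyone have an idea how I could make this work? Of course, I could always fall back on the standard RNG rnorm, but it would be nice to have the speed of RcppZiggurat on the cluster as well.

Thanks a lot for your help!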

r random parallel-processing rcpp snow

2016-09-23T07:44:53.163

0 votes

2 answers

1655 views

r - R, dplyr, snow: how to parallelize a function that uses dplyr

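Suppose I want to apply myfunction to each row of myDataFrame in parallel. Suppose that otherDataFrame is a data frame with two columns, COLUMN1_odf and COLUMN2_odf, which are used for some reason in myfunction. I would therefore like to write code using parApply, along these lines: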

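The problem here is that R does not recognize COLUMN1_odf and COLUMN2_odf, even when they are listed in clusterExport. How can I solve this problem? Is there a way to "export" all the objects that snow needs, so that I do not have to enumerate each of them?

Edit 1: I added a comment (in the code above) to specify that otherDataFrame is created inside myfunction.

Edit 2: I added some pseudocode to generalize myfunction: it now uses a global data frame (aGlobalDataFrame) and another function (otherFunction).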
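As a rough illustration of the situation described above: column names are not objects in their own right, so they cannot be exported. Only the data frame and the helper function need to reach the workers. The sketch below uses the `parallel` package that ships with R (its cluster functions derive from snow) rather than snow itself, which is an assumption. The contents of `aGlobalDataFrame`, `otherFunction`, `myfunction`, and `myDataFrame` are hypothetical stand-ins that only reuse the names from the question.

```r
library(parallel)  # bundled with R; cluster API derived from snow

# Hypothetical stand-ins for the objects named in the question.
aGlobalDataFrame <- data.frame(key = 1:3, weight = c(10, 20, 30))
otherFunction <- function(x) x + 1

myfunction <- function(row) {
  # row[1] is the key, row[2] is the value (columns of myDataFrame).
  # The columns `key` and `weight` are looked up inside the data frame,
  # so they never have to be exported by name.
  w <- aGlobalDataFrame$weight[aGlobalDataFrame$key == row[1]]
  otherFunction(row[2] * w)
}

myDataFrame <- data.frame(key = c(1, 2, 3), value = c(0.5, 1, 2))

cl <- makeCluster(2)
# Export the objects that myfunction refers to, from the current environment.
clusterExport(cl, c("aGlobalDataFrame", "otherFunction"),
              envir = environment())
# If myfunction used dplyr verbs, each worker would also have to load it:
# clusterEvalQ(cl, library(dplyr))
result <- parApply(cl, as.matrix(myDataFrame), 1, myfunction)
stopCluster(cl)
```

To avoid enumerating objects by hand, one commonly seen shortcut is `clusterExport(cl, ls())`, at the cost of copying everything in the workspace to every worker.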

r parallel-processing dplyr magrittr snow

2016-10-23T03:40:55.650

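0 votes

1 answer

936 views

r - stopCluster freezes in R snow

I was running a Monte Carlo simulation on a cluster computer using snow and R. Everything went well until R reached the stopCluster line, where R froze and eventually exceeded the wall time. I cannot see what the problem with stopCluster is.

Below is a simplified version of my R script.

The script above was saved as test_stack.R under the directory monte-carlo/R. The pbs script I submitted to the server is as follows.

Part of the Rout file is listed below. It stops at stopCluster().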

r hpc snow

2016-12-07T00:47:19.247

0 votes

2 answers

669 views

r - Calling external program in parallel using foreach and doSNOW: How to import results?

I'm using R to call an external program in parallel on a cluster with multiple nodes and multiple cores. The external program requires three input data files and produces one output file (all files are stored in the same subfolder). To run the program in parallel (or rather call it in a parallel fashion) I've initially used the foreach function together with the doParallel library. This works fine as long as I'm just using multiple cores on a single node.

However, I wanted to use multiple nodes with multiple cores. I therefore modified my code to use the doSNOW library in conjunction with foreach (I tried Rmpi and doMPI, but I did not manage to run the code on multiple nodes with those libraries). This works fine, i.e. the external program now does run on multiple nodes (with multiple cores), and the cluster logfile shows that it produces the required results. The problem I'm facing now, however, is that the external program no longer stores the results/output files on the master node in the specified subfolder of the working directory (it did so when I was using doParallel). This makes it impossible for me to import the results into R.

Indeed, if I check the content of the relevant folder it does not contain any output files, despite the logfile clearly showing that the external program ran successfully. I guess they are stored on the different nodes (?). What modifications do I have to make to either my foreach function or the way I set up my cluster, to get those files saved on the master node/in the specified subfolder in my working directory?
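One thing that may be worth checking in a setup like this is whether the workers run in the same working directory as the master. Workers started on other nodes do not necessarily inherit it, so relative paths can resolve somewhere else. The sketch below is only an illustration under several assumptions: it uses the `parallel` package bundled with R instead of doSNOW, runs both workers on the local machine, uses `tempdir()` as a stand-in for a folder on a filesystem shared by all nodes, and replaces the external program with `writeLines`.

```r
library(parallel)  # bundled with R; cluster API derived from snow

# Hypothetical stand-in for the output subfolder. On a real multi-node
# cluster this would have to be an absolute path on a shared filesystem.
out_dir <- normalizePath(tempdir())

cl <- makeCluster(2)
files <- parLapply(cl, 1:4, function(i, out_dir) {
  # Build an absolute path so the result does not depend on the
  # worker's own working directory.
  path <- file.path(out_dir, sprintf("result_%d.txt", i))
  writeLines(as.character(i^2), path)  # stand-in for the external program
  path                                 # report where the output went
}, out_dir = out_dir)
stopCluster(cl)

# Back on the master: import the outputs from the reported paths.
results <- vapply(files, function(p) as.numeric(readLines(p)), numeric(1))
```

Returning the path from each task makes it easy to see where every worker actually wrote its file. If the nodes do not share a filesystem, the file contents themselves would have to be returned instead.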

Here is some example R code to show what I'm doing:

My pbs script for the cluster looks something like this:

r foreach parallel-processing pbs snow

2017-04-22T13:09:16.810

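0 votes

0 answers

52 views

r - How to parallelize an R loop using snow

I have a big loop that takes far too long (~100 days). I am hoping to speed it up with the snow library, but I am not good with apply statements. This is only part of the loop, but if I can figure this part out, the rest should be straightforward. I am fine with a bunch of apply statements or loops, but a single apply statement that uses a function to get the object 'p' would be ideal.

Original data

Original loop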

r for-loop apply snow

2017-06-15T17:04:00.657

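0 votes

1 answer

437 views

r - How does the foreach package scope R environments when using as.formula, SE dplyr, and lapply?

I have a function that dynamically builds several formulas as strings and casts them to formulas with as.formula. I then call that function in a parallel process using doSNOW and foreach, and use those formulas with dplyr::mutate_.

When I use lapply(formula_list, as.formula), it works fine when run locally, but when run in parallel I get the error could not find function *custom_function*. However, if I use lapply(formula_list, function(x) as.formula(x)), it works both in parallel and locally.

Why? What is the right way to understand the environments here, and what is the "right" way to code it?

I get a warning like this: In e$fun(obj, substitute(ex), parent.frame(), e$data) : already exporting variable(s): *custom_func*

A minimal reproducible example is below.

Edit: In the title of the original post I wrote that I was using nse, but I really meant standard evaluation. Oops. I have changed this accordingly.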
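The difference between the two lapply calls can be observed without any cluster. The sketch below is a base R illustration, not the minimal example from the question: `make_formulas` and the body of `custom_function` are hypothetical. It relies on `as.formula` attaching its caller's environment (`env = parent.frame()`) to a formula built from a string.

```r
# Hypothetical function that builds formulas from strings in two ways.
make_formulas <- function(strings, wrap) {
  # A function that exists only inside this call frame.
  custom_function <- function(x) x * 2
  if (wrap) {
    # as.formula is called from the anonymous function, whose enclosing
    # environment is this frame, so custom_function stays reachable.
    lapply(strings, function(s) as.formula(s))
  } else {
    # as.formula is called directly by lapply, so the formula gets
    # lapply's own frame, which encloses the base namespace only.
    lapply(strings, as.formula)
  }
}

f_direct  <- make_formulas("y ~ custom_function(x)", wrap = FALSE)[[1]]
f_wrapped <- make_formulas("y ~ custom_function(x)", wrap = TRUE)[[1]]

# Can each formula's environment see custom_function?
direct_sees  <- exists("custom_function", envir = environment(f_direct))
wrapped_sees <- exists("custom_function", envir = environment(f_wrapped))
```

Here `direct_sees` is FALSE and `wrapped_sees` is TRUE. Run locally at top level, the direct form often still works because lookup falls through to the global environment, where a custom function usually lives. Inside a foreach worker the exported function may sit in a different environment, which would be consistent with the error described. Passing the environment explicitly, as in `lapply(strings, as.formula, env = environment())`, is another way to make the choice visible.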

r dplyr parallel-foreach snow nse

2017-06-21T17:23:36.380
