clojure - Why is pmap|reducers/map not using all CPU cores?

Question

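I'm trying to parse a file with a million lines. Each line is a JSON string with some information about a book (author, contents, etc.). I'm using iota to load the file, because my program throws an OutOfMemoryError if I try to use slurp. I'm also using cheshire to parse the strings. The program simply loads the file and counts all the words in all the books.

My first attempt used pmap to do the heavy work; I figured this would essentially make use of all my CPU cores.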

(ns multicore-parsing.core
  (:require [cheshire.core :as json]
            [iota :as io]
            [clojure.string :as string]
            [clojure.core.reducers :as r]))


(defn words-pmap
  [filename]
  (letfn [(parse-with-keywords [str]
            (json/parse-string str true))
          (words [book]
            (string/split (:contents book) #"\s+"))]
    (->>
     (io/vec filename)
     (pmap parse-with-keywords)
     (pmap words)
     (r/reduce #(apply conj %1 %2) #{})
     (count))))

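It does seem to use all the cores, but each core rarely goes above 50% of its capacity. My guess is that this has to do with the batch size of pmap, and while looking into it I came across a relatively old question in which some comments refer to the clojure.core.reducers library.

I decided to rewrite the function using reducers/map: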

(defn words-reducers
  [filename]
  (letfn [(parse-with-keywords [str]
            (json/parse-string str true))
          (words [book]
            (string/split (:contents book) #"\s+"))]
    (->>
     (io/vec filename)
     (r/map parse-with-keywords)
     (r/map words)
     (r/reduce #(apply conj %1 %2) #{})
     (count))))

However, the CPU usage is worse, and it takes even longer to finish than the previous implementation:

multicore-parsing.core=> (time (words-pmap "./dummy_data.txt"))
"Elapsed time: 20899.088919 msecs"
546
multicore-parsing.core=> (time (words-reducers "./dummy_data.txt"))
"Elapsed time: 28790.976455 msecs"
546

What am I doing wrong? Is mmap loading + reducers the right approach when parsing a large file?

EDIT: This is the file I'm using.

EDIT2: Here are the timings with iota/seq instead of iota/vec:

multicore-parsing.core=> (time (words-reducers "./dummy_data.txt"))
"Elapsed time: 160981.224565 msecs"
546
multicore-parsing.core=> (time (words-pmap "./dummy_data.txt"))
"Elapsed time: 160296.482722 msecs"
546

score 3 · Accepted Answer

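I don't believe that reducers are going to be the right solution for you, as they don't cope with lazy sequences at all well (a reducer will give correct results with a lazy sequence, but won't parallelise well).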
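As a minimal sketch of that point (not taken from the book): in clojure.core.reducers, r/reduce is always sequential, and only r/fold can run in parallel, and then only when the underlying collection is foldable, such as a vector. On a lazy sequence r/fold falls back to an ordinary reduce. The function name sum-squares and the sample data are made up for illustration.

```clojure
(require '[clojure.core.reducers :as r])

;; Illustrative helper (hypothetical name): sums the squares of a collection.
;; `+` acts as both the reducing function and the combining function;
;; calling (+) with no arguments supplies the identity value 0.
(defn sum-squares
  [coll]
  (r/fold + (r/map #(* % %) coll)))

;; Vector input: foldable, so r/fold may split the work across cores.
(sum-squares (vec (range 1000)))
;; => 332833500

;; Lazy-sequence input: same result, but reduced sequentially.
(sum-squares (map identity (range 1000)))
;; => 332833500
```

Both calls return the same value; the difference is only in how the work is scheduled. The question's code calls r/reduce, so it would run on a single thread regardless of the input collection.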

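You might want to take a look at this sample code from the book Seven Concurrency Models in Seven Weeks (disclaimer: I am the author), which solves a similar problem (counting the number of times each word appears on Wikipedia).

Given a list of Wikipedia pages, this function counts the words sequentially (get-words returns a sequence of words from a page):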

(defn count-words-sequential [pages]
  (frequencies (mapcat get-words pages)))
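The answer does not show get-words. To try the snippet at a REPL, a hypothetical stand-in such as the one below is enough; it is an assumption for illustration, not the book's implementation, and it treats each page as a plain string.

```clojure
(require '[clojure.string :as string])

;; Hypothetical stand-in for the book's get-words: lower-cases the text
;; and returns every run of word characters as one word.
(defn get-words [text]
  (re-seq #"\w+" (string/lower-case text)))

(defn count-words-sequential [pages]
  (frequencies (mapcat get-words pages)))

(count-words-sequential ["the cat" "The dog"])
;; => {"the" 2, "cat" 1, "dog" 1}
```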

This is a parallel version using pmap, which does run faster, but only around 1.5x faster:

(defn count-words-parallel [pages]
  (reduce (partial merge-with +)
    (pmap #(frequencies (get-words %)) pages)))

The reason it only goes around 1.5x faster is that the reduce becomes a bottleneck: it calls (partial merge-with +) once for each page. Merging batches of 100 pages improves the performance to around 3.2x on a 4-core machine:

(defn count-words [pages]
  (reduce (partial merge-with +)
    (pmap count-words-sequential (partition-all 100 pages))))
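The same batching idea could be carried over to the question's problem, which counts distinct words rather than word frequencies. The following is only a sketch under stated assumptions: the function names are invented, the input is assumed to be a collection of already-parsed book maps with a :contents key, and the batch size of 100 is simply copied from the example above rather than tuned.

```clojure
(require '[clojure.string :as string]
         '[clojure.set :as set])

;; Hypothetical helper: sequentially collects the distinct words
;; found in one batch of book maps.
(defn distinct-words-in-batch
  [books]
  (into #{} (mapcat #(string/split (:contents %) #"\s+")) books))

;; Hypothetical entry point: one pmap task per batch of 100 books,
;; so the final reduce merges one set per batch instead of one per book.
(defn count-distinct-words
  [books]
  (->> (partition-all 100 books)
       (pmap distinct-words-in-batch)
       (reduce set/union #{})
       count))

(count-distinct-words [{:contents "a b c"} {:contents "b c d"}])
;; => 4
```

With the question's setup, the JSON parsing (json/parse-string) would presumably move inside the per-batch function as well, so that each pmap task does a meaningful amount of work. When run as a script rather than at the REPL, calling (shutdown-agents) at the end avoids a delay at exit caused by pmap's thread pool.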
