language-agnostic - Goの並列個別単語数

Question

標準入力からテキストを読み取り、テキスト内で見つかった個別の単語の総数を返す (出力する) プログラムを作成します。

並列プログラミングでこの課題にどのように対処できるでしょうか (Go が望ましいですが、英語での説明で十分です)。

score 3 · Accepted Answer

いくつかの可能性がありますが、「効率的に」という意味ですか？

一般的な考え方は、テキストを扱いやすいチャンクに分割し、それらのチャンクをキューに積み上げ、複数のコンシューマーにチャンクを処理させることです。

これは、典型的な Map/Reduce アプリケーションのように見えます。

          _ Worker_
         /          \
        /            \
Splitter--- Worker ---Aggregator
        \            /
         \_ Worker _/

理想的には、「複数の」キューは複数のコンシューマーを持つ単一のキューであり、1 つのワーカーが遅くなってもプロセス全体が遅くなることはありません。

また、スプリッターからワーカーへのシグナルを使用して、入力が完全に消費されたことをワーカーに知らせ、結果をアグリゲーターに送信できるようにします。

score 1 · Accepted Answer

Go での Jakob Østergaard's problemの解決策を次に示します。

/*
    The problem: Write a program that reads text from 
    standard-input, and returns (prints) the total 
    number of distinct words found in the text. 

    C versus C++, Jakob Østergaard, February 24th, 2004
    http://unthought.net/c++/c_vs_c++.html
*/

package main

import (
    "bufio"
    "fmt"
    "os"
    "unicode"
)

func main() {
    rdr := bufio.NewReader(os.Stdin)
    words := make(map[string]bool, 1024*1024)
    word := make([]int, 0, 256)
    for {
        r, n, _ := rdr.ReadRune()
        if unicode.IsSpace(r) || n == 0 {
            if len(word) > 0 {
                words[string(word)] = true
                word = word[:0]
            }
            if n == 0 {
                break
            }
        } else {
            word = append(word, r)
        }
    }
    fmt.Println(len(words))
}

この問題やその他のほとんどの問題に「並列プログラミング」という言葉を追加して、魔法のような改善を期待するのは単純です。ストリームから入力を順番に読み取り、単純な計算を実行しても、並列計算を行う機会はほとんどありません。

language-agnostic - Goの並列個別単語数

2 に答える 2

Related

Reference