java - WordCount MapReduce giving unexpected result

Question

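I am trying this Java code for a MapReduce word count. After the reduce method has finished, I want to output only the word that occurs the most times.

To do that, I created some class-level variables named myoutput, mykey, and completeSum.

I write this data in the close method, but I get an unexpected result at the end.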

public class WordCount {

public static class Map extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);

        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }

    }
}

static int completeSum = -1;
static OutputCollector<Text, IntWritable> myoutput;
static Text mykey = new Text();

public static class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }

        if (completeSum < sum) {
            completeSum = sum;
            myoutput = output;
            mykey = key;
        }


    }

    @Override
    public void close() throws IOException {
        // TODO Auto-generated method stub
        super.close();
        myoutput.collect(mykey, new IntWritable(completeSum));
    }
}

public static void main(String[] args) throws Exception {

    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    // conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);

}
}

Input file data

one 
three three three
four four four four 
 six six six six six six six six six six six six six six six six six six 
five five five five five 
seven seven seven seven seven seven seven seven seven seven seven seven seven

The result should be

six 18

But I am getting this result

three 18

From the result I can see that the sum is correct but the key is not.

It would be very helpful if someone could point me to a good reference on these map and reduce methods.

score 1 · Accepted Answer

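The problem you are observing is caused by reference aliasing. The object referenced by key is reused with new contents across multiple invocations of reduce, so mykey, which refers to that same object, changes along with it. It ends up holding the last key that was reduced. Since keys arrive at the reducer in sorted order, the last key for this input is three, which is why it appears next to the correct count of 18. You can avoid this by copying the object, like this: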

mykey = new Text(key);
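
To see the aliasing effect in isolation, here is a minimal sketch that runs without Hadoop. MutableKey is a hypothetical stand-in for a reusable key object such as Text, and findMax and the sample counts are made up for illustration; only the pattern (keeping a reference versus keeping a copy) mirrors the question.

```java
class AliasingDemo {
    // Hypothetical stand-in for a mutable, reusable key object (not Hadoop's Text).
    static final class MutableKey {
        private String value = "";

        MutableKey() {}

        // Copy constructor: takes a snapshot of the current contents.
        MutableKey(MutableKey other) { this.value = other.value; }

        void set(String v) { this.value = v; }

        @Override
        public String toString() { return value; }
    }

    // Scans (key, count) pairs and returns {key kept by reference, key kept by copy}.
    static String[] findMax(String[] keys, int[] counts) {
        MutableKey reused = new MutableKey(); // one object, refilled for every record
        MutableKey byReference = null;
        MutableKey byCopy = null;
        int best = -1;
        for (int i = 0; i < keys.length; i++) {
            reused.set(keys[i]);
            if (counts[i] > best) {
                best = counts[i];
                byReference = reused;                // alias: follows later set() calls
                byCopy = new MutableKey(reused);     // snapshot: stays as it is now
            }
        }
        return new String[] { String.valueOf(byReference), String.valueOf(byCopy) };
    }

    public static void main(String[] args) {
        // Illustrative keys in sorted order, as a reducer would receive them.
        String[] keys = {"five", "four", "one", "seven", "six", "three"};
        int[] counts = {5, 4, 1, 13, 18, 3};
        String[] result = findMax(keys, counts);
        System.out.println("kept by reference: " + result[0]);
        System.out.println("kept by copy:      " + result[1]);
    }
}
```

With these sample values the reference ends up reading three (the last key processed), while the copy still reads six.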

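However, you should only take results from the output file, because static variables cannot be shared between different nodes of a distributed cluster. The static-variable approach only works in standalone mode, which defeats the purpose of map-reduce.

Finally, even in standalone mode, using global variables will most likely cause races when parallel local tasks are used (see MAPREDUCE-1367 and MAPREDUCE-434).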
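
One way to picture the point about unshared state is the plain-Java simulation below. It is not Hadoop code: MaxTracker, localMax, and globalMax are hypothetical names, and each map stands in for the keys one reduce task would see. Every simulated task keeps its maximum in instance fields and reports only through its return value (its "output"); a final pass then combines those outputs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;

class PerTaskMaxDemo {
    // Hypothetical stand-in for the state of one reduce task.
    // The maximum lives in instance fields, so tasks share nothing.
    static final class MaxTracker {
        private String bestKey = null;
        private int bestCount = -1;

        void accept(String key, int count) {
            if (count > bestCount) {
                bestCount = count;
                bestKey = key; // String is immutable, so no defensive copy is needed
            }
        }

        Map.Entry<String, Integer> result() {
            return new SimpleEntry<>(bestKey, bestCount);
        }
    }

    // Simulates one task: sees only its own partition, returns its local maximum.
    static Map.Entry<String, Integer> localMax(Map<String, Integer> partition) {
        MaxTracker tracker = new MaxTracker();
        for (Map.Entry<String, Integer> e : partition.entrySet()) {
            tracker.accept(e.getKey(), e.getValue());
        }
        return tracker.result();
    }

    // Simulates the follow-up step: combines the outputs of all tasks.
    static Map.Entry<String, Integer> globalMax(List<Map<String, Integer>> partitions) {
        MaxTracker finalPass = new MaxTracker();
        for (Map<String, Integer> partition : partitions) {
            Map.Entry<String, Integer> local = localMax(partition);
            finalPass.accept(local.getKey(), local.getValue());
        }
        return finalPass.result();
    }

    public static void main(String[] args) {
        // Illustrative word counts split across two simulated tasks.
        List<Map<String, Integer>> partitions = List.of(
                Map.of("five", 5, "four", 4, "one", 1),
                Map.of("seven", 13, "six", 18, "three", 3));
        Map.Entry<String, Integer> max = globalMax(partitions);
        System.out.println(max.getKey() + " " + max.getValue());
    }
}
```

In this sketch no task can see another task's maximum, which is the situation on a real cluster; the overall answer only exists once the per-task outputs have been combined.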
