performance - Method vs. class-level variables in Hadoop MapReduce

Question

This is a question about the performance of Writable variables and object allocation within a MapReduce step. Here is the reducer:

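import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job's driver class.
public static class MyReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        for (Text val : values) {
            // Allocates a new Text object for every value written.
            context.write(key, new Text(val));
        }
      }
}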

Or is this version better in terms of performance:

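import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job's driver class.
public static class MyReducer extends Reducer<Text, Text, Text, Text> {
      // One Text instance per reducer, reused for every output record.
      private final Text myText = new Text();

      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        for (Text val : values) {
            myText.set(val);  // copies val's bytes into the reused object
            context.write(key, myText);
        }
      }
}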

In Hadoop: The Definitive Guide, all the examples use the first form, but I can't tell whether that is just to keep the code samples short or because it is the more idiomatic style.

score 1 · Accepted Answer

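The book may use the first form because it is more concise, but it is less efficient. For large input files, this approach creates a huge number of objects (one new Text per value written), and that excessive object creation hurts performance through extra allocation and garbage-collection work. When performance matters, the second approach is recommended.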
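To make the allocation difference concrete without a Hadoop cluster, here is a minimal plain-Java sketch. `MutableText`, `Sink`, `reduceAllocating` and `ReusingReducer` are hypothetical stand-ins (loosely modeled on `Text`, `Context.write` and the two reducers in the question), not Hadoop APIs. The sketch assumes, as the second form itself relies on, that `write` serializes the record before returning, so changing the reused object afterwards cannot alter what was already written. It counts how many text objects each form constructs for 1,000 values.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {

    // Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable, growable byte buffer.
    static final class MutableText {
        static int instances = 0; // number of MutableText objects constructed so far

        private byte[] bytes = new byte[0];
        private int length = 0;

        MutableText() {
            instances++;
        }

        MutableText(String s) {
            this();
            bytes = s.getBytes(StandardCharsets.UTF_8);
            length = bytes.length;
        }

        // Copy constructor, playing the role of new Text(val).
        MutableText(MutableText other) {
            this();
            set(other);
        }

        // Copies other's bytes, growing the internal buffer only when it is too small.
        void set(MutableText other) {
            if (bytes.length < other.length) {
                bytes = new byte[other.length];
            }
            System.arraycopy(other.bytes, 0, bytes, 0, other.length);
            length = other.length;
        }

        String asString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    // Hypothetical stand-in for Context.write: it serializes the record immediately,
    // so the caller may modify the objects it passed in afterwards.
    static final class Sink {
        final List<String> records = new ArrayList<>();

        void write(MutableText key, MutableText value) {
            records.add(key.asString() + "\t" + value.asString());
        }
    }

    // First form: a fresh object for every value.
    static void reduceAllocating(MutableText key, List<MutableText> values, Sink out) {
        for (MutableText val : values) {
            out.write(key, new MutableText(val));
        }
    }

    // Second form: one field, reused for every value.
    static final class ReusingReducer {
        private final MutableText myText = new MutableText();

        void reduce(MutableText key, List<MutableText> values, Sink out) {
            for (MutableText val : values) {
                myText.set(val);
                out.write(key, myText);
            }
        }
    }

    public static void main(String[] args) {
        MutableText key = new MutableText("k");
        List<MutableText> values = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            values.add(new MutableText("value-" + i));
        }

        MutableText.instances = 0;
        Sink first = new Sink();
        reduceAllocating(key, values, first);
        System.out.println("first form objects:  " + MutableText.instances); // 1000

        MutableText.instances = 0;
        Sink second = new Sink();
        new ReusingReducer().reduce(key, values, second);
        System.out.println("second form objects: " + MutableText.instances); // 1

        System.out.println("same output: " + first.records.equals(second.records)); // true
    }
}
```

Both forms write identical records, but the first constructs one object per value while the second constructs a single object per reducer and only reallocates its internal byte array when a longer value arrives. In a real job, the first form's allocations scale with the number of values the reducer processes.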

References that discuss this issue:

Tip 7 here,
Hadoop object reuse, and
this JIRA.
