java - Multiple output paths (Java - Hadoop - MapReduce)

Question

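I run two MapReduce jobs, and I want the second job to write its results to two different files in two different directories. In a sense I want something like FileInputFormat.addInputPath(.., multiple input path), but for output.

I'm completely new to MapReduce, and I specifically have to write my code against Hadoop 0.21.0. I use context.write(..) in my Reduce step, but I don't see how to control multiple output paths...

Thanks for your time!

Here is the reduce code from my first job, to show that this is the only way I know how to write output (it goes into the /../part* files):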

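// Imports needed at the top of the enclosing file:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

// NetflixRating and NetflixUser are the asker's own types (not shown).
public static class NormalizeReducer extends Reducer<LongWritable, NetflixRating, LongWritable, NetflixUser> {
    @Override
    public void reduce(LongWritable key, Iterable<NetflixRating> values, Context context) throws IOException, InterruptedException {
        NetflixUser user = new NetflixUser(key.get());
        // Copy each rating, because Hadoop reuses the value object between iterations.
        for (NetflixRating r : values) {
            user.addRating(new NetflixRating(r));
        }
        user.normalizeRatings();
        user.reduceRatings();
        // Written through the job's single output path, into its part* files.
        context.write(key, user);
    }
}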

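Edit: I went with the method you mentioned in your last comment, Amar. I don't know yet whether it works, because I have other problems with my HDFS, but before I forget, let me put my findings here for the sake of civilization: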

http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

MultipleOutputs does NOT act in place of FileOutputFormat. You define one output path with FileOutputFormat, and then you can add many more through MultipleOutputs.
addNamedOutput method: the String namedOutput is just a descriptive label.
The path itself is defined in the write method, through the String baseOutputPath argument.
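To make the last point concrete, here is a small plain-Java sketch (no Hadoop needed) of where a record is expected to land for a given baseOutputPath. It assumes the usual FileOutputFormat naming for reduce tasks, that is the base path followed by -r- and a five-digit partition number, resolved against the job's output directory. The class name OutputNameSketch, the directory /out and the base paths are made up for illustration.

```java
import java.util.Locale;

public class OutputNameSketch {

    // Models the file name expected for a reduce task (an assumption about the
    // framework's naming, not a call into Hadoop):
    //   <jobOutputDir>/<baseOutputPath>-r-<partition, zero-padded to 5 digits>
    public static String resolve(String jobOutputDir, String baseOutputPath, int partition) {
        return String.format(Locale.ROOT, "%s/%s-r-%05d", jobOutputDir, baseOutputPath, partition);
    }

    public static void main(String[] args) {
        // Two hypothetical base paths, giving two different sub-directories.
        System.out.println(resolve("/out", "even/users", 0)); // /out/even/users-r-00000
        System.out.println(resolve("/out", "odd/users", 3));  // /out/odd/users-r-00003
    }
}
```

So a slash in baseOutputPath is what gives a separate directory per output, while the job still has one output path set through FileOutputFormat.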

score 2 · Accepted Answer

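I went with the method you mentioned in your last comment, Amar. I don't know yet whether it works, because I have other problems with my HDFS, but before I forget, let me put my findings here for the sake of civilization: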

http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

MultipleOutputs does NOT act in place of FileOutputFormat. You define one output path with FileOutputFormat, and then you can add many more through MultipleOutputs. addNamedOutput method: the String namedOutput is just a descriptive label. The path itself is defined in the write method, through the String baseOutputPath argument.
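A minimal sketch of how these pieces might fit together, using the new-API MultipleOutputs class from the Javadoc linked above. It is not the asker's actual job: Text stands in for the NetflixUser value so that the example is self-contained, and the named output "split", the even/odd routing rule and the base paths even/users and odd/users are all hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TwoOutputsSketch {

    // Hypothetical reducer that routes each key to one of two base paths.
    public static class SplitReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

        private MultipleOutputs<LongWritable, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<LongWritable, Text>(context);
        }

        @Override
        protected void reduce(LongWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Assumed rule, for illustration only: even keys go to one directory, odd keys to another.
            String baseOutputPath = (key.get() % 2 == 0) ? "even/users" : "odd/users";
            for (Text value : values) {
                // "split" is the label registered with addNamedOutput in main();
                // the last argument decides where the record is written.
                mos.write("split", key, value, baseOutputPath);
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close(); // closes every writer opened by write()
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] = input path, args[1] = the job's single output path.
        // The Job constructor is used because it exists in the 0.20/0.21 line (deprecated later).
        Job job = new Job(new Configuration(), "two outputs sketch");
        job.setJarByClass(TwoOutputsSketch.class);
        job.setReducerClass(SplitReducer.class); // default identity mapper, default TextInputFormat
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Still required: MultipleOutputs adds to this path, it does not replace it.
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // The named output is only a label plus an output format and key/value types.
        MultipleOutputs.addNamedOutput(job, "split",
                TextOutputFormat.class, LongWritable.class, Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With an output path of, say, /out, the records should end up in files under /out/even/ and /out/odd/, since the base paths are resolved relative to the job's output directory. I have not run this against 0.21.0 itself, so treat it as a starting point rather than a verified solution.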
