hadoop - Separate output files in Hadoop MapReduce

Question

My question has probably been asked before, but I could not find a clear answer to it.

My MapReduce job is a basic WordCount. My current output file looks like this:

// filename : 'part-r-00000'
789  a
755  #c   
456  d
123  #b

How can I change the name of the output file?

Also, is it possible to have two output files, like this:

// First output file
789  a
456  d

// Second output file
123  #b
755  #c

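Here is my Reducer class:

public static class SortReducer extends Reducer<IntWritable, Text, IntWritable, Text> {

    // reduce() must take an Iterable of values; with a single Text parameter
    // it is only an overload and Hadoop never calls it.
    @Override
    public void reduce(IntWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        for (Text value : values) {
            context.write(key, value);
        }
    }
}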

Here is my Partitioner class:

public class TweetPartitionner extends Partitioner<Text, IntWritable>{

    @Override
    public int getPartition(Text a_key, IntWritable a_value, int a_nbPartitions) {
        if(a_key.toString().startsWith("#"))
            return 1;
        return 0;
    }


}

Thanks a lot!

score 1 · Accepted Answer

For your other question, how to change the output file name, have a look at MultipleOutputs#write(java.lang.String, K, V): http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
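As a rough illustration of what that method changes: with MultipleOutputs the reducer picks a named output per record, and the file is then called <name>-r-<reducer index> instead of part-r-00000. The plain-Java model below only mirrors that naming rule and uses no Hadoop classes. The names `words` and `hashtags` and both helper methods are assumptions made up for this example.

```java
// Plain-Java model of the file names produced by named outputs.
// No Hadoop classes are used; this only mirrors the naming pattern.
class OutputNameSketch {

    // Hypothetical output names, chosen for this example only.
    static String baseNameFor(String word) {
        return word.startsWith("#") ? "hashtags" : "words";
    }

    // Pattern: <name>-r-<five-digit reducer index>, e.g. words-r-00000.
    static String fileNameFor(String word, int reducerIndex) {
        return String.format("%s-r-%05d", baseNameFor(word), reducerIndex);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"a", "#c", "d", "#b"}) {
            System.out.println(word + " -> " + fileNameFor(word, 0));
        }
    }
}
```

In an actual job, each name would first be registered in the driver with MultipleOutputs.addNamedOutput(...) and then passed as the first argument of write(String, K, V) in the reducer. With this approach a single reducer can produce both files.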

score 0 · Accepted Answer

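In the job driver, set

job.setNumReduceTasks(2);

From the mapper, emit the word as the key and its count as the value.

Create a partitioner and add it to the job configuration. In the partitioner, check whether the key starts with #: if it does, return 1, otherwise return 0.

In the reducer, swap the key and the value.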
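The steps above can be sketched as a small plain-Java model of how records would be routed to the two reducers and then swapped on output. It uses no Hadoop classes. The class and method names are hypothetical, and the modulo guard is an addition that is not in the question's partitioner. Unlike Hadoop, the model does not sort keys within a partition.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the partition-then-swap flow; not Hadoop code.
class PartitionSketch {

    // Same rule as the partitioner in the question. The modulo is an added
    // guard (an assumption) so the index stays valid with a single reducer.
    static int getPartition(String key, int numPartitions) {
        int wanted = key.startsWith("#") ? 1 : 0;
        return wanted % numPartitions;
    }

    // Routes (word, count) pairs to partitions and swaps each pair into a
    // "count<TAB>word" line, as the reducer would write it.
    static List<List<String>> route(Map<String, Integer> counts, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int p = getPartition(e.getKey(), numPartitions);
            parts.get(p).add(e.getValue() + "\t" + e.getKey());
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 789);
        counts.put("#c", 755);
        counts.put("d", 456);
        counts.put("#b", 123);

        List<List<String>> parts = route(counts, 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.printf("part-r-%05d:%n", i);
            for (String line : parts.get(i)) {
                System.out.println(line);
            }
        }
    }
}
```

In the real job the partitioner would be registered with job.setPartitionerClass(TweetPartitionner.class). The output files keep the default names part-r-00000 and part-r-00001, so this approach separates the records but does not rename the files.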
