java - Hadoop (YARN): Set the mapper input separator?

Question

I would like to be able to set a different separator for the key-value pairs that I receive in the map function of an MR job.

For example, my text file might contain:

John-23
Mary-45
Scott-13

In my map function, I want the key of each element to be John and the value to be 23, and so on.

Then, if I set the output separator

conf.set("mapreduce.textoutputformat.separator", "-");

will the reducer take everything up to the first "-" as the key and everything after it as the value? Or do I need to change the reducer as well?

Thanks

score 1 · Accepted Answer

Reading

If you use org.apache.hadoop.mapreduce.lib.input.TextInputFormat, you can simply use String#split in the Mapper.

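 @Override
 public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

     // split on the first "-" only, so the value may itself contain "-"
     String[] keyValue = value.toString().split("-", 2);
     if (keyValue.length < 2) {
         return; // skip lines that contain no separator
     }
     // would emit John -> 23 as a text
     context.write(new Text(keyValue[0]), new Text(keyValue[1]));
 }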
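To try the parsing step without a cluster, the same idea can be sketched in plain Java. The class `SeparatorSplit` and the helper `splitFirst` are hypothetical names invented for this illustration and are not part of Hadoop. The sketch assumes the key is everything before the first separator and the value is everything after it.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class SeparatorSplit {

    // Hypothetical helper (not a Hadoop API): splits a line at the first
    // occurrence of the separator. Returns null when the separator is absent
    // so the caller can skip the record.
    static Map.Entry<String, String> splitFirst(String line, String separator) {
        int idx = line.indexOf(separator);
        if (idx < 0) {
            return null;
        }
        String key = line.substring(0, idx);
        String value = line.substring(idx + separator.length());
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        String[] lines = {"John-23", "Mary-45", "Anne-Marie-31", "no separator"};
        for (String line : lines) {
            Map.Entry<String, String> kv = splitFirst(line, "-");
            if (kv == null) {
                System.out.println("skipped: " + line);
            } else {
                System.out.println(kv.getKey() + " -> " + kv.getValue());
            }
        }
    }
}
```

Note that a key which itself contains the separator, such as "Anne-Marie-31", is cut at the first "-", so it is worth picking a separator that cannot appear in the keys.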

Writing

If you output like this:

Text key = new Text("John");
LongWritable value = new LongWritable(23);
// of course key and value can come from the reduce method itself,
// I just want to illustrate the types
context.write(key, value);

then yes, TextOutputFormat will write it in the format you want:

John-23

The only trap you may run into with Hadoop 2.x (YARN), which I have already answered elsewhere, is that the property was renamed to mapreduce.output.textoutputformat.separator.
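Why the property name matters can be sketched in plain Java. Here `java.util.Properties` is only a stand-in for Hadoop's Configuration, and `SeparatorLookup` and `formatLine` are hypothetical names for this illustration. The sketch assumes the writer looks up a single property name and falls back to a tab, which is TextOutputFormat's default separator, and it further assumes the name used in the question is not recognized by the writer.

```java
import java.util.Properties;

public class SeparatorLookup {

    // Property name given in the answer for Hadoop 2.x.
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    // Stand-in for the output step: use the configured separator if the
    // expected property is set, otherwise fall back to a tab.
    static String formatLine(Properties conf, Object key, Object value) {
        String separator = conf.getProperty(SEPARATOR_KEY, "\t");
        return key + separator + value;
    }

    public static void main(String[] args) {
        // Separator set under the renamed property: it is picked up.
        Properties renamed = new Properties();
        renamed.setProperty(SEPARATOR_KEY, "-");
        System.out.println(formatLine(renamed, "John", 23L)); // prints "John-23"

        // Separator set under the name from the question: in this sketch the
        // lookup never sees it, so the tab default is used instead.
        Properties other = new Properties();
        other.setProperty("mapreduce.textoutputformat.separator", "-");
        System.out.println(formatLine(other, "John", 23L)); // prints "John", a tab, then "23"
    }
}
```

If the job output still shows tabs after setting a separator, checking the exact property name against the Hadoop version in use is a reasonable first step.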
