sorting - Hadoop MapReduce: return a sorted list of the words in a text file

Question

My task is to return an alphabetically sorted list of all the words contained in a text file, keeping duplicates.

{To be or not to be} → {be be not or to to}

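My idea is to treat each word as a key as well as a value. Because Hadoop sorts the keys, they end up in alphabetical order automatically. In the Reduce phase I simply append all the words that share a key (which are essentially the same word) into a single Text value.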

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordSort {

        public static class Map extends Mapper<LongWritable, Text, Text, Text> {

            private Text word = new Text();

            @Override
            public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    // transform to lower case
                    String lower = word.toString().toLowerCase();
                    // emit the word as both key and value
                    context.write(new Text(lower), new Text(lower));
                }
            }
        }

        public static class Reduce extends Reducer<Text, Text, Text, Text> {

            @Override
            public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
                // concatenate every occurrence of the word, separated by spaces
                String result = "";
                for (Text value : values) {
                    result += value.toString() + " ";
                }
                context.write(key, new Text(result));
            }
        }
    }

My problem, however, is this: how do I return only the values in the output file? At the moment I get this:

be be be 
not not 
or or
to to to

So every line contains the key first and then the values, but I want to return only the values, like this:

be be
not 
or 
to to

Is this possible, or do I simply have to drop one entry from each word's value?

score 0 · Accepted Answer

I tried this with the MaxTemperature example from Hadoop: The Definitive Guide, and the following code worked:

context.write(null, new Text(result));
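To illustrate why a null key removes the first column, here is a minimal plain-Java sketch of how one key/value record turns into one output line. It assumes the job uses the default TextOutputFormat, which as far as I know writes key, a tab separator, then value, and skips the key and separator when the key is null. `RecordLineSketch` and `formatRecord` are hypothetical names for this illustration, not Hadoop APIs.

```java
// Plain-Java imitation of how a key/value record becomes one output line.
// No Hadoop classes are used; the names here are illustrative only.
final class RecordLineSketch {

    // Assumption: the default separator between key and value is a tab.
    static final String SEPARATOR = "\t";

    // Returns the line that would be written for one record.
    static String formatRecord(String key, String value) {
        if (key == null) {
            // No key: only the value is written, without a separator.
            return value;
        }
        return key + SEPARATOR + value;
    }

    public static void main(String[] args) {
        // Key present: the word appears once more than it should.
        System.out.println(formatRecord("be", "be be"));
        // Key null: only the concatenated values remain.
        System.out.println(formatRecord(null, "be be"));
    }
}
```

In a real job, passing `NullWritable.get()` as the key (with `NullWritable` declared as the reducer's output key type) is probably the more conventional way to express the same thing than a bare `null`.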

score 0 · Accepted Answer

Disclaimer: I am not a Hadoop user, but I do a lot of Map/Reduce with CouchDB.

If you only need the keys, why not emit an empty value?

Also, since you need the key once per occurrence, it sounds like you don't want to reduce them at all.
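A rough plain-Java sketch of that idea, with no Hadoop dependency: the "map" step emits each lower-cased word as a key with an empty value, a `TreeMap` stands in for the framework's sort-and-group step, and the "reduce" step writes the key once per grouped value instead of merging anything. The class and method names are made up for this illustration, and `TreeMap`'s String ordering is only assumed to match Hadoop's Text ordering for plain lower-case ASCII words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java stand-in for the job; no Hadoop classes are used here.
final class EmptyValueSketch {

    // "Map" step: emit (lower-cased word, "") for every token. The TreeMap
    // plays the role of the shuffle, grouping values under sorted keys.
    static Map<String, List<String>> mapAndShuffle(String text) {
        Map<String, List<String>> grouped = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken().toLowerCase();
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add("");
        }
        return grouped;
    }

    // "Reduce" step: write the key once per grouped (empty) value,
    // so duplicates survive and nothing is concatenated.
    static List<String> reduce(Map<String, List<String>> grouped) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            for (String ignored : entry.getValue()) {
                output.add(entry.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Prints [be, be, not, or, to, to]
        System.out.println(reduce(mapAndShuffle("To be or not to be")));
    }
}
```

This yields one word per output record rather than one line per distinct word, so it matches the sorted list in the question's first example more closely than the grouped layout shown later.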
