hadoop - How to read and process a file as key-value pairs in Hadoop

Question

I am trying to read the following data as key-value pairs in Hadoop.

name: "Clooney, George", release: "2013", movie: "Gravity",
name: "Pitt, Brad", release: "2004", movie: "Ocean's 12",
name: "Clooney, George", release: "2004", movie: "Ocean's 12",
name: "Pitt, Brad", release: "1999", movie: "Fight Club"

I need the following output:

name: "Clooney, George", movie: "Gravity, Ocean's 12",
name: "Pitt, Brad", movie: "Ocean's 12, Fight Club",
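To pin down the intended transformation independently of Hadoop, here is a minimal plain-Java sketch that produces the desired output from the four sample records. It assumes every field value is wrapped in double quotes and pulls out `name` and `movie` with a regular expression; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DesiredOutputSketch {
    // Captures the quoted name and movie fields of one record.
    // Assumption: every field value is wrapped in double quotes.
    private static final Pattern RECORD =
            Pattern.compile("name: \"([^\"]*)\".*?movie: \"([^\"]*)\"");

    static List<String> groupMoviesByName(List<String> lines) {
        // LinkedHashMap keeps the names in first-seen order.
        Map<String, List<String>> moviesByName = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = RECORD.matcher(line);
            if (!m.find()) {
                continue; // skip lines that do not match the assumed format
            }
            moviesByName.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                    .add(m.group(2));
        }
        // Format each group as: name: "...", movie: "a, b",
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : moviesByName.entrySet()) {
            out.add("name: \"" + e.getKey() + "\", movie: \""
                    + String.join(", ", e.getValue()) + "\",");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "name: \"Clooney, George\", release: \"2013\", movie: \"Gravity\",",
                "name: \"Pitt, Brad\", release: \"2004\", movie: \"Ocean's 12\",",
                "name: \"Clooney, George\", release: \"2004\", movie: \"Ocean's 12\",",
                "name: \"Pitt, Brad\", release: \"1999\", movie: \"Fight Club\"");
        groupMoviesByName(input).forEach(System.out::println);
    }
}
```

In MapReduce terms, one possible split is: the mapper extracts (name, movie) from each line, and the reducer joins the movies for a name with ", ", as the `String.join` step does here.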

I wrote the Mapper and Reducer as follows:

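  // Imports used by the classes below (both are nested inside the job's driver class).
  import java.io.IOException;
  import java.util.StringTokenizer;

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;

  public static class MyMapper extends Mapper<Text, Text, Text, Text> {

    private Text word = new Text();

    // Receives one (key, value) pair per input line from the record reader
    // and emits one (key, token) pair per comma-separated token of the value.
    @Override
    public void map(Text key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString(), ",");
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(key, word);
      }
    }
  }

  public static class MyReducer extends Reducer<Text, Text, Text, Text> {

    private Text result = new Text();

    // Concatenates all values received for a key (no separator is inserted).
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      String actors = "";
      for (Text val : values) {
        actors += val.toString();
      }
      result.set(actors);
      context.write(key, result);
    }
  }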

I also added the following configuration:

Configuration conf = new Configuration();
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
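As I understand `KeyValueLineRecordReader` (the reader behind `KeyValueTextInputFormat`), it splits each line at the first occurrence of the separator only, using just the first character of the configured string (the default is a tab). The plain-Java approximation below mimics that on a `String` rather than on bytes; the class name is hypothetical. With "," as the separator, the first sample line gets the key `name: "Clooney` and the value ` George", release: "2013", movie: "Gravity",`.

```java
class FirstSeparatorSplit {
    // Approximates the key/value split of KeyValueTextInputFormat's reader:
    // text before the FIRST separator becomes the key, everything after it
    // becomes the value. If the separator does not occur, the whole line is
    // the key and the value is empty.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String line = "name: \"Clooney, George\", release: \"2013\", movie: \"Gravity\",";
        String[] kv = split(line, ',');
        System.out.println("key   = [" + kv[0] + "]");
        System.out.println("value = [" + kv[1] + "]");
    }
}
```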

This is the output I get:

name: "Clooney   George" release: "2013" movie: "Gravity" George" release: "2004" movie: "Ocean's 12"
name: "Pitt  Brad" release: "2004" movie: "Ocean's 12" Brad" release: "1999" movie: "Fight Club"
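The output above can be reproduced without Hadoop by simulating the same steps in plain Java: split at the first ",", tokenize the value with `StringTokenizer` on ",", group by key in sorted order, concatenate the values with no separator, and write key, a tab, then the value (the default for `TextOutputFormat`). This is a simplified sketch with a hypothetical class name: it ignores partitioning, compares keys as `String` (the same as `Text` byte order for ASCII), and assumes values arrive in input order, which Hadoop does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class ObservedOutputSketch {
    static List<String> run(List<String> lines) {
        // "Shuffle": group values by key; TreeMap keeps keys sorted.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            // Record reader: key is the text before the first ',', value the rest.
            int pos = line.indexOf(',');
            String key = pos == -1 ? line : line.substring(0, pos);
            String value = pos == -1 ? "" : line.substring(pos + 1);
            // Mapper: one (key, token) pair per comma-separated token of the value.
            StringTokenizer itr = new StringTokenizer(value, ",");
            while (itr.hasMoreTokens()) {
                grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(itr.nextToken());
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            // Reducer: plain concatenation, nothing between the values.
            StringBuilder sb = new StringBuilder();
            for (String v : e.getValue()) {
                sb.append(v);
            }
            // Output format: key, a tab, then the value.
            out.add(e.getKey() + "\t" + sb);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "name: \"Clooney, George\", release: \"2013\", movie: \"Gravity\",",
                "name: \"Pitt, Brad\", release: \"2004\", movie: \"Ocean's 12\",",
                "name: \"Clooney, George\", release: \"2004\", movie: \"Ocean's 12\",",
                "name: \"Pitt, Brad\", release: \"1999\", movie: \"Fight Club\"");
        run(input).forEach(System.out::println);
    }
}
```

So the key is only `name: "Clooney` or `name: "Pitt` (the text before the first comma). That is why the records for each actor are still grouped together while the name itself is cut in half and the rest of every field ends up in the concatenated value.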

It seems I can't even read basic key-value pairs correctly. How does key-value processing work in Hadoop? Could someone explain this in detail and point out where I'm going wrong?

Thanks. TM
