hadoop - Creating a sequence file format for Hadoop MR

Question

I am working with Hadoop MapReduce and have a question. Currently, my mapper's input KV type is LongWritable, LongWritable and its output KV type is also LongWritable, LongWritable. The InputFileFormat is SequenceFileInputFormat. Basically, what I want to do is convert a txt file into SequenceFileFormat so that I can use it in my mapper.

What I would like to do is this.

The input file looks like this:

1\t2 (key = 1, value = 2)

2\t3 (key = 2, value = 3)

and so on...

I looked at this thread, "How to convert .txt file to Hadoop's sequence file format", but it relies on TextInputFormat, which only supports Key = LongWritable and Value = Text.

Is there a way to take a txt file and create a sequence file with KV = LongWritable, LongWritable?

score 7 · Accepted Answer

Sure, it is basically the same approach I described in the other thread you linked. You do have to implement your own Mapper, though.

Here is a quick sketch for you:

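import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class LongLongMapper extends
    Mapper<LongWritable, Text, LongWritable, LongWritable> {

  // TextInputFormat passes the byte offset as the key and the line as the value;
  // the offset is ignored here.
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    // assuming that your line contains key and value separated by \t
    String[] split = value.toString().split("\t");

    context.write(new LongWritable(Long.parseLong(split[0])),
        new LongWritable(Long.parseLong(split[1])));
  }

  public static void main(String[] args) throws IOException,
      InterruptedException, ClassNotFoundException {

    Configuration conf = new Configuration();
    // Job.getInstance replaces the deprecated Job constructor (Hadoop 2 and later)
    Job job = Job.getInstance(conf);
    job.setJobName("Convert Text");
    job.setJarByClass(LongLongMapper.class);

    // use the mapper defined above, not the identity Mapper
    job.setMapperClass(LongLongMapper.class);
    // identity reducer; only runs if the number of reduce tasks is above 0
    job.setReducerClass(Reducer.class);

    // increase if you need sorting or a special number of files
    job.setNumReduceTasks(0);

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(LongWritable.class);

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setInputFormatClass(TextInputFormat.class);

    FileInputFormat.addInputPath(job, new Path("/input"));
    FileOutputFormat.setOutputPath(job, new Path("/output"));

    // submit, wait for completion, and report failure through the exit code
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}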

Each value passed to the map function is one line of the input, so we just split it on the delimiter (tab) and parse each part into a long.

That's it.
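
The split-and-parse step can also be tried on its own, without a cluster. The following is only a minimal sketch in plain Java: `TabLineParser` and `parseLine` are hypothetical names that do not come from Hadoop or from the code above, and skipping malformed lines (rather than letting the task fail with an exception) is an assumption about what you might want.

```java
import java.util.Optional;

// Hypothetical helper for illustration; it is not part of Hadoop or of the answer's code.
final class TabLineParser {

  // Splits one input line on a tab and parses both fields as longs.
  // Returns Optional.empty() for lines that do not hold exactly two numeric fields.
  static Optional<long[]> parseLine(String line) {
    // limit -1 keeps trailing empty fields, so "1\t" yields two fields
    // and is rejected when the empty second field fails to parse
    String[] parts = line.split("\t", -1);
    if (parts.length != 2) {
      return Optional.empty();
    }
    try {
      long key = Long.parseLong(parts[0].trim());
      long value = Long.parseLong(parts[1].trim());
      return Optional.of(new long[] {key, value});
    } catch (NumberFormatException e) {
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    String[] lines = {"1\t2", "2\t3", "not a number\t4", "5"};
    for (String line : lines) {
      Optional<long[]> kv = parseLine(line);
      if (kv.isPresent()) {
        System.out.println("key = " + kv.get()[0] + ", value = " + kv.get()[1]);
      } else {
        System.out.println("skipped malformed line");
      }
    }
  }
}
```

Inside a mapper, the same check would let you write only the well-formed pairs to the context and ignore the rest, so that one bad line does not fail the whole task.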
