hadoop - Multiple ways to write a driver for a Hadoop program - which one should I choose?

Question

I noticed that there are several ways to write the driver method of a Hadoop program.

public void run(String inputPath, String outputPath) throws Exception {
  JobConf conf = new JobConf(WordCount.class);
  conf.setJobName("wordcount");

  // the keys are words (strings)
  conf.setOutputKeyClass(Text.class);
  // the values are counts (ints)
  conf.setOutputValueClass(IntWritable.class);

  conf.setMapperClass(MapClass.class);
  conf.setReducerClass(Reduce.class);

  FileInputFormat.addInputPath(conf, new Path(inputPath));
  FileOutputFormat.setOutputPath(conf, new Path(outputPath));

  JobClient.runJob(conf);
}

The following approach is given in the book Hadoop: The Definitive Guide (O'Reilly, 2012):

public static void main(String[] args) throws Exception {
  if (args.length != 2) {
    System.err.println("Usage: MaxTemperature <input path> <output path>");
    System.exit(-1);
  }
  Job job = new Job();
  job.setJarByClass(MaxTemperature.class);
  job.setJobName("Max temperature");
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.setMapperClass(MaxTemperatureMapper.class);
  job.setReducerClass(MaxTemperatureReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}

While trying out the programs from the O'Reilly book, I found that the constructor of the Job class is deprecated. Since the O'Reilly book is based on Hadoop 2 (YARN), I was surprised that it uses deprecated APIs.

Which approach do people generally use?

score 5 · Accepted Answer

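If you override the run() method (that is, implement the Tool interface and launch the job through ToolRunner), you can use Hadoop jar options such as -D, -libjars, and -files, which almost every Hadoop project needs. I'm not sure whether you can use them through the main() method.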
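To make the answer's point concrete: in real Hadoop, a driver typically extends `Configured`, implements `Tool`, and its `main()` calls `ToolRunner.run(...)`, which uses `GenericOptionsParser` to consume generic options such as `-D`, `-libjars`, and `-files` before your `run()` receives the remaining arguments. The sketch below is a plain-Java model of that flow with no Hadoop dependency. `GenericOptionsDemo`, `SimpleTool`, `runTool`, and the `libjars`/`files` configuration keys are hypothetical names, and the parsing is deliberately simplified; it is not Hadoop's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the ToolRunner/GenericOptionsParser flow.
// NOT Hadoop's implementation: names and parsing rules are simplified assumptions.
public class GenericOptionsDemo {

    /** Hypothetical stand-in for org.apache.hadoop.util.Tool: driver logic lives in run(). */
    public interface SimpleTool {
        int run(Map<String, String> conf, String[] args) throws Exception;
    }

    /** Parses "key=value" and stores it like a -D property (value may be empty). */
    private static void putProperty(Map<String, String> conf, String keyValue) {
        String[] parts = keyValue.split("=", 2);
        conf.put(parts[0], parts.length > 1 ? parts[1] : "");
    }

    /**
     * Hypothetical stand-in for ToolRunner.run: consumes leading generic options
     * (-D key=value, -Dkey=value, -libjars list, -files list), then hands only the
     * remaining application arguments to tool.run().
     */
    public static int runTool(SimpleTool tool, String[] argv) throws Exception {
        Map<String, String> conf = new HashMap<>();
        int i = 0;
        while (i < argv.length) {
            String a = argv[i];
            if (a.equals("-D") && i + 1 < argv.length) {
                putProperty(conf, argv[i + 1]);
                i += 2;
            } else if (a.startsWith("-D") && a.length() > 2) {
                putProperty(conf, a.substring(2));
                i += 1;
            } else if ((a.equals("-libjars") || a.equals("-files")) && i + 1 < argv.length) {
                // Simplified: record the comma-separated list under "libjars" or "files".
                conf.put(a.substring(1), argv[i + 1]);
                i += 2;
            } else {
                break; // first application argument: stop looking for generic options
            }
        }
        String[] remaining = Arrays.copyOfRange(argv, i, argv.length);
        return tool.run(conf, remaining);
    }

    public static void main(String[] args) throws Exception {
        String[] argv = {"-D", "mapreduce.job.reduces=2", "-libjars", "extra.jar",
                         "input", "output"};
        int exit = runTool((conf, rest) -> {
            // The driver only sees <input> <output>; generic options are already in conf.
            System.out.println("args = " + Arrays.toString(rest));
            System.out.println("reduces = " + conf.get("mapreduce.job.reduces"));
            System.out.println("libjars = " + conf.get("libjars"));
            return rest.length == 2 ? 0 : 1;
        }, argv);
        System.out.println("exit = " + exit);
    }
}
```

Running `main` prints `args = [input, output]`, `reduces = 2`, `libjars = extra.jar`, and `exit = 0`. The model stops at the first non-option argument, reflecting the usual Hadoop convention that generic options come before the application's own arguments; a driver written with a plain `main()` would have to do this splitting itself.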
