hadoop - Why can't the Hadoop program proceed to the next step?

Question

Hi all! I have a Hadoop program in Eclipse; the source code is as follows:

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while(itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for(IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] oargs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if(oargs.length != 2) {
            System.err.println("Usage: word count <in> <out>");
            // Stop here; otherwise oargs[0] below throws ArrayIndexOutOfBoundsException.
            System.exit(2);
        }
        System.out.println("input:  "+oargs[0]);
        System.out.println("output: "+oargs[1]);
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(oargs[0]));
        FileOutputFormat.setOutputPath(job, new Path(oargs[1]));
        System.out.println("==============================");
        System.out.println("start ...");
        boolean flag = job.waitForCompletion(true);
        System.out.println(flag);
        System.out.println("end ...");
        System.out.println("==============================");
    }
}

The result is as follows; please see the log:

rory@0303 /cygdrive/f/develop/hadoop/hadoop-1.0.3
$ ./bin/hadoop jar ./jar/wordcount.jar /tmp/input /tmp/output
input:  /tmp/input
output: /tmp/output
==============================
start ...
12/07/25 14:59:17 INFO input.FileInputFormat: Total input paths to process : 2
12/07/25 14:59:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/07/25 14:59:17 WARN snappy.LoadSnappy: Snappy native library not loaded
12/07/25 14:59:17 INFO mapred.JobClient: Running job: job_201207251447_0001
12/07/25 14:59:18 INFO mapred.JobClient:  map 0% reduce 0%

The log does not continue; it stays stuck there forever. Why?

I am running the code in local mode using Cygwin on Windows XP.

score 0 · Accepted Answer

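@Rory, can you be more specific about what you mean by "next", as Thomas asked? Is this the entire output shown on your screen? Do you mean that you compiled and ran the program once, got results, and now cannot run it again? Have you given the proper input arguments, i.e. the input directory and output directory, to your program in the Eclipse IDE?

If you cannot run the program a second time, it may be that you have not specified a different output directory. But after seeing the log output, I assume that is not the case.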
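For context on the output-directory point: Hadoop's FileOutputFormat refuses to start a job whose output directory already exists. The following is only a minimal sketch of picking an unused directory name before a rerun. It assumes the output lives on the local filesystem and uses java.nio rather than Hadoop's own FileSystem API; the class name OutputDirPicker and the "-1", "-2" suffix scheme are hypothetical, chosen for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutputDirPicker {
    // Returns base itself if nothing exists at that path yet;
    // otherwise tries base-1, base-2, ... until an unused name is found.
    static Path firstUnused(Path base) {
        if (!Files.exists(base)) {
            return base;
        }
        int n = 1;
        Path candidate;
        do {
            candidate = base.resolveSibling(base.getFileName() + "-" + n);
            n++;
        } while (Files.exists(candidate));
        return candidate;
    }

    public static void main(String[] args) {
        // Hypothetical default name; pass the real output directory as the first argument.
        Path base = Paths.get(args.length > 0 ? args[0] : "output");
        System.out.println("output: " + firstUnused(base));
    }
}
```

The returned path could then be passed as the job's second argument instead of reusing the previous run's directory.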

score 0 · Accepted Answer

If you are asking why the final println output ("end ..." and the "=====" line) is not shown, look at your code:

System.exit(job.waitForCompletion(true)?0:1);
System.out.println("end ...");
System.out.println("==============================");

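You wrap the job.waitForCompletion(true) call in System.exit, so the JVM terminates before the final two System.out calls are executed.

EDIT

The log appender / logger messages here are a clue that some other exception is probably being swallowed. You should amend your code to take advantage of the ToolRunner utility: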
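A minimal sketch of the reordering, without any Hadoop dependency: store the result first, print the closing lines, and only then exit. Here runJob() is a hypothetical stand-in for job.waitForCompletion(true), and the class name ExitOrderDemo is made up for this example.

```java
public class ExitOrderDemo {
    // Hypothetical stand-in for job.waitForCompletion(true).
    static boolean runJob() {
        return true;
    }

    // Runs the job, prints the closing lines, and returns the exit code
    // instead of calling System.exit in the middle of the method.
    static int runAndReport() {
        System.out.println("start ...");
        int exitCode = runJob() ? 0 : 1;
        System.out.println("end ...");
        System.out.println("==============================");
        return exitCode;
    }

    public static void main(String[] args) {
        // System.exit is the last statement, so nothing after it is lost.
        System.exit(runAndReport());
    }
}
```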

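import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Extending Configured and implementing Tool lets ToolRunner parse the
// generic Hadoop options and pass the resulting Configuration to run().
public class WordCount extends Configured implements Tool {
  public static void main(String[] args) throws Exception {
    // Exit only after run() has returned, so its final printlns are shown.
    System.exit(ToolRunner.run(new WordCount(), args));
  }

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: word count <in> <out>");
      return 2;
    }
    System.out.println("input:  " + args[0]);
    System.out.println("output: " + args[1]);
    Job job = new Job(getConf(), "word count");

    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.out.println("==============================");
    System.out.println("start ...");
    // Keep the result in a variable so the lines below still run.
    int result = job.waitForCompletion(true) ? 0 : 1;
    System.out.println("end ...");
    System.out.println("==============================");

    return result;
  }
}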

You should also use the $HADOOP_HOME/bin/hadoop script to submit your job to the cluster, as follows (substitute the name of your jar and the fully qualified name of your WordCount class):

#> hadoop jar wordcount.jar WordCount input output
