java - Failed to report status for 600 seconds. Killing! Reporting progress in Hadoop

Question

I am getting the following error:

Task attempt_201304161625_0028_m_000000_0 failed to report status for 600 seconds. Killing!

for my map job. This question is similar to this, this, and this. However, I do not want to increase the default time before Hadoop kills a task that does not report progress, i.e.,

Configuration conf=new Configuration();
long milliSeconds = 1000*60*60;
conf.setLong("mapred.task.timeout", milliSeconds);

Instead, I want to report progress periodically using context.progress(), context.setStatus("Some Message"), context.getCounter(SOME_ENUM.PROGRESS).increment(1), or something similar. However, the job still gets killed. Here are snippets of the code where I am trying to report progress. The mapper:

protected void map(Key key, Value value, Context context) throws IOException, InterruptedException {

    //do some things
    Optimiser optimiser = new Optimiser();
    optimiser.optimiseFurther(<some parameters>, context);
    //more things
    context.write(newKey, newValue);
}

The optimiseFurther method in the Optimiser class:

public void optimiseFurther(<Some parameters>, TaskAttemptContext context) {

    int count = 0;
    while(something is true) {
        //optimise

        //try to report progress
        context.setStatus("Progressing:" + count);
        System.out.println("Optimise Progress:" + context.getStatus());
        context.progress();
        count++;
    }
}

The output from the mapper shows that the status is being updated:

Optimise Progress:Progressing:0
Optimise Progress:Progressing:1
Optimise Progress:Progressing:2
...

However, the job is still killed once the default timeout has elapsed. Am I using the context in the wrong way? Is there anything else I need to do in the job setup to report progress successfully?

score 6 · Accepted Answer

What may be happening is that you have to call these progress methods on the Reporter itself, which is held inside the Context, and that calling them on the context itself may not work.

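From Cloudera:

Reporting progress

If your task reports no progress for 10 minutes (see the mapred.task.timeout property), it will be killed by Hadoop. Most tasks never run into this, because they report progress implicitly by reading input and writing output. However, some jobs that do not process records in this way may fall foul of this behavior and have their tasks killed. Simulations are a good example, since they do a lot of CPU-intensive processing in each map and typically write the result only at the end of the computation. Such jobs should be written so that they report progress on a regular basis (more often than every 10 minutes). This can be achieved in a number of ways: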

Call setStatus() on Reporter to set a human-readable description of the task's progress
Call incrCounter() on Reporter to increment a user counter
Call progress() on Reporter to tell Hadoop that your task is still there (and making progress)
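As a rough illustration of reporting "on a regular basis" from a long-running loop, here is a minimal sketch that forwards a status update at most once per interval instead of on every iteration. ProgressSink, Throttle, and ThrottledProgress are hypothetical names made up for this example; ProgressSink only stands in for whatever Hadoop object receives the calls (the Reporter or the Context), so the code compiles without Hadoop on the classpath. It shows one way to pace the calls and is not claimed to fix the problem in the question.

```java
public class ThrottledProgress {

    /** Hypothetical stand-in for the Hadoop object that receives progress calls. */
    public interface ProgressSink {
        void setStatus(String message);
        void progress();
    }

    /** Forwards at most one report per interval, however often tick() is called. */
    public static final class Throttle {
        private final ProgressSink sink;
        private final long intervalNanos;
        private long lastReportNanos;
        private boolean reportedOnce = false;

        public Throttle(ProgressSink sink, long intervalMillis) {
            this.sink = sink;
            this.intervalNanos = intervalMillis * 1_000_000L;
        }

        /** Convenience overload that reads the monotonic clock. */
        public void tick(long count) {
            tick(count, System.nanoTime());
        }

        /** Reports on the first call and whenever the interval has elapsed since the last report. */
        public void tick(long count, long nowNanos) {
            if (!reportedOnce || nowNanos - lastReportNanos >= intervalNanos) {
                sink.setStatus("Progressing:" + count);
                sink.progress();
                lastReportNanos = nowNanos;
                reportedOnce = true;
            }
        }
    }

    public static void main(String[] args) {
        final int[] reports = {0};
        ProgressSink sink = new ProgressSink() {
            @Override public void setStatus(String message) { System.out.println(message); }
            @Override public void progress() { reports[0]++; }
        };

        // Report at most once per second; far below the 10 minute timeout.
        Throttle throttle = new Throttle(sink, 1000);
        double acc = 0;
        for (long i = 0; i < 5_000_000L; i++) {
            acc += Math.sqrt(i);   // placeholder for the CPU-heavy optimisation step
            throttle.tick(i);
        }
        System.out.println("result=" + acc + ", reports sent=" + reports[0]);
    }
}
```

Inside a real mapper, the sink would be a small adapter whose setStatus and progress methods delegate to context.setStatus(...) and context.progress(); that wiring is left out here because it needs the Hadoop libraries.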

Cloudera tips

public Context(Configuration conf, TaskAttemptID taskid,
               RecordReader<KEYIN,VALUEIN> reader,
               RecordWriter<KEYOUT,VALUEOUT> writer,
               OutputCommitter committer,
               StatusReporter reporter,
               InputSplit split)
