python - Hadoop streaming hangs on Output: /path../output

Question

Hi, I wrote two Python scripts to use as the mapper and reducer for Hadoop streaming. When I run the job, the map and reduce phases both reach 100% and finish successfully. However, the process hangs at the very end.

The output looks like this:

...
13/10/07 17:25:16 INFO streaming.StreamJob:  map 99%  reduce 30%
13/10/07 17:26:18 INFO streaming.StreamJob:  map 99%  reduce 31%
13/10/07 17:26:55 INFO streaming.StreamJob:  map 99%  reduce 32%
13/10/07 17:28:16 INFO streaming.StreamJob:  map 100%  reduce 32%
13/10/07 17:29:08 INFO streaming.StreamJob:  map 100%  reduce 33%
13/10/07 17:30:55 INFO streaming.StreamJob:  map 100%  reduce 39%
13/10/07 17:30:56 INFO streaming.StreamJob:  map 100%  reduce 46%
13/10/07 17:30:57 INFO streaming.StreamJob:  map 100%  reduce 52%
13/10/07 17:30:58 INFO streaming.StreamJob:  map 100%  reduce 72%
13/10/07 17:31:00 INFO streaming.StreamJob:  map 100%  reduce 74%
13/10/07 17:31:01 INFO streaming.StreamJob:  map 100%  reduce 89%
13/10/07 17:31:02 INFO streaming.StreamJob:  map 100%  reduce 98%
13/10/07 17:31:03 INFO streaming.StreamJob:  map 100%  reduce 99%
13/10/07 17:31:57 INFO streaming.StreamJob:  map 100%  reduce 100%
13/10/07 17:32:00 INFO streaming.StreamJob: Job complete: job_201309301959_0100
13/10/07 17:32:00 INFO streaming.StreamJob: Output: /tmp/binwang_31
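
To check from the client log alone that the job really reached "reduce 100%" and "Job complete" before the terminal stopped responding, the StreamJob lines can be scanned. This is a minimal Python sketch, assuming the client output has been saved as text; the regular expressions are written against the exact line format shown above and may need adjusting for other Hadoop versions.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Progress lines look like:
# "13/10/07 17:31:57 INFO streaming.StreamJob:  map 100%  reduce 100%"
PROGRESS_RE = re.compile(r"streaming\.StreamJob:\s+map (\d+)%\s+reduce (\d+)%")
# The completion line looks like:
# "13/10/07 17:32:00 INFO streaming.StreamJob: Job complete: job_201309301959_0100"
COMPLETE_RE = re.compile(r"streaming\.StreamJob: Job complete: (\S+)")


def parse_progress(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (map_percent, reduce_percent) for every progress line."""
    for line in lines:
        m = PROGRESS_RE.search(line)
        if m:
            yield int(m.group(1)), int(m.group(2))


def completed_job_id(lines: Iterable[str]) -> Optional[str]:
    """Return the job id from the 'Job complete' line, or None if it is absent."""
    for line in lines:
        m = COMPLETE_RE.search(line)
        if m:
            return m.group(1)
    return None


if __name__ == "__main__":
    sample = [
        "13/10/07 17:31:03 INFO streaming.StreamJob:  map 100%  reduce 99%",
        "13/10/07 17:31:57 INFO streaming.StreamJob:  map 100%  reduce 100%",
        "13/10/07 17:32:00 INFO streaming.StreamJob: Job complete: job_201309301959_0100",
        "13/10/07 17:32:00 INFO streaming.StreamJob: Output: /tmp/binwang_31",
    ]
    print(list(parse_progress(sample)))  # [(100, 99), (100, 100)]
    print(completed_job_id(sample))      # job_201309301959_0100
```

If `completed_job_id` returns an id, the client had already been told the job finished, which points to the hang happening on the client side after completion rather than in the mapper or reducer scripts.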

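Our cluster is monitored by Ganglia, and I can clearly see that all the nodes have gone back to normal and are not doing any heavy computation. In the meantime, I can go to HDFS and find the output sitting there (I'm not sure whether it is complete). To me it looks like the whole MapReduce job finished successfully, but the terminal hangs at the last step for more than 10 minutes...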

I'm wondering how this can happen. I have to press CTRL + Z to stop it, or wait another few minutes. Does anyone know whether the Output: ... step is expected to take that long? If not, what could be causing it?
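
One way to answer the "not sure whether it is complete" part is to copy the output directory locally (for example with `hadoop fs -get /tmp/binwang_31 binwang_31`) and compare its line count with the `Reduce output records` counter reported by `hadoop job -status`. The Python sketch below assumes two things: that each reduce output record is written as exactly one line (the usual text output for streaming, with no newlines inside values), and that the job's output committer writes a `_SUCCESS` marker, which the default FileOutputCommitter does unless `mapreduce.fileoutputcommitter.marksuccessfuljobs` is set to false. The function names are hypothetical.

```python
import glob
import os


def count_output_lines(local_dir: str) -> int:
    """Count lines across all part-* files in a local copy of the job output."""
    total = 0
    for path in sorted(glob.glob(os.path.join(local_dir, "part-*"))):
        with open(path, "rb") as f:  # binary mode: no decoding, just count lines
            for _ in f:
                total += 1
    return total


def output_looks_complete(local_dir: str, expected_records: int) -> bool:
    """True if a _SUCCESS marker exists (assumes the default committer writes one)
    and the number of lines equals the expected reduce output record count."""
    has_marker = os.path.exists(os.path.join(local_dir, "_SUCCESS"))
    return has_marker and count_output_lines(local_dir) == expected_records
```

For this job the call would be `output_looks_complete("binwang_31", 550636)`, using the `Reduce output records=550636` value from the status output. A mismatch only means the one-line-per-record assumption or the copy should be checked; it does not by itself prove the job failed.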


Here is the response I got when I opened another session and ran the following command:

$ /usr/bin/hadoop job -status job_201309301959_0100
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.


Job: job_201309301959_0100
file: hdfs://url1:8020/user/user1/.staging/job_201309301959_0100/job.xml
tracking URL: http://url1:50030/jobdetails.jsp?jobid=job_201309301959_0100
map() completion: 1.0
reduce() completion: 1.0
Counters: 34
    File System Counters
            FILE: Number of bytes read=232427562
            FILE: Number of bytes written=835363817
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=107873895369
            HDFS: Number of bytes written=51760077
            HDFS: Number of read operations=1722
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=144
    Job Counters
            Launched map tasks=803
            Launched reduce tasks=72
            Data-local map tasks=731
            Rack-local map tasks=72
            Total time spent by all maps in occupied slots (ms)=521490905
            Total time spent by all reduces in occupied slots (ms)=47701745
            Total time spent by all maps waiting after reserving slots (ms)=0
            Total time spent by all reduces waiting after reserving slots (ms)=0
    Map-Reduce Framework
            Map input records=425093
            Map output records=10311822
            Map output bytes=906412336
            Input split bytes=111617
            Combine input records=0
            Combine output records=0
            Reduce input groups=550636
            Reduce shuffle bytes=452246236
            Reduce input records=10311822
            Reduce output records=550636
            Spilled Records=20623644
            CPU time spent (ms)=479770510
            Physical memory (bytes) snapshot=533152505856
            Virtual memory (bytes) snapshot=1439405166592
            Total committed heap usage (bytes)=844896337920
    org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
            BYTES_READ=107742318536
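
The status output shows `map() completion: 1.0` and `reduce() completion: 1.0`, i.e. the cluster side of the job had finished. To pull those fractions and the counters out of this text programmatically (for example to read `Reduce output records`), a small Python sketch could look like this; the parsing rules are inferred from the output format shown above and are an assumption for other Hadoop versions.

```python
import re
from typing import Dict, Tuple

# Lines like "map() completion: 1.0" / "reduce() completion: 1.0"
COMPLETION_RE = re.compile(r"^(map|reduce)\(\) completion: ([0-9.]+)$")
# Counter lines like "HDFS: Number of bytes written=51760077" or "BYTES_READ=107742318536"
COUNTER_RE = re.compile(r"^(.+?)=(\d+)$")


def parse_job_status(text: str) -> Tuple[Dict[str, float], Dict[str, int]]:
    """Split `hadoop job -status` output into completion fractions and counters.

    Counter names are kept exactly as printed. Group headings such as
    "Map-Reduce Framework" and lines like "Counters: 34" are skipped
    because they do not end in '=<integer>'.
    """
    completion: Dict[str, float] = {}
    counters: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = COMPLETION_RE.match(line)
        if m:
            completion[m.group(1)] = float(m.group(2))
            continue
        m = COUNTER_RE.match(line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return completion, counters


def finished_on_cluster(completion: Dict[str, float]) -> bool:
    """True when both the map and reduce phases report 1.0."""
    return completion.get("map") == 1.0 and completion.get("reduce") == 1.0
```

Applied to the output above, `finished_on_cluster` returns True and `counters["Reduce output records"]` is 550636.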

Thanks in advance.
