スーパーステップを 20 に設定すると、うまく機能します。しかし、スーパーステップを 200 に設定すると、機能しません。
hadoop jar Test-jar-with-dependencies.jar org.apache.giraph.GiraphRunner test.Test -mc test.TestMC -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /input/test.txt -w 1 -ca mapred.job.tracker=s1 -ca mapreduce.job.counters.limit=1000
最終結果は次のとおりです。
16/10/20 08:56:08 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
16/10/20 08:56:38 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer s3:22181 --zkNode /_hadoopBsp/job_1476868823433_0017/_haltComputation'
16/10/20 08:56:38 INFO mapreduce.Job: Running job: job_1476868823433_0017
16/10/20 08:56:39 INFO mapreduce.Job: Job job_1476868823433_0017 running in uber mode : false
16/10/20 08:56:39 INFO mapreduce.Job: map 50% reduce 0%
16/10/20 08:56:47 INFO mapreduce.Job: map 100% reduce 0%
16/10/20 08:56:47 INFO mapreduce.Job: Job job_1476868823433_0017 failed with state FAILED due to: Task failed task_1476868823433_0017_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/10/20 08:56:47 INFO mapreduce.Job: Counters: 34
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=97529
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=76
HDFS: Number of bytes written=0
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Failed map tasks=1
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=33269
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=33269
Total vcore-seconds taken by all map tasks=33269
Total megabyte-seconds taken by all map tasks=34067456
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=44
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=130
CPU time spent (ms)=7280
Physical memory (bytes) snapshot=186077184
Virtual memory (bytes) snapshot=823398400
Total committed heap usage (bytes)=200802304
Zookeeper base path
/_hadoopBsp/job_1476868823433_0017=0
Zookeeper halt node
/_hadoopBsp/job_1476868823433_0017/_haltComputation=0
Zookeeper server:port
s3:22181=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
私のテストコードは次のとおりです。頂点計算
public class Test extends BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable>{
@Override
public void compute(
Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
Iterable<DoubleWritable> messages) throws IOException {
// TODO Auto-generated method stub
}
}
マスター計算
public class TestMC extends DefaultMasterCompute {
@Override
public void compute() {
// TODO Auto-generated method
if (getSuperstep() == 200) {
haltComputation();
}
}
}
カウンターが小さすぎる(120)ようですが、1000に設定しました。この問題を解決するにはどうすればよいですか?
エラーログは次のとおりです。
2016-10-20 08:56:38,569 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 199 took 0.016 seconds ended with state ALL_SUPERSTEPS_DONE and is now on superstep 200
2016-10-20 08:56:38,573 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1} on superstep 200
2016-10-20 08:56:38,574 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1}
2016-10-20 08:56:38,574 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710000 type:create cxid:0x236f zxid:0x143b txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_cleanedUpDir Error:KeeperErrorCode = NoNode for /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:38,574 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710001 type:create cxid:0xd8f zxid:0x143c txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1476868823433_0017/_masterJobState
2016-10-20 08:56:38,575 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanup: Notifying master its okay to cleanup with /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir/0_master
2016-10-20 08:56:38,575 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710000 type:create cxid:0x2375 zxid:0x143f txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_cleanedUpDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:38,575 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Node /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir already exists, no need to create.
2016-10-20 08:56:38,576 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Got 1 of 2 desired children from /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:38,576 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanedUpZooKeeper: Waiting for the children of /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir to change since only got 1 nodes.
2016-10-20 08:56:40,710 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): IPC server unable to read call parameters: Too many counters: 121 max=120
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at com.sun.proxy.$Proxy7.statusUpdate(Unknown Source)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:737)
at java.lang.Thread.run(Thread.java:745)
2016-10-20 08:56:40,879 INFO [main-EventThread] org.apache.giraph.bsp.BspService: process: cleanedUpChildrenChanged signaled
2016-10-20 08:56:40,880 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Got 2 of 2 desired children from /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:40,880 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x157df96ea710001
2016-10-20 08:56:40,882 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:22181] org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /219.223.239.57:49390 which had sessionid 0x157df96ea710001
2016-10-20 08:56:40,888 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanup: Removed HDFS checkpoint directory (_bsp/_checkpoints//job_1476868823433_0017) with return = false since the job Giraph: cost.Test succeeded
2016-10-20 08:56:40,888 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyClient: stop: Halting netty client
2016-10-20 08:56:40,890 INFO [netty-client-worker-0] org.apache.giraph.comm.netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing resources now.
2016-10-20 08:56:43,095 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyClient: stop: Netty client halted
2016-10-20 08:56:43,095 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyServer: stop: Halting netty server
2016-10-20 08:56:43,106 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyServer: stop: Start releasing resources
2016-10-20 08:56:43,780 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): IPC server unable to read call parameters: Too many counters: 121 max=120
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at com.sun.proxy.$Proxy7.statusUpdate(Unknown Source)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:737)
at java.lang.Thread.run(Thread.java:745)
2016-10-20 08:56:43,793 INFO [communication thread] org.apache.hadoop.mapred.Task: Process Thread Dump: Communication exception
46 active threads
Thread 56 (netty-server-worker-15):
State: RUNNABLE
Blocked count: 0
Waited count: 1
Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:596)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:306)
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
java.lang.Thread.run(Thread.java:745)
Thread 55 (netty-server-worker-14):
State: RUNNABLE
Blocked count: 0
Waited count: 1