
I am trying to run the shortest paths example from the Giraph incubator (https://cwiki.apache.org/confluence/display/GIRAPH/Shortest+Paths+Example). However, instead of executing the example from the giraph-*-dependencies.jar, I have created my own job jar. When I created a single job class, as presented in the example, I was getting

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.Test$SimpleShortestPathsVertexInputFormat

Then I moved the inner classes (SimpleShortestPathsVertexInputFormat and SimpleShortestPathsVertexOutputFormat) to separate files and renamed them just in case (SimpleShortestPathsVertexInputFormat_v2, SimpleShortestPathsVertexOutputFormat_v2); the classes are no longer static. This solved the "class not found" issue for SimpleShortestPathsVertexInputFormat_v2, but I am still getting the same error for SimpleShortestPathsVertexOutputFormat_v2. Below is my stack trace.

INFO mapred.JobClient: Running job: job_201205221101_0003
INFO mapred.JobClient:  map 0% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201205221101_0003_m_000005_0, Status : FAILED
    java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.utils.SimpleShortestPathsVertexOutputFormat_v2
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:898)
            at org.apache.giraph.graph.BspUtils.getVertexOutputFormatClass(BspUtils.java:134)
            at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:56)
            at org.apache.hadoop.mapred.Task.initialize(Task.java:490)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:352)
            at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
            at org.apache.hadoop.mapred.Child.main(Child.java:253)
    Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.utils.SimpleShortestPathsVertexOutputFormat_v2
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:866)
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:890)
            ... 9 more

I have inspected my job jar and all classes are there. Furthermore, I am using Hadoop 0.20.203 in pseudo-distributed mode. The way I launch my job is shown below.

hadoop jar giraphJobs.jar org.test.giraph.Test -libjars /path/to/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar /path/to/input /path/to/output 0 3

I have also defined HADOOP_CLASSPATH for the giraph-*-dependencies.jar. I can run the PageRankBenchmark example without a problem (directly from the giraph-*-dependencies.jar), and the shortest paths example works as well (also directly from the giraph-*-dependencies.jar). Other Hadoop jobs run without a problem (somewhere I read that this is a way to test whether my "cluster" works correctly). Has anyone come across a similar problem? Any help will be appreciated.
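
For reference, this is roughly how I set the classpath and checked the contents of the job jar (the exact commands and paths below are illustrative rather than copied verbatim from my setup):

export HADOOP_CLASSPATH=/path/to/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
jar tf giraphJobs.jar | grep SimpleShortestPathsVertex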


Solution (sorry to post it like this, but I can't answer my own question for a couple more hours)

To solve this issue I had to add my job jar to the -libjars option (no changes to HADOOP_CLASSPATH were made). The command to launch the job now looks like this:

hadoop jar giraphJobs.jar org.test.giraph.Test -libjars /path/to/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar,/path/to/job.jar /path/to/input /path/to/output 0 3

The list of jars has to be comma-separated. This has solved my problem, but I am still curious why I have to pass my job jar as a "classpath" parameter. Can someone explain the rationale behind this? It seems strange (to say the least) to invoke my job jar and then pass it again as a "classpath" jar. I am really curious about the explanation.


1 Answer


I found an alternative, programmatic solution to this problem. The run() method needs to be modified as follows:

...
@Override
public int run(String[] argArray) throws Exception {
    Preconditions.checkArgument(argArray.length == 4,
        "run: Must have 4 arguments <input path> <output path> " +
        "<source vertex id> <# of workers>");

    GiraphJob job = new GiraphJob(getConf(), getClass().getName());
    // This is the addition - it makes Hadoop look for the other classes in
    // the same jar that contains this class
    job.getInternalJob().setJarByClass(getClass());
    job.setVertexClass(getClass());
    ...
}

setJarByClass() makes Hadoop look for the missing classes in the same jar that contains the class returned by getClass(). With this in place, there is no need to add the job jar separately to the -libjars option.
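
For context, here is a fuller sketch of how run() might look with this change in place. It follows the Giraph 0.2-SNAPSHOT-era shortest-paths example, so the GiraphJob setters, the SOURCE_ID configuration key, and the _v2 format class names are assumptions taken from that example and from the question; they may differ in other Giraph versions (imports omitted, as above).

@Override
public int run(String[] argArray) throws Exception {
    Preconditions.checkArgument(argArray.length == 4,
        "run: Must have 4 arguments <input path> <output path> " +
        "<source vertex id> <# of workers>");

    GiraphJob job = new GiraphJob(getConf(), getClass().getName());
    // Record the jar that contains this class as the job jar, so that the
    // task JVMs can resolve the format classes living in the same jar.
    job.getInternalJob().setJarByClass(getClass());
    job.setVertexClass(getClass());
    job.setVertexInputFormatClass(SimpleShortestPathsVertexInputFormat_v2.class);
    job.setVertexOutputFormatClass(SimpleShortestPathsVertexOutputFormat_v2.class);
    FileInputFormat.addInputPath(job.getInternalJob(), new Path(argArray[0]));
    FileOutputFormat.setOutputPath(job.getInternalJob(), new Path(argArray[1]));
    // SOURCE_ID is assumed to be the configuration key your vertex class
    // uses for the source vertex id.
    job.getConfiguration().setLong(SOURCE_ID, Long.parseLong(argArray[2]));
    job.setWorkerConfiguration(Integer.parseInt(argArray[3]),
        Integer.parseInt(argArray[3]), 100.0f);
    return job.run(true) ? 0 : -1;
}

As for why this is needed at all: "hadoop jar giraphJobs.jar ..." only puts the jar on the classpath of the client process that submits the job. The map tasks run in separate JVMs and resolve the format classes by name from the job configuration, so they can only find them in jars that were shipped to them, either via -libjars or via the job jar recorded by setJarByClass(). Either approach works; this one just avoids repeating the jar on the command line.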

answered 2012-08-02T03:25:37.600