
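The csv files consist of boolean user preference data (userid, itemid). The files are checked for inconsistencies by a preprocessor. I have also checked them manually, and the data appears to be consistently in the correct format.

Two things to note:
- The job never fails when there is only one input file for the Hadoop job, i.e. all preferences are exported to a single csv and there are no duplicate (userid,itemid) entries.
- The job fails randomly when there are multiple csv files in the Hadoop directory: an initial dump of user preferences plus daily delta files of user preferences.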

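If the csv data is consistent and correct overall, the job should not fail with an ArrayIndexOutOfBounds exception. Is it possible for the job to fail when there are duplicate (userid,itemid) entries across delta files? Because the preferences are boolean, many of these entries are duplicated in several delta files.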

The logs do not seem to print the piece of data that caused the error. Here is the log:

2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.JobInProgress: job_201208021510_0221: nMaps=2 nReduces=1 max=-1
2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.JobTracker: Job job_201208021510_0221 added successfully for user 'deploy' to queue 'default'
2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.AuditLogger: USER=deploy  IP=127.0.0.1    OPERATION=SUBMIT_JOB    TARGET=job_201208021510_0221    RESULT=SUCCESS
2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201208021510_0221
2012-08-09 15:03:22,653 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201208021510_0221
2012-08-09 15:03:23,023 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /zenius/hadoop/tmp/mapred/system/job_201208021510_0221/jobToken
2012-08-09 15:03:23,027 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201208021510_0221 = 56518256. Number of splits = 2
2012-08-09 15:03:23,027 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208021510_0221_m_000000 has split on node:/default-rack/localhost
2012-08-09 15:03:23,028 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208021510_0221_m_000001 has split on node:/default-rack/localhost
2012-08-09 15:03:23,028 INFO org.apache.hadoop.mapred.JobInProgress: job_201208021510_0221 LOCALITY_WAIT_FACTOR=1.0
2012-08-09 15:03:23,028 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201208021510_0221 initialized successfully with 2 map tasks and 1 reduce tasks.
2012-08-09 15:03:25,787 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201208021510_0221_m_000003_0' to tip task_201208021510_0221_m_000003, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:31,794 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201208021510_0221_m_000003_0' has completed task_201208021510_0221_m_000003 successfully.
2012-08-09 15:03:31,795 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_0' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:31,796 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:03:31,796 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000001_0' to tip task_201208021510_0221_m_000001, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:31,796 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000001
2012-08-09 15:03:37,800 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201208021510_0221_m_000001_0' has completed task_201208021510_0221_m_000001 successfully.
2012-08-09 15:03:37,801 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201208021510_0221_r_000000_0' to tip task_201208021510_0221_r_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:49,807 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a failed task task_201208021510_0221_m_000000
2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_1' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201208021510_0221_m_000000_0'
2012-08-09 15:04:14,603 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_1: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

2012-08-09 15:04:17,606 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a failed task task_201208021510_0221_m_000000
2012-08-09 15:04:17,607 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_2' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:04:17,607 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:04:17,607 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201208021510_0221_m_000000_1'
2012-08-09 15:04:35,618 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_2: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a failed task task_201208021510_0221_m_000000
2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_3' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201208021510_0221_m_000000_2'
2012-08-09 15:04:56,632 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_3: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_201208021510_0221_m_000000 has failed 4 times.
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobInProgress: TaskTracker at 'localhost' turned 'flaky'
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobInProgress: Aborting job job_201208021510_0221
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobInProgress: Killing job 'job_201208021510_0221'
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_CLEANUP) 'attempt_201208021510_0221_m_000002_0' to tip...

1 Answer


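No, duplicates are not the cause; there is definitely a bad line somewhere in your data. The exception message "1" is the array index that was out of range, which means some line was split into fewer than two fields. The most likely culprits are a blank line, a header line, a "comment" line, or a file like _SUCCESS sitting in the same directory.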
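Since the log does not show the offending record, one way to hunt for it is to scan the input files directly. Below is a minimal Python sketch, not part of Mahout or Hadoop. It assumes the input directory has been copied to a local folder, that fields are separated by a comma or a tab, and that a "bad" line is any line with fewer than two non-empty fields. The function name `find_bad_lines` and the delimiter pattern are illustrative assumptions.

```python
import os
import re
import sys
import tempfile

# Assumption: fields are separated by a comma or a tab.
DELIMITER = re.compile(r"[\t,]")


def find_bad_lines(input_dir):
    """Return (filename, line_number, raw_line) for every line that
    does not contain at least two non-empty fields."""
    bad = []
    # Scan every regular file, not only *.csv, so stray files in the
    # input directory are inspected too.
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                raw = line.rstrip("\r\n")
                fields = DELIMITER.split(raw)
                # Blank lines, comment lines and lines such as "123,"
                # all end up without a usable second field.
                if len(fields) < 2 or not fields[0].strip() or not fields[1].strip():
                    bad.append((name, number, raw))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_bad_lines.py /path/to/local/copy/of/input
    for name, number, raw in find_bad_lines(sys.argv[1]):
        print(f"{name}:{number}: {raw!r}")
```

Note that a header line such as "userid,itemid" has two fields and would pass this check; it would be expected to fail on number parsing rather than on the array index, so it may need a separate look.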

Answered 2012-08-09T14:53:49.883