Apache ログを分析しようとしていますが、目標は、すべてのユーザー エージェントとその使用率を調べることです。次のプログラムは、結果に各ユーザーエージェント、カウント、およびパーセンテージが含まれている場合に行に対して正常に機能します。プログラムは、最も使用されている順に並べ替えようとすると、最後の行で失敗します。誰か助けてくれませんか?
logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen, user, time, method, uri, protocol, statusCode, responseSize, referer, userAgent);
uarows = FOREACH logs GENERATE userAgent;
total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count;
dump total;
gpuarows = GROUP uarows BY userAgent;
result = FOREACH gpuarows {
subtotal = COUNT(uarows);
GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL, 100*(double)subtotal/(double)total.count AS percentage;
};
orderresult = ORDER result BY SUB_TOTAL DESC;
dump orderresult;
奇妙なのは、「結果のダンプ」が正常に機能することです。そのため、ORDER 行が問題を引き起こしています。
エラー:
013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-04-13 11:33:09,995 [Thread-48] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
... 6 more
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases orderresult
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: orderresult[16,14] C: R:
2013-04-13 11:33:15,286 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-04-13 11:33:15,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0005 has failed! Stop running all dependent jobs
2013-04-13 11:33:15,287 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-04-13 11:33:15,287 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-04-13 11:33:15,288 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.4 0.11.0 dliu 2013-04-13 11:32:27 2013-04-13 11:33:15 GROUP_BY,ORDER_BY
Some jobs have failed! Stop running all dependent jobs
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_local_0002 1 1 n/a n/a n/a n/a n/a n/a 1-18,logs,total,uarows MULTI_QUERY,COMBINER
job_local_0003 1 1 n/a n/a n/a n/a n/a n/a gpuarows,result GROUP_BY,COMBINER
job_local_0004 1 1 n/a n/a n/a n/a n/a n/a orderresult SAMPLER
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0005 orderresult ORDER_BY Message: Job failed! Error - NA file:/tmp/temp265162785/tmp896004388,
Input(s):
Successfully read 0 records from: "file:///home/dliu/ApacheLogAnalysisWithPig/access.log"
Output(s):
Failed to produce result in "file:/tmp/temp265162785/tmp896004388"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local_0002 -> job_local_0003,
job_local_0003 -> job_local_0004,
job_local_0004 -> job_local_0005,
job_local_0005
2013-04-13 11:33:15,291 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2013-04-13 11:33:15,297 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias orderresult
Details at logfile: /home/dliu/ApacheLogAnalysisWithPig/pig_1365823931459.log