I have this short Python script:

import langid
import sys

for pig_tuple in sys.stdin:
    cols = pig_tuple.split()

    if len(cols) < 2:
        sys.exit(0)

    try:
        id = int(cols[0])
        text = " ".join(cols[1:])
    except:
        sys.exit(0)

    (lang,prob) = langid.classify(text)
    print "%s\t%s" %(id,lang)

sys.exit(0)

I would like to run it from within a Pig script. I tried:

define langid_cmd `python2.6 /data/test/compiled_python/langid_command_line.py` ship('/data/test/compiled_python/langid_command_line.py');

text = LOAD '$PIG_INPUT' USING PigStorage() as (text:chararray);

pythonDetect1 = STREAM text through langid_cmd AS (pid:chararray,planguage:chararray);

But I get:

2013-03-29 15:53:22,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-03-29 15:53:22,303 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/isl/ryan/src/main/pigScripts/pig_1364597410350.log
2013-03-29 15:53:22,306 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/isl/ryan/src/main/pigScripts//src/main/pigScripts/pig_1364597410350.log
2013-03-29 15:53:22,308 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/isl/ryan/src/main/pigScripts//src/main/pigScripts/pig_1364597410350.log
2013-03-29 15:53:22,311 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/isl/ryan/src/main/pigScripts/src/main/pigScripts/pig_1364597410350.log
2013-03-29 15:53:22,313 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/isl/ryan/src/main/pigScripts/src/main/pigScripts/pig_1364597410350.log

The directory /data/test/compiled_python has been chmod'ed to 777, and when I run the script from the shell I get:

-bash-3.2$ echo 14353 I can haz pigscriptz? | python /data/test/compiled_python/langid_command_line.py 
14353   eu

??

1 Answer

With AS (pid:chararray,planguage:chararray) you tell Pig to expect output that is a tuple of strings, but your script returns a tab-separated string. You need to print the result like this:

print "(%s,%s)" %(id,lang)
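
For illustration, here is a minimal sketch of the streaming script with that tuple-style output applied. The names format_record and run are hypothetical helpers, not part of the original script, and the classifier is passed in as a parameter so the formatting logic can be checked without langid installed. Unlike the original, this sketch skips malformed lines rather than exiting on the first one, and it assumes the language code never contains a comma or parenthesis.

```python
import sys


def format_record(line, classify):
    """Return "(id,lang)" for one input line, or None if the line is malformed.

    `classify` is any callable returning a (language, score) pair,
    for example langid.classify.
    """
    cols = line.split()
    if len(cols) < 2:
        return None
    try:
        record_id = int(cols[0])
    except ValueError:
        # First column is not an integer id: treat the line as malformed.
        return None
    text = " ".join(cols[1:])
    lang, _score = classify(text)
    # Tuple-style output, as suggested in the answer above.
    return "(%s,%s)" % (record_id, lang)


def run(stdin, stdout, classify):
    """Write one formatted tuple per well-formed input line."""
    for line in stdin:
        record = format_record(line, classify)
        if record is not None:
            stdout.write(record + "\n")


# In the shipped script this would be wired to langid, assuming langid is
# installed for the interpreter used on every task node:
#     import langid
#     run(sys.stdin, sys.stdout, langid.classify)
```

Keeping the classifier as an argument is only a design choice for this sketch; calling langid.classify directly inside the loop, as the original script does, behaves the same way.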

Alternatively, use Pig's Python UDF integration.
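
A rough sketch of what such a UDF file might look like follows. Pig runs Python UDFs under Jython and makes the outputSchema decorator available to the file. Whether langid itself can be imported under Jython is not something the thread confirms, so classify_stub below is a hypothetical stand-in for langid.classify, and detect_language is an illustrative name.

```python
# Pig provides `outputSchema` to Jython UDF files. This fallback is an
# addition for illustration so the file also imports under plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


def classify_stub(text):
    """Hypothetical stand-in for langid.classify: returns (language, score)."""
    words = text.lower().split()
    return ("en", 1.0) if "the" in words else ("und", 0.0)


@outputSchema("planguage:chararray")
def detect_language(text):
    """Return a language code for one chararray field, or None for empty input."""
    if not text:
        return None
    lang, _score = classify_stub(text)
    return lang
```

Assuming the file is saved as langid_udf.py (a hypothetical name), it would be registered with REGISTER 'langid_udf.py' USING jython AS langudf; and then called as langudf.detect_language(text) inside a FOREACH ... GENERATE statement.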

answered 2013-03-30T14:41:05.623