mysql - Hive error "not a SequenceFile" when joining two tables stored in ORC "SNAPPY" format

Question

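When I run an outer join, I get a "not a SequenceFile" error. The same join used to work with the same settings and similar tables, and I don't know what changed. The error now appears when I join fairly large tables over a large key space.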

I am running Hive 0.13.1 on Cloudera 5.3.0 with YARN. Both tables are stored as orc tblproperties ("orc.compress" = "SNAPPY").

Storage information:

SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:  No

Diagnostic messages for this task:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://my_cluster:9000/user/hive/warehouse/my_table/000000_0 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://my_cluster:9000/user/hive/warehouse/my_table/000000_0 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
    at org.apache.hadoop.hive.
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 35  Reduce: 1   Cumulative CPU: 2742.67 sec   HDFS Read: 8762733372 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 45 minutes 42 seconds 670 msec

In my .hiverc:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=10000;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.created.files=150000;
set hive.error.on.empty.partition=true;
set hive.cli.print.header=true;
set hive.optimize.s3.query=true;
set hive.auto.convert.join=true;
set mapred.child.java.opts=-Xmx2048m;
set hive.error.on.empty.partition=false;
set hive.hadoop.supports.splittable.combineinputformat=true;
set hive.enforce.bucketing=true;
set hive.optimize.bucketmapjoin=true;
set hive.mapjoin.smalltable.filesize=50000000;
set hive.resultset.use.unique.column.names=false;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;

I also experimented with declaring both tables as sequence files. That produced a different error, IndexOutOfBound, on the full-size tables but not on small samples.

The metastore is MySQL.

The full list of Hive / Hadoop settings is too long to post, but I can look up specific values. I just don't know what to look for.

If this is related to IO or a corrupted HDFS, how can I check the health of HDFS?
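
One rough sanity check, offered as a sketch and not as a diagnosis of the cause: a SequenceFile begins with the bytes "SEQ" and an ORC file begins with "ORC", so the leading bytes of the file named in the error show which format it actually has. The helper below is hypothetical (the name detect_format is not part of Hive or Hadoop) and assumes the file has first been copied out of HDFS to local disk, for example with hdfs dfs -get.

```python
# Sketch: classify a local copy of a Hive data file by its leading magic bytes.
# Assumption: the file was copied from HDFS to local disk beforehand.
# detect_format is an illustrative helper, not a Hive or Hadoop API.

# Known 3-byte file headers: SequenceFile starts with "SEQ", ORC with "ORC".
MAGIC = {
    b"SEQ": "SequenceFile",
    b"ORC": "ORC",
}


def detect_format(path):
    """Return 'SequenceFile', 'ORC', or 'unknown' based on the first 3 bytes."""
    with open(path, "rb") as f:
        header = f.read(3)
    return MAGIC.get(header, "unknown")
```

Calling detect_format("000000_0") on a local copy of the file from the stack trace would show whether it is really an ORC file, as the table definition implies. This says nothing about block-level health; for that, hdfs fsck on the table directory reports missing or corrupt blocks.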
