hadoop - How to read a Hive partition into an Apache Crunch pipeline?

Question

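I am able to read text files from HDFS into an Apache Crunch pipeline. Now I need to read Hive partitions. The problem is that, per our design, I am not supposed to access the files directly, so I need some way to access the partitions through something like HCatalog.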

score 0 · Accepted Answer

You can use the org.apache.hadoop.hive.metastore API or the HCat API. Here is a simple example using hive.metastore. You need to make this call before you start your pipeline, unless you want to join to some Hive partition in the mapper/reducer.

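import java.util.List;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

HiveConf hiveConf = new HiveConf();
HiveMetaStoreClient hiveClient = new HiveMetaStoreClient(hiveConf);
// The last argument is a short that caps the number of partitions returned.
List<Partition> partitions = hiveClient.listPartitions("default", "my_hive_table", (short) 1000);
for (Partition partition : partitions) {
    System.out.println("HDFS data location of the partition: " + partition.getSd().getLocation());
}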

The only other thing you need is to export the Hive conf dir:

export HIVE_CONF_DIR=/home/mmichalski/hive/conf
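
If the export is missing, the metastore client may fail in ways that are hard to trace back to configuration, so it can help to check the variable before starting the pipeline. The following is a minimal sketch, not part of the original answer: the class name HiveConfDirCheck and the helper findHiveSite are hypothetical, and it assumes the conf dir contains the usual hive-site.xml file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper class, introduced here for illustration only.
public class HiveConfDirCheck {

    // Returns the path to hive-site.xml inside the given directory,
    // or empty if the directory is unset or the file is not there.
    static Optional<Path> findHiveSite(String hiveConfDir) {
        if (hiveConfDir == null || hiveConfDir.isEmpty()) {
            return Optional.empty();
        }
        Path site = Paths.get(hiveConfDir, "hive-site.xml");
        return Files.isRegularFile(site) ? Optional.of(site) : Optional.empty();
    }

    public static void main(String[] args) {
        // Read the same variable that the export above sets.
        Optional<Path> site = findHiveSite(System.getenv("HIVE_CONF_DIR"));
        if (site.isPresent()) {
            System.out.println("Found Hive configuration: " + site.get());
        } else {
            System.err.println("HIVE_CONF_DIR is unset or contains no hive-site.xml");
            System.exit(1);
        }
    }
}
```

Running this in the same shell that launches the job shows whether the job will actually see the exported directory.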
