hadoop - Setting the partition location in a Hive 'insert overwrite' dynamic partition query

Question

I created a Hive table whose base location points to an AWS S3 location. However, I want to create some of its partitions on the HDFS cluster using an 'insert overwrite' query.

The steps are as follows:

-- Create intermediate table
create table test_int_ash
( loc string)
partitioned by (name string, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile
location '/user/ash/test_int';

-- Insert into intermediate table with two names 'rash' and 'nash'
INSERT INTO test_int_ash partition (name="rash",age=20) values ('brisbane');
INSERT INTO test_int_ash partition (name="rash",age=30) values ('Sydney');
INSERT INTO test_int_ash partition (name="rash",age=40) values ('Melbourne');
INSERT INTO test_int_ash partition (name="rash",age=50) values ('Perth');

INSERT INTO test_int_ash partition (name="nash",age=50) values ('Auckland');
INSERT INTO test_int_ash partition (name="nash",age=40) values ('Wellington');


-- create curated table
create external table test_curated_ash
( loc string)
partitioned by (name string, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile
location 's3a://mybucket/tmp/test_curated/'; 

-- Load curated table from intermediate table using the dynamic partition method; this creates the partitions on AWS S3.
insert overwrite table test_curated_ash partition(name='rash',age)
select loc,age from test_int_ash where name='rash' ;

-- I want to keep this partition on the HDFS cluster; the query below doesn't work

insert overwrite table test_curated_ash partition(name='nash',age) location 'hdfs://mynamenode/user/ash/test_curated_new'
select loc,age from test_int_ash where name='nash';

The queries below work, but I don't want to handle this with the 'static partition' method.

alter table test_curated_ash add partition(name='nash',age=40) location 'hdfs://swmcdh1/user/contexti/ash/test_curated_new/name=nash/age=40';
alter table test_curated_ash add partition(name='nash',age=50) location 'hdfs://swmcdh1/user/contexti/ash/test_curated_new/name=nash/age=50';

insert overwrite table test_curated_ash partition(name='nash',age)
select loc,age from test_int_ash where name='nash';

How can I set the partition location in an 'insert overwrite' dynamic partition query?

score 0 · Accepted Answer

Suppose you have a table named 'user' and want to partition it dynamically by the country column.

Query:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=1000;

INSERT overwrite TABLE partitioned_user
    PARTITION (country)
        SELECT  firstname ,lastname,address,city,salary ,post,phone1,phone2,email,
        web,country FROM user;

When inserting data into partitions, the partition columns must be the last columns in the SELECT list.
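As a minimal sketch of that rule applied to the tables from the question, assuming both partition columns are left dynamic: Hive matches dynamic partition columns by position rather than by name, so name and age are selected last, in the order given in the PARTITION clause.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Both partition columns are dynamic, so both are selected last,
-- in the same order as in the PARTITION clause: name, then age.
insert overwrite table test_curated_ash partition(name, age)
select loc, name, age from test_int_ash;
```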

Set hive.exec.dynamic.partition.mode=nonstrict. In strict mode, Hive requires at least one static partition column in the PARTITION clause.
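The difference between the two modes can be illustrated with the question's tables. This is a sketch, not output from a real session: under strict mode the insert with a static name is expected to be accepted, while the fully dynamic one (left commented out) is expected to be rejected.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;

-- Accepted in strict mode: name is static, only age is dynamic.
insert overwrite table test_curated_ash partition(name='nash', age)
select loc, age from test_int_ash where name='nash';

-- Rejected in strict mode, because every partition column is dynamic:
-- insert overwrite table test_curated_ash partition(name, age)
-- select loc, name, age from test_int_ash;
```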

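In MapReduce strict mode (hive.mapred.mode=strict), some risky queries are not allowed to run. These include:

1. Cartesian products.
2. Queries for which no partition is selected.
3. Comparisons between bigint and string.
4. Comparisons between bigint and double.
5. ORDER BY without LIMIT.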

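According to points 2 and 5, you cannot run a SELECT statement on a partitioned table without at least one partition key filter (such as WHERE country='US'), and you cannot use an ORDER BY clause without a LIMIT condition. By default, however, this property is set to nonstrict in Hive 0.x and 1.x; check the default for the version you run.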
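To make points 2 and 5 concrete, here is a sketch against the partitioned_user table from the query above, assuming a Hive version that still honours hive.mapred.mode. The two rejected statements are left commented out, and the column choice is only illustrative.

```sql
set hive.mapred.mode=strict;

-- Rejected (point 2): no filter on the partition column country.
-- SELECT firstname, lastname FROM partitioned_user;

-- Rejected (point 5): ORDER BY without LIMIT.
-- SELECT firstname, salary FROM partitioned_user
-- WHERE country='US' ORDER BY salary;

-- Accepted: partition filter plus ORDER BY with LIMIT.
SELECT firstname, salary
FROM partitioned_user
WHERE country='US'
ORDER BY salary DESC
LIMIT 10;
```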

score 0 · Accepted Answer

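You can use a separate intermediate table to create the data in partitions on HDFS.

Then change the partition locations of the final table so that they point to that other location, as follows: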

use dbname;
ALTER TABLE table_name PARTITION (partname=value) SET LOCATION "location";
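Applied to the tables from the question, the approach might look like the sketch below. The staging table name test_curated_stage_ash is hypothetical, and the HDFS path reuses the one from the question's failing query. SET LOCATION only works on a partition that already exists in the final table; for a partition that is not yet registered, ADD PARTITION ... LOCATION does the same job.

```sql
-- Hypothetical staging table whose base location is on HDFS.
create external table test_curated_stage_ash
( loc string)
partitioned by (name string, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile
location 'hdfs://mynamenode/user/ash/test_curated_new';

-- The dynamic partition insert writes name=nash/age=... directories on HDFS.
set hive.exec.dynamic.partition=true;
insert overwrite table test_curated_stage_ash partition(name='nash',age)
select loc,age from test_int_ash where name='nash';

-- If the partition already exists in the curated table, repoint it:
alter table test_curated_ash partition(name='nash',age=40)
set location 'hdfs://mynamenode/user/ash/test_curated_new/name=nash/age=40';

-- If it does not exist yet, register it with its HDFS location instead:
alter table test_curated_ash add if not exists partition(name='nash',age=50)
location 'hdfs://mynamenode/user/ash/test_curated_new/name=nash/age=50';
```

This still needs one ALTER statement per partition, so it avoids writing the data statically but not registering the locations individually.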

Alternatively, you can directly update the SDS table in the Hive metastore for the appropriate SD_ID.
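A rough sketch of that metastore update follows, run against the metastore's backing database rather than through Hive. It assumes the usual metastore schema (PARTITIONS, TBLS, and SDS tables with the columns shown), which can vary between Hive versions, and that the partition is already registered. The <SD_ID> placeholder must be replaced with the value returned by the first query, and the table and path names are taken from the question.

```sql
-- Find the storage descriptor of one partition of the curated table.
SELECT p.PART_ID, p.PART_NAME, s.SD_ID, s.LOCATION
FROM PARTITIONS p
JOIN TBLS t ON p.TBL_ID = t.TBL_ID
JOIN SDS s ON p.SD_ID = s.SD_ID
WHERE t.TBL_NAME = 'test_curated_ash'
  AND p.PART_NAME = 'name=nash/age=40';

-- Point that storage descriptor at HDFS.
-- Replace <SD_ID> with the SD_ID found above.
UPDATE SDS
SET LOCATION = 'hdfs://mynamenode/user/ash/test_curated_new/name=nash/age=40'
WHERE SD_ID = <SD_ID>;
```

Because this bypasses Hive entirely, the ALTER TABLE route is usually the safer choice when it is available.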
