create external table if not exists my_table
(customer_id STRING,ip_id STRING)
location 'ip_b_class';

Then:

hive> set mapred.reduce.tasks=50;
hive> select count(distinct customer_id) from my_table;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1

There is about 160 GB of data there, and with a single reducer the query takes a long time...

[ihadanny@lvshdc2en0011 ~]$ hdu 
Found 8 items
162808042208   hdfs://horton/ip_b_class

...


2 Answers