I collect statistics in Amazon DynamoDB, process them with Hive on Elastic MapReduce, and upload the results to S3.

In DynamoDB I have a table, prod_product_views, with these attributes:

- id (hash key)
- product_id (range key)
- company_id
- created
- price
- views_by_company_id
- views_by_user_id

At the moment the table holds about 7000 records.

The problem is that my HiveQL queries run very slowly.

For example, the first step creates an external table backed by the DynamoDB table:

CREATE EXTERNAL TABLE prod_product_views (id string, product_id bigint, company_id bigint, created bigint, price string, viewed_by_company_id bigint, viewed_by_user_id bigint)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "prod_product_views",
"dynamodb.column.mapping" = "id:id,product_id:product_id,company_id:company_id,created:created,price:price,viewed_by_company_id:viewed_by_company_id,viewed_by_user_id:viewed_by_user_id"); 

This step is fine (Time taken: 12.908 seconds).

The second step fetches the views from the previous day:

SELECT * from prod_product_views
WHERE 
created > UNIX_TIMESTAMP(CONCAT(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1)," ","00:00:00")) 
and created < UNIX_TIMESTAMP(CONCAT(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1)," ","23:59:59")); 

This step can take a long time (about 60 minutes).

Here is part of the output:

2013-05-23 08:23:06,097 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:07,103 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:08,109 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:09,115 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:10,121 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:11,147 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:12,153 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:13,160 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:14,169 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:15,177 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:16,183 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:17,193 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:18,219 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:19,225 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:20,234 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:21,240 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:22,247 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:23,253 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:24,259 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:25,265 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:26,273 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:27,279 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:28,290 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:29,312 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:30,318 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:31,324 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:32,333 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:33,358 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:34,364 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:35,394 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:36,400 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:37,408 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:38,418 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:39,478 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:40,538 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:41,544 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:42,550 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:43,557 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:44,563 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:45,569 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:46,579 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:47,607 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:48,613 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:49,623 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:50,633 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:51,638 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:52,650 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:53,657 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:54,665 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:55,691 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:56,697 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec

I am new to this kind of service. Am I doing something wrong, or is there a configuration trick or some other way to speed this up? It looks like a simple query, and 7000 records is not a large amount of data.
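One setting I am aware of is the EMR DynamoDB connector's read-rate option, which controls what share of the table's provisioned read throughput a Hive query may use. The snippet below is only a sketch, assuming the slowness is related to read throughput, which I have not confirmed; the value 1.0 is illustrative and not a recommendation.

```sql
-- Assumption: the map phase is limited by the table's provisioned read capacity.
-- dynamodb.throughput.read.percent defaults to 0.5; 1.0 here is an illustrative value.
SET dynamodb.throughput.read.percent=1.0;

-- Re-run a query against the external table defined above to compare timings.
SELECT COUNT(*) FROM prod_product_views;
```

If the option has any effect, the map progress in the job output should advance sooner than in the log above.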

Thanks in advance!
