distinct - Hive count(distinct) uses only one reducer

Question

This has bothered me for a long time. I like to use several distinct aggregations in one Hive query, like this:

select count(case when pv=1 then 1 else null end), count(distinct case when pv=1 and uid>1 then uid else null end), count(distinct component), count(message) from log_table;

This leaves only one reducer, and because most of the work is done on the reducer side, the job takes a very long time.

Since there are several distinct columns, rewriting it with a subquery like the following does not fit well:

 select count(1) from (select v from tbl group by v) t;
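For a single column, the reason the `group by` subquery helps can be seen in a toy Python model. This is not Hive code, and the function names are hypothetical. When rows are routed to reducers by hashing the value, equal values always land on the same reducer, so each reducer can deduplicate locally and the final count is just the sum of the per-reducer counts. A plain `count(distinct v)` with no `group by` instead sends every row to a single reducer.

```python
from collections import defaultdict


def distinct_single_reducer(values):
    # Hypothetical model of count(distinct v) with no GROUP BY:
    # one reducer receives every row and deduplicates all of them.
    return len(set(values))


def distinct_partitioned(values, num_reducers):
    # Hypothetical model of "select v from tbl group by v":
    # each value is routed to a reducer by hash, so equal values
    # always meet on the same reducer and can be deduplicated there.
    partitions = defaultdict(set)
    for v in values:
        partitions[hash(v) % num_reducers].add(v)
    # Outer "select count(1)": add up the per-reducer distinct counts.
    return sum(len(p) for p in partitions.values())


values = [3, 7, 3, 1, 7, 9, 1, 3]
print(distinct_single_reducer(values))  # 4
print(distinct_partitioned(values, 3))  # 4
```

The per-reducer counts can simply be added because no value is ever counted on two reducers. That property breaks down once several different columns must be deduplicated in the same pass, which is the difficulty raised in the question.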

Is there a good way to optimize this? Should I just split it into several queries and then join the results?

Thanks!
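One way to split the work while keeping a single statement is to give each distinct column its own `group by` subquery and then cross join the resulting one-row subqueries. The sketch below assumes a table `log_table(pv, uid, component, message)` that matches the question's query. It uses Python's built-in `sqlite3` as a stand-in for Hive, so it only checks on toy data that the numbers agree with the original query. It says nothing about how many reducers Hive would actually use. The names `ORIGINAL`, `SPLIT`, `ROWS`, and `run` are hypothetical.

```python
import sqlite3

# The question's query (assumed schema: pv, uid, component, message).
ORIGINAL = """
select count(case when pv=1 then 1 else null end),
       count(distinct case when pv=1 and uid>1 then uid else null end),
       count(distinct component),
       count(message)
from log_table
"""

# Split version: each distinct count becomes its own GROUP BY subquery,
# and the one-row results are cross joined together.
SPLIT = """
select a.pv_cnt, b.uid_cnt, c.comp_cnt, a.msg_cnt
from (select count(case when pv=1 then 1 else null end) as pv_cnt,
             count(message) as msg_cnt
      from log_table) a
cross join (select count(1) as uid_cnt
            from (select uid from log_table
                  where pv=1 and uid>1
                  group by uid) t1) b
cross join (select count(1) as comp_cnt
            from (select component from log_table
                  where component is not null
                  group by component) t2) c
"""

# Hypothetical toy rows: (pv, uid, component, message).
ROWS = [
    (1, 5, "a", "m1"),
    (1, 5, "b", None),
    (0, 6, "a", "m2"),
    (1, 1, None, "m3"),
    (1, 8, "c", "m4"),
    (None, 9, "b", None),
]


def run(query, rows):
    # Load the rows into an in-memory SQLite table and return the single result row.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table log_table (pv int, uid int, component text, message text)")
    conn.executemany("insert into log_table values (?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchone()
    conn.close()
    return result


print(run(ORIGINAL, ROWS))  # (4, 2, 3, 4)
print(run(SPLIT, ROWS))     # (4, 2, 3, 4)
```

The `where component is not null` filter is needed because `count(distinct component)` ignores NULLs, while `group by component` would otherwise keep a NULL group and count it. The `uid>1` condition already excludes NULL uids.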

score 0 · Accepted Answer

Have you tried something like this?

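select
 sum(a.pv_cnt),                 -- adds up the per-group pv=1 counts
 count(distinct a.uniq_uid),    -- distinct uids with pv=1 and uid>1
 count(distinct a.component),
 sum(a.msg_cnt)                 -- adds up the per-group message counts
from
  (
   -- GROUP BY here lets this stage run on several reducers; the outer
   -- query then only sees one row per (uniq_uid, component) pair
   select
     case when pv=1 and uid>1 then uid else null end as uniq_uid,
     component,
     sum(case when pv=1 then 1 else 0 end) as pv_cnt,
     count(message) as msg_cnt
   from
    log_table
   group by
     case when pv=1 and uid>1 then uid else null end,
     component
  ) a;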
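Another variant is to pre-aggregate with `group by` inside the subquery and finish with inexpensive aggregates outside. The heavy grouping stage should then be able to spread across several reducers, and the final step sees only one row per (uid, component) combination. This sketch runs a sanity check on that shape. It assumes a hypothetical `log_table(pv, uid, component, message)`, uses Python's `sqlite3` in place of Hive, and works on toy data. It compares the pre-aggregated query against the question's original query. The names `ORIGINAL`, `PRE_AGGREGATED`, `ROWS`, and `run` are hypothetical.

```python
import sqlite3

# The question's query (assumed schema: pv, uid, component, message).
ORIGINAL = """
select count(case when pv=1 then 1 else null end),
       count(distinct case when pv=1 and uid>1 then uid else null end),
       count(distinct component),
       count(message)
from log_table
"""

# Pre-aggregate per (uniq_uid, component), then combine in the outer query.
PRE_AGGREGATED = """
select sum(a.pv_cnt),
       count(distinct a.uniq_uid),
       count(distinct a.component),
       sum(a.msg_cnt)
from (select case when pv=1 and uid>1 then uid else null end as uniq_uid,
             component,
             sum(case when pv=1 then 1 else 0 end) as pv_cnt,
             count(message) as msg_cnt
      from log_table
      group by case when pv=1 and uid>1 then uid else null end,
               component) a
"""

# Hypothetical toy rows: (pv, uid, component, message).
ROWS = [
    (1, 5, "a", "m1"),
    (1, 5, "b", None),
    (0, 6, "a", "m2"),
    (1, 1, None, "m3"),
    (1, 8, "c", "m4"),
    (None, 9, "b", None),
]


def run(query, rows):
    # Load the rows into an in-memory SQLite table and return the single result row.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table log_table (pv int, uid int, component text, message text)")
    conn.executemany("insert into log_table values (?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchone()
    conn.close()
    return result


print(run(ORIGINAL, ROWS))        # (4, 2, 3, 4)
print(run(PRE_AGGREGATED, ROWS))  # (4, 2, 3, 4)
```

On an empty table, the `sum(...)` columns return NULL instead of 0, so you may want to wrap them in `coalesce(..., 0)` if that case matters.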
