sql - SQL function to return the "most common value" for multiple columns within a GROUP BY

Question

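I'm looking for the simplest way to return the most common value for each of several columns in a grouped select statement. Everything I'm finding online either points to RANK on a single item in the select, or processes each column individually outside of the GROUP BY.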

Sample data:

SELECT 100 AS "auser", 'A' AS "instance1", 'M' AS "instance2"
UNION ALL SELECT 100, 'B', 'M'
UNION ALL SELECT 100, 'C', 'N'
UNION ALL SELECT 100, 'B', 'O'
UNION ALL SELECT 200, 'D', 'P'
UNION ALL SELECT 200, 'E', 'P'
UNION ALL SELECT 200, 'F', 'P'
UNION ALL SELECT 200, 'F', 'Q'

Sample data result:

auser   instance1   instance2
100     A           M
100     B           M
100     C           N
100     B           O
200     D           P
200     E           P
200     F           P
200     F           Q

Query logic (how I picture it in my head):

SELECT auser, most_common(instance1), most_common(instance2)
FROM datasample
GROUP BY auser;

Desired result:

100     B           M
200     F           P

score 4 · Accepted Answer

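This approach solves the problem with window functions in nested subqueries. The innermost subquery counts, for each row, how often that row's value occurs per user in each column. The next subquery ranks the rows by those counts (using row_number()). The outer query then uses conditional aggregation to pick out the desired result.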

-- t stands for the source table (datasample in the question)
select auser,
       max(case when seqnum1 = 1 then instance1 end) as instance1,
       max(case when seqnum2 = 1 then instance2 end) as instance2
from (select t.*,
             -- rank each user's rows by how often the row's value occurs
             row_number() over (partition by auser order by cnt1 desc) as seqnum1,
             row_number() over (partition by auser order by cnt2 desc) as seqnum2
      from (select t.*,
                   -- occurrences of each value per user
                   count(*) over (partition by auser, instance1) as cnt1,
                   count(*) over (partition by auser, instance2) as cnt2
            from t
           ) t
     ) t
group by auser;
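
To check the query above against the question's sample data, here is a minimal sketch that runs it in an in-memory SQLite database from Python. Several things here are assumptions for illustration and are not part of the original answer: the table is named datasample as in the question, the subquery aliases d, c and r replace the repeated t, the helper name most_common_per_user is hypothetical, and the SQLite library must be version 3.25 or later (the first release with window functions).

```python
import sqlite3

# Sample rows from the question: (auser, instance1, instance2).
ROWS = [
    (100, "A", "M"), (100, "B", "M"), (100, "C", "N"), (100, "B", "O"),
    (200, "D", "P"), (200, "E", "P"), (200, "F", "P"), (200, "F", "Q"),
]

# Same three-level shape as the answer: count, rank, then aggregate.
# Aliases d/c/r are illustrative; the answer reuses the alias t.
QUERY = """
select auser,
       max(case when seqnum1 = 1 then instance1 end) as instance1,
       max(case when seqnum2 = 1 then instance2 end) as instance2
from (select c.*,
             row_number() over (partition by auser order by cnt1 desc) as seqnum1,
             row_number() over (partition by auser order by cnt2 desc) as seqnum2
      from (select d.*,
                   count(*) over (partition by auser, instance1) as cnt1,
                   count(*) over (partition by auser, instance2) as cnt2
            from datasample d
           ) c
     ) r
group by auser
order by auser
"""


def most_common_per_user(rows):
    """Load rows into a temporary table and return one result row per auser."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table datasample (auser integer, instance1 text, instance2 text)"
        )
        conn.executemany("insert into datasample values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in most_common_per_user(ROWS):
        print(row)
```

On the sample data this prints (100, 'B', 'M') and (200, 'F', 'P'), matching the desired result. Note that row_number() breaks ties arbitrarily, so if two different values were equally common for a user, which one is returned would not be guaranteed unless a tie-breaker is added to the order by.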
