sql - How to order distinct tuples in a PostgreSQL query

Question

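I'm trying to submit a query in Postgres that returns only distinct tuples. In my sample query, I don't want duplicate entries, where an entry exists multiple times for a cluster_id/feed_id combination. If I do a simple: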

select distinct on (cluster_info.cluster_id, feed_id) 
   cluster_info.cluster_id, num_docs, feed_id, url_time 
   from url_info 
   join cluster_info on (cluster_info.cluster_id = url_info.cluster_id) 
   where feed_id in (select pot_seeder from potentials) 
   and num_docs > 5 and url_time > '2012-04-16';

I get exactly that, but I'd also like to group the results according to num_docs. So, when I do the following:

select distinct on (cluster_info.cluster_id, feed_id) 
   cluster_info.cluster_id, num_docs, feed_id, url_time 
   from url_info join cluster_info 
   on (cluster_info.cluster_id = url_info.cluster_id) 
   where feed_id in (select pot_seeder from potentials) 
   and num_docs > 5 and url_time > '2012-04-16' 
   order by num_docs desc;

I get the following error:

ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: select distinct on (cluster_info.cluster_id, feed_id) cluste...

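I think I understand why I'm getting the error (I can't group by tuples unless I explicitly describe the group somehow), but how do I do that? Or, if my interpretation of the error is wrong, is there another way to accomplish my initial goal?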

score 11 · Accepted Answer

The leftmost ORDER BY items cannot disagree with the items of the DISTINCT ON clause. I quote the manual on DISTINCT:

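The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.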

Try:

SELECT *
FROM  (
    SELECT DISTINCT ON (c.cluster_id, feed_id) 
           c.cluster_id, num_docs, feed_id, url_time 
    FROM   url_info u
    JOIN   cluster_info c ON (c.cluster_id = u.cluster_id) 
    WHERE  feed_id IN (SELECT pot_seeder FROM potentials) 
    AND    num_docs > 5
    AND    url_time > '2012-04-16'
    ORDER  BY c.cluster_id, feed_id, num_docs, url_time
           -- first columns match DISTINCT
           -- the rest to pick certain values for dupes
           -- or did you want to pick random values for dupes?
    ) x
ORDER  BY num_docs DESC;
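If the intent is to keep, for each cluster_id/feed_id pair, the row with the highest num_docs, the inner ORDER BY can state that precedence explicitly. The following is only a sketch under that assumption: the DESC directions and the url_time tie-breaker are my guesses at what you want, not something stated in the question.

```sql
SELECT *
FROM  (
    SELECT DISTINCT ON (c.cluster_id, feed_id)
           c.cluster_id, num_docs, feed_id, url_time
    FROM   url_info u
    JOIN   cluster_info c ON (c.cluster_id = u.cluster_id)
    WHERE  feed_id IN (SELECT pot_seeder FROM potentials)
    AND    num_docs > 5
    AND    url_time > '2012-04-16'
    ORDER  BY c.cluster_id, feed_id      -- must match the DISTINCT ON list
            , num_docs DESC              -- assumption: keep the highest num_docs per pair
            , url_time DESC              -- assumption: on ties, keep the latest url_time
    ) x
ORDER  BY num_docs DESC;                 -- final sort of the de-duplicated rows
```

DISTINCT ON keeps the first row of each group as sorted by the inner ORDER BY, so the trailing sort expressions decide which duplicate survives.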

Or use GROUP BY:

SELECT c.cluster_id
     , num_docs
     , feed_id
     , url_time 
FROM   url_info u
JOIN   cluster_info c ON (c.cluster_id = u.cluster_id) 
WHERE  feed_id IN (SELECT pot_seeder FROM potentials) 
AND    num_docs > 5
AND    url_time > '2012-04-16'
GROUP  BY c.cluster_id, feed_id 
ORDER  BY num_docs DESC;

If c.cluster_id, feed_id are the primary key column(s) of all tables (both, in this case) that you include columns from in the SELECT list, then this just works with PostgreSQL 9.1 or later.

Otherwise, you have to GROUP BY the rest of the columns, aggregate them, or provide more information.
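As an illustration of the aggregate route, here is a minimal sketch. The choice of max() for both columns and the aliases max_num_docs and last_url_time are assumptions made for the example; pick whichever aggregates match what you actually need.

```sql
SELECT c.cluster_id
     , feed_id
     , max(num_docs) AS max_num_docs     -- assumption: largest num_docs per pair
     , max(url_time) AS last_url_time    -- assumption: latest url_time per pair
FROM   url_info u
JOIN   cluster_info c ON (c.cluster_id = u.cluster_id)
WHERE  feed_id IN (SELECT pot_seeder FROM potentials)
AND    num_docs > 5
AND    url_time > '2012-04-16'
GROUP  BY c.cluster_id, feed_id
ORDER  BY max_num_docs DESC;
```

Note that the two max() values are computed independently, so they may come from different source rows. If you need all columns from one and the same row, the DISTINCT ON form is the better fit.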
