performance - PostgreSQL: exists vs left join

Question

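I have heard several times that postgres handles exists queries even faster than left join. http://archives.postgresql.org/pgsql-performance/2002-12/msg00185.php

That is definitely true when aggregating over a single table.

But in our case there are several tables, and the same query built with exists makes postgres hang forever.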

explain 
SELECT count(DISTINCT "groups".id) AS count_all 
FROM "groups"
WHERE (exists(
    select * from products p where groups.id = p.group_id AND exists(
        select * from products_categories pc where p.id = pc.product_id AND pc.category_id in (2,3))) AND groups.id != 3)

Result:

 Aggregate  (cost=26413436.66..26413436.67 rows=1 width=4)
   ->  Seq Scan on groups  (cost=0.00..26413403.84 rows=13126 width=4)
         Filter: ((id <> 3) AND (subplan))
         SubPlan
           ->  Index Scan using index_products_on_group_id on products p  (cost=0.00..1006.13 rows=1 width=1483)
                 Index Cond: ($1 = group_id)
                 Filter: (subplan)
                 SubPlan
                   ->  Seq Scan on products_categories pc  (cost=0.00..498.49 rows=1 width=8)
                         Filter: ((category_id = ANY ('{2,3}'::integer[])) AND ($0 = product_id))

Is this plan the root cause of the incredibly long execution time? Is it some kind of configuration issue?

Thanks, Bogdan.

score 3 · Accepted Answer

For each row in "groups", postgresql is doing a full scan of products_categories, which isn't good. It is not necessarily a configuration issue, but maybe you could write the query without nesting the subqueries like that?

SELECT count(DISTINCT "groups".id) AS count_all 
FROM "groups"
WHERE exists(
    select 1 from products p
             join products_categories pc on pc.product_id = p.id
    where groups.id = p.group_id and pc.category_id in (2,3)
    ) and groups.id <> 3
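
The same idea can also be sketched with plain joins and no exists at all. This assumes the table and column names from the question; count(DISTINCT ...) collapses the duplicate group rows the joins produce, so the count should match, but it is worth comparing both plans with explain before picking one.

```sql
-- Sketch only: same tables and columns as in the question.
SELECT count(DISTINCT g.id) AS count_all
FROM "groups" g
JOIN products p ON p.group_id = g.id
JOIN products_categories pc ON pc.product_id = p.id
WHERE pc.category_id IN (2, 3)
  AND g.id <> 3;
```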

Also, does products_categories have an index on product_id?
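
The plan in the question shows a Seq Scan on products_categories filtered on product_id, which suggests there is no usable index on that column. If that is the case, something along these lines might help. The index name is hypothetical, modelled on index_products_on_group_id from the plan.

```sql
-- Hypothetical index name; adjust to your own naming convention.
CREATE INDEX index_products_categories_on_product_id
    ON products_categories (product_id);

-- Refresh planner statistics so the new index is considered,
-- then run explain on the query again and compare the plans.
ANALYZE products_categories;
```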
