sql - Why is this LEFT JOIN slow to execute?

Question

The LEFT JOIN part of a PostgreSQL query is very slow, and I can't figure out why.

The full query:

SELECT t.id FROM tests t
LEFT JOIN tests c ON c.parent_id IN (t.id, t.parent_id)
INNER JOIN responses r ON (
    r.test_id IN (t.id, t.parent_id, c.id)
) WHERE r.user_id = 333

There are indexes on tests.id and tests.parent_id.

tests contains 28876 rows (of which 1282 satisfy WHERE parent_id IS NOT NULL).

The LEFT JOIN part of the query produces 32098 rows and takes about 700 ms:

SELECT t.id FROM tests t
LEFT JOIN tests c ON c.parent_id IN (t.id, t.parent_id)

The rest of the query takes a negligible amount of time.

Any ideas on why it is slow, or on a better way to achieve the same thing?

Thank you!

SELECT version()

PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

EXPLAIN ANALYZE

(Note: this uses the real table name usability_tests, which I simplified to tests in the examples above.)

Nested Loop  (cost=5.18..158692.45 rows=80 width=4) (actual time=107.873..5718.295 rows=103 loops=1)
  Join Filter: ((r.usability_test_id = t.id) OR (r.usability_test_id = t.parent_id) OR (r.usability_test_id = c.id))
  ->  Nested Loop Left Join  (cost=0.56..136015.63 rows=28876 width=12) (actual time=0.091..486.496 rows=32098 loops=1)
        Join Filter: ((c.parent_id = t.id) OR (c.parent_id = t.parent_id))
        ->  Seq Scan on usability_tests t  (cost=0.00..1455.76 rows=28876 width=8) (actual time=0.042..39.558 rows=28876 loops=1)
        ->  Bitmap Heap Scan on usability_tests c  (cost=0.56..4.60 rows=4 width=8) (actual time=0.010..0.011 rows=0 loops=28876)
              Recheck Cond: ((parent_id = t.id) OR (parent_id = t.parent_id))
              ->  BitmapOr  (cost=0.56..0.56 rows=4 width=0) (actual time=0.008..0.008 rows=0 loops=28876)
                    ->  Bitmap Index Scan on index_usability_tests_on_parent_id  (cost=0.00..0.28 rows=2 width=0) (actual time=0.003..0.003 rows=0 loops=28876)
                          Index Cond: (parent_id = t.id)
                    ->  Bitmap Index Scan on index_usability_tests_on_parent_id  (cost=0.00..0.28 rows=2 width=0) (actual time=0.001..0.001 rows=0 loops=28876)
                          Index Cond: (parent_id = t.parent_id)
  ->  Materialize  (cost=4.62..153.63 rows=39 width=4) (actual time=0.001..0.076 rows=70 loops=32098)
        ->  Bitmap Heap Scan on responses r  (cost=4.62..153.44 rows=39 width=4) (actual time=0.053..0.187 rows=70 loops=1)
              Recheck Cond: (user_id = 3649)
              ->  Bitmap Index Scan on index_responses_on_user_id  (cost=0.00..4.61 rows=39 width=0) (actual time=0.040..0.040 rows=70 loops=1)
                    Index Cond: (user_id = 3649)
Total runtime: 5718.592 ms

score 2 · Accepted Answer

update: The query is basically something like this:

with cte as (
    select r.test_id
    from responses as r
    where r.user_id = 333
    union all
    select c.parent_id
    from tests as c
        inner join responses as r on r.test_id = c.id
    where r.user_id = 333
)
select
    t.id
from tests as t
where
    t.id in (select c.test_id from cte as c) or
    t.parent_id in (select c.test_id from cte as c)

old: Try converting it into this query and see whether it gets faster.

select t.id 
from tests t
    inner join tests c on c.parent_id = t.id

union all

select t.id 
from tests t
    inner join tests c on c.parent_id = t.parent_id

How long does it take to run one of these queries?
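One way to get those timings is to run each half on its own under EXPLAIN ANALYZE, which executes the query and reports the actual time spent. This is a minimal sketch that assumes the simplified table name tests from the question; substitute the real table name as needed.

```sql
-- Half 1: rows where c is a direct child of t.
-- EXPLAIN ANALYZE really runs the query and prints actual timings.
EXPLAIN ANALYZE
SELECT t.id
FROM tests t
    INNER JOIN tests c ON c.parent_id = t.id;

-- Half 2: rows where c shares a parent with t (siblings, including t itself).
EXPLAIN ANALYZE
SELECT t.id
FROM tests t
    INNER JOIN tests c ON c.parent_id = t.parent_id;
```

Comparing the two plans with the Nested Loop Left Join node in the question's plan should show whether the OR in the join condition is what forces the per-row bitmap scans.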

score 1 · Accepted Answer

I think the query can be reduced to:

SELECT t.id FROM tests t
WHERE EXISTS ( 
        SELECT * FROM responses r
        WHERE (r.test_id = t.id OR r.test_id = t.parent_id )
        AND r.user_id = 333
        )
OR EXISTS (
        SELECT * FROM responses r 
        JOIN tests c ON r.test_id = c.id
            -- Note: the ... OR sibling makes no sense to me
        WHERE (c.parent_id = t.id OR c.parent_id = t.parent_id)
        AND r.user_id = 333
        );

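Note: the query in the question can produce duplicate values of t.id; this one reports only distinct values.

Update: I tested it (on synthetic data), and the query above returns exactly the same result as the original, minus the duplicates.

UPDATE2: added the sibling match.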
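A comparison of that kind could be sketched as below. It is only an illustration, not the test that was actually run: the CTE names original and rewritten and the side labels are invented here, and the simplified table names from the question are assumed. The original query is wrapped in DISTINCT so that duplicates do not count as differences.

```sql
WITH original AS (
    -- The question's query, de-duplicated.
    SELECT DISTINCT t.id
    FROM tests t
    LEFT JOIN tests c ON c.parent_id IN (t.id, t.parent_id)
    INNER JOIN responses r ON r.test_id IN (t.id, t.parent_id, c.id)
    WHERE r.user_id = 333
),
rewritten AS (
    -- The EXISTS-based rewrite from this answer.
    SELECT t.id
    FROM tests t
    WHERE EXISTS (
            SELECT * FROM responses r
            WHERE (r.test_id = t.id OR r.test_id = t.parent_id)
            AND r.user_id = 333
            )
    OR EXISTS (
            SELECT * FROM responses r
            JOIN tests c ON r.test_id = c.id
            WHERE (c.parent_id = t.id OR c.parent_id = t.parent_id)
            AND r.user_id = 333
            )
)
-- Ids present on one side only; hypothetical labels mark which side.
(SELECT 'only in original'::text AS side, id FROM original
 EXCEPT
 SELECT 'only in original'::text, id FROM rewritten)
UNION ALL
(SELECT 'only in rewritten'::text AS side, id FROM rewritten
 EXCEPT
 SELECT 'only in rewritten'::text, id FROM original);
```

If the two forms agree on the data at hand, this should return no rows; any row it does return names an id that one query produces and the other does not.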
