sql - Aggregate data from many tables

Question

Suppose I have 3 tables in a DB that contain different slices of data from the same origin. All of the tables have a very similar structure:

id | parent_id | timestamp | contents

Each table has an index on parent_id (a one-parent-to-many-records relationship) and an index on timestamp.

I need to access this data sorted by time. Currently I am using the following query:

prepare query3(bigint) as
select id, timestamp, contents, filter
from (select t1.id, t1.timestamp, t1.contents, 'filter1' as filter
      from table1 t1
      where t1.parent_id = $1
      union
      select t2.id, t2.timestamp, t2.contents, 'filter2' as filter
      from table2 t2
      where t2.parent_id = $1
      union
      select t3.id, t3.timestamp, t3.contents, 'filter3' as filter
      from table3 t3
      where t3.parent_id = $1
     ) table_alias
order by timestamp;

Each table holds a fairly large amount of data, so every run of this query takes 2-3 seconds. According to explain: 650000 rows and Sort Method: external merge Disk: 186592kB.

Is there a way to reduce the retrieval time without changing the schema, for example by writing a more efficient query or by creating a particular index?

Update: the full explain analyze output is added below. In this case the query covers 4 tables, but I believe there is no big difference between 3 and 4 here.

"Sort  (cost=83569.28..83959.92 rows=156258 width=80) (actual time=2288.871..2442.318 rows=639225 loops=1)"
"  Sort Key: t1.timestamp"
"  Sort Method: external merge  Disk: 186592kB"
"  ->  Unique  (cost=52685.43..54638.65 rows=156258 width=154) (actual time=1572.274..1885.966 rows=639225 loops=1)"
"    ->  Sort  (cost=52685.43..53076.07 rows=156258 width=154) (actual time=1572.273..1737.041 rows=639225 loops=1)"
"    Sort Key: t1.id, t1.timestamp, t1.contents, ('table1'::text)"
"    Sort Method: external merge  Disk: 186624kB"
"      ->  Append  (cost=0.00..14635.39 rows=156258 width=154) (actual time=0.070..447.375 rows=639225 loops=1)"
"        ->  Index Scan using table1_parent_id on table1 t1  (cost=0.00..285.08 rows=5668 width=109) (actual time=0.068..5.993 rows=9385 loops=1)"
"        Index Cond: (parent_id = $1)"
"        ->  Index Scan using table2_parent_id on table2 t2  (cost=0.00..11249.13 rows=132927 width=168) (actual time=0.063..306.567 rows=589056 loops=1)"
"        Index Cond: (parent_id = $1)"
"        ->  Index Scan using table3_parent_id on table3 t3  (cost=0.00..957.18 rows=4693 width=40) (actual time=25.234..82.381 rows=20176 loops=1)"
"        Index Cond: (parent_id = $1)"
"        ->  Index Scan using table4_parent_id_idx on table4 t4  (cost=0.00..581.42 rows=12970 width=76) (actual time=0.029..5.894 rows=20608 loops=1)"
"        Index Cond: (parent_id = $1)"
"Total runtime: 2489.569 ms"
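
For readers who want to reproduce a plan like the one above, a minimal sketch follows. It assumes the statement has been prepared as query3 in the current session, as shown earlier; the parent id 42 is a hypothetical placeholder value and is not taken from the question.

```sql
-- Run the prepared statement under EXPLAIN ANALYZE to see actual
-- timings, row counts, and the sort method for each plan node.
-- 42 is a placeholder parent_id (an assumption for illustration).
explain analyze execute query3(42);
```

Note that explain analyze really executes the statement, so it takes about as long as the query itself.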

score 1 · Accepted Answer

Most of your time is spent removing duplicates for the union. Use union all instead:

select id, timestamp, contents, filter
from  ((select t1.id, t1.timestamp, t1.contents, 'filter1' as filter
        from table1 t1 
        where t1.parent_id = $1
       )
       union all
       (select t2.id, t2.timestamp, t2.contents, 'filter2' as filter
        from table2 t2 
        where t2.parent_id = $1
       )
       union all
       (select t3.id, t3.timestamp, t3.contents, 'filter3' as filter
        from table3 t3 
        where t3.parent_id = $1 
       )
      ) table_alias
order by timestamp;
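
The difference between the two operators can be seen on a tiny, self-contained example. This is only an illustrative sketch that uses literal VALUES lists rather than the tables from the question; the aliases a and b and the column x are made up for the demonstration.

```sql
-- UNION has to compare all rows in order to drop duplicates
-- (typically a sort or hash step over the whole result).
select x from (values (1), (1), (2)) as a(x)
union
select x from (values (2), (3)) as b(x);
-- 3 rows: 1, 2, 3 (order not guaranteed)

-- UNION ALL simply concatenates the branches, with no comparison step.
select x from (values (1), (1), (2)) as a(x)
union all
select x from (values (2), (3)) as b(x);
-- 5 rows: 1, 1, 2, 2, 3 (order not guaranteed)
```

In the query from the question each branch adds a different literal ('filter1', 'filter2', 'filter3'), so a row from one table can never equal a row from another. UNION could therefore only remove duplicates inside a single table, and if id is unique within each table (an assumption; the question does not say so) there are none, so union all should return the same rows without the extra sort.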

To make this more effective, each of the three tables should have an index on parent_id. With these changes, it should run very fast.
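
If any of these indexes is missing, it could be created along the following lines. This is a sketch: the index names are hypothetical, and the plan posted in the question suggests that indexes such as table1_parent_id already exist on the asker's tables, in which case nothing needs to be created.

```sql
-- Hypothetical index names; skip any table that already has an
-- index on parent_id (the posted plan shows table1_parent_id etc.).
create index table1_parent_id_example on table1 (parent_id);
create index table2_parent_id_example on table2 (parent_id);
create index table3_parent_id_example on table3 (parent_id);
```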
