sql - Optimizing a nested join with window functions on a large PostgreSQL table

Question

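I am running the following query against a 56 GB table (789,700,760 rows) and have hit a bottleneck in execution time. Based on some earlier examples, I thought there might be a way to "un-nest" the INNER JOIN and improve query performance on large datasets. Specifically, the query below took 7.651 hours to complete on an MPP PostgreSQL deployment.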

create table large_table as
select column1, column2, column3, column4, column5, column6
from
(
  select 
    a.column1, a.column2, a.start_time,
    rank() OVER( 
      PARTITION BY a.column2, a.column1 order by a.start_time DESC 
    ) as rank,
    last_value( a.column3) OVER (
      PARTITION BY a.column2, a.column1 order by a.start_time ASC
      RANGE BETWEEN unbounded preceding and unbounded following 
    ) as column3,
    a.column4, a.column5, a.column6
  from 
    (table2 s 
      INNER JOIN table3 t 
      ON s.column2=t.column2 and s.event_time > t.start_time 
    ) a
 ) b
 where rank =1;

Question 1: Is there a way to modify the SQL above to reduce the overall execution time of the query?

score 1 · Accepted Answer

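Moving last_value to the outer query may improve performance. last_value fetches the column3 value from the row with the latest start_time in each partition, which is exactly the row with rank = 1. Because the WHERE clause is applied before window functions at the same query level, the outer last_value only has to process the rows that survive the rank = 1 filter.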

select column1, column2,
       last_value(column3) OVER (PARTITION BY column2, column1 order by start_time ASC
                                 RANGE BETWEEN unbounded preceding and unbounded following
                                ) as column3,
       column4, column5, column6
from (select a.column1, a.column2, a.start_time,
             rank() OVER (PARTITION BY a.column2, a.column1 order by a.start_time DESC
                         ) as rank,
            a.column3, a.column4, a.column5, a.column6
      from (table2 s INNER JOIN
            table3 t
            ON s.column2 = t.column2 and s.event_time > t.start_time
           ) a
     ) b
where rank = 1
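
The claim that last_value over the whole partition matches the rank = 1 row can be checked on a handful of rows. The sketch below is only an illustration: the CTE named joined and its values are made up to stand in for the table2/table3 join, and it assumes start_time is unique within each partition. With ties on the latest start_time, rank = 1 returns several rows and last_value may pick any of the tied rows.

```sql
-- Hypothetical toy rows standing in for the result of the join.
WITH joined (column1, column2, start_time, column3) AS (
  VALUES
    (1, 'x', TIMESTAMP '2020-01-01 00:00', 'old'),
    (1, 'x', TIMESTAMP '2020-01-02 00:00', 'mid'),
    (1, 'x', TIMESTAMP '2020-01-03 00:00', 'new'),
    (2, 'y', TIMESTAMP '2020-01-05 00:00', 'only')
),
ranked AS (
  SELECT column1, column2, start_time, column3,
         -- Latest start_time in each partition gets rank 1.
         rank() OVER (PARTITION BY column2, column1
                      ORDER BY start_time DESC) AS rnk,
         -- Same computation as in the question: the last row when
         -- ordering ascending over the full partition.
         last_value(column3) OVER (PARTITION BY column2, column1
                                   ORDER BY start_time ASC
                                   RANGE BETWEEN UNBOUNDED PRECEDING
                                             AND UNBOUNDED FOLLOWING) AS last_column3
  FROM joined
)
SELECT column1, column2, start_time, column3, last_column3
FROM ranked
WHERE rnk = 1
ORDER BY column1;
```

Each partition returns a single row in which column3 and last_column3 agree ('new' for the first partition, 'only' for the second), so under the uniqueness assumption the rank = 1 row already carries the value that last_value computes.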

Otherwise, you will need to provide the execution plan and more details about table2 and table3 to get further help.
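
A minimal sketch of how that information might be gathered, assuming stock PostgreSQL catalogs are available (an MPP fork may expose additional or different views). Plain EXPLAIN is used rather than EXPLAIN ANALYZE because the latter executes the statement, which would take hours here. Only the join is shown for brevity; in practice the full SELECT from the question would follow EXPLAIN.

```sql
-- Show the planner's chosen plan without executing the query.
EXPLAIN
SELECT s.column2, t.start_time
FROM table2 s
INNER JOIN table3 t
  ON s.column2 = t.column2 AND s.event_time > t.start_time;

-- On-disk size of each input table, including indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('table2')) AS table2_size,
       pg_size_pretty(pg_total_relation_size('table3')) AS table3_size;

-- Planner row estimates for the inputs (refreshed by ANALYZE).
SELECT relname, reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname IN ('table2', 'table3');

-- Existing indexes on the join and ordering columns, if any.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('table2', 'table3');
```

Posting the EXPLAIN output together with the sizes, row estimates, and index definitions gives answerers enough to judge whether the join or the window sort dominates the run time.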
