postgresql - Postgres partition pruning

Question

I have a large table in Postgres.

The table is named bigtable and has the following columns:

integer    |timestamp   |xxx |xxx |...|xxx
category_id|capture_time|col1|col2|...|colN

I partitioned the table by category_id modulo 10 and by the date part of the capture_time column.

The partition tables look like this:

CREATE TABLE myschema.bigtable_d000h0(
    CHECK ( category_id%10=0 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

CREATE TABLE myschema.bigtable_d000h1(
    CHECK ( category_id%10=1 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

When I run a query that uses category_id and capture_time in the where clause, the partitions are not pruned as expected.

explain select * from bigtable where capture_time >= '2012-01-01' and  capture_time < '2012-01-02' and category_id=100;

"Result  (cost=0.00..9476.87 rows=1933 width=216)"
"  ->  Append  (cost=0.00..9476.87 rows=1933 width=216)"
"        ->  Seq Scan on bigtable  (cost=0.00..0.00 rows=1 width=210)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h0 bigtable  (cost=0.00..1921.63 rows=1923 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h1 bigtable  (cost=0.00..776.93 rows=1 width=218)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h2 bigtable  (cost=0.00..974.47 rows=1 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h3 bigtable  (cost=0.00..1351.92 rows=1 width=214)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h4 bigtable  (cost=0.00..577.04 rows=1 width=217)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h5 bigtable  (cost=0.00..360.67 rows=1 width=219)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h6 bigtable  (cost=0.00..1778.18 rows=1 width=214)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h7 bigtable  (cost=0.00..315.82 rows=1 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h8 bigtable  (cost=0.00..372.06 rows=1 width=219)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h9 bigtable  (cost=0.00..1048.16 rows=1 width=215)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"

However, if I add the exact modulo criterion (category_id%10=0) to the where clause, it works perfectly:

explain select * from bigtable where capture_time >= '2012-01-01' and  capture_time < '2012-01-02' and category_id=100 and category_id%10=0;

"Result  (cost=0.00..2154.09 rows=11 width=215)"
"  ->  Append  (cost=0.00..2154.09 rows=11 width=215)"
"        ->  Seq Scan on bigtable  (cost=0.00..0.00 rows=1 width=210)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"
"        ->  Seq Scan on bigtable_d000h0 bigtable  (cost=0.00..2154.09 rows=10 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"

Is there any way to make partition pruning work correctly without adding the modulo condition to every query?

score 4 · Accepted Answer

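The point is that constraint exclusion relies on the same limited predicate matching that PostgreSQL applies to partial indexes. Your CHECK constraint uses an expression on the column (category_id%10), not just the column's value, so the planner cannot relate it to category_id=100. This is stated in the documentation (look for example 11-2):

PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time.

Hence your results: the query must contain exactly the same expression that you used when creating the CHECK constraint.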

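For HASH-based partitioning I prefer one of two approaches:

Add a field that can take a limited set of values (10 in your case); this works best if such a field already exists by design.
Specify a hash range the same way you specify the timestamp range: MINVALUE <= category_id < MAXVALUE

You could also create two-level partitioning:

At the first level, create 10 partitions based on the category_id HASH.
At the second level, create as many partitions as needed based on the date range.

That said, I always try to use only one column for partitioning, since it is easier to manage.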
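
The second approach can be sketched roughly as follows. This is only an illustration that reuses the inheritance syntax from the question: the table names bigtable_r000 and bigtable_r001 and the bounds 0, 100 and 200 are hypothetical and would need to match the real distribution of category_id.

```sql
-- Hypothetical range-based partitions; names and bounds are assumptions,
-- not values taken from the original schema.
CREATE TABLE myschema.bigtable_r000(
    CHECK ( category_id >= 0 AND category_id < 100
            AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

CREATE TABLE myschema.bigtable_r001(
    CHECK ( category_id >= 100 AND category_id < 200
            AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

-- category_id = 100 contradicts "category_id < 100", which is a simple
-- inequality the planner can refute, so bigtable_r000 is expected to be
-- skipped without any extra condition in the query.
EXPLAIN SELECT * FROM myschema.bigtable
WHERE capture_time >= '2012-01-01' AND capture_time < '2012-01-02'
  AND category_id = 100;
```

The trade-off is that ranges do not spread categories as evenly as a modulo does, so the bounds have to be chosen with the actual data in mind.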
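
A minimal sketch of this two-level layout, assuming PostgreSQL 11 or later, where declarative hash partitioning is available (the inheritance-based tables in the question predate it). The table names bigtable_p, bigtable_p_h0 and bigtable_p_h0_d000 are hypothetical, and the built-in hash function decides which partition a category_id lands in, so rows are not necessarily grouped by category_id%10.

```sql
-- Assumes PostgreSQL 11+; all table names here are hypothetical.
CREATE TABLE myschema.bigtable_p (
    category_id  integer   NOT NULL,
    capture_time timestamp NOT NULL
    -- remaining columns (col1 .. colN) omitted in this sketch
) PARTITION BY HASH (category_id);

-- First level: one of 10 hash partitions, itself partitioned by date range.
CREATE TABLE myschema.bigtable_p_h0 PARTITION OF myschema.bigtable_p
    FOR VALUES WITH (MODULUS 10, REMAINDER 0)
    PARTITION BY RANGE (capture_time);

-- Second level: one partition per day under each hash partition.
CREATE TABLE myschema.bigtable_p_h0_d000 PARTITION OF myschema.bigtable_p_h0
    FOR VALUES FROM ('2012-01-01') TO ('2012-01-02');
```

With a layout like this, the planner should be able to prune on a plain category_id = 100 condition, because it hashes the constant itself instead of having to prove anything about a modulo expression.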

score 1 · Accepted Answer

For anyone with the same problem: I came to the conclusion that the simplest way out is to change the queries to include the modulo condition category_id%10=0.
