sql - Optimizing date queries on large child tables: GiST or GIN?

Question

Problem

Seventy-two child tables, each with a year index and a station index, are defined as follows:

CREATE TABLE climate.measurement_12_013
(
-- Inherited from table climate.measurement_12_013:  id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement_12_013:  station_id integer NOT NULL,
-- Inherited from table climate.measurement_12_013:  taken date NOT NULL,
-- Inherited from table climate.measurement_12_013:  amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement_12_013:  category_id smallint NOT NULL,
-- Inherited from table climate.measurement_12_013:  flag character varying(1) NOT NULL DEFAULT ' '::character varying,
  CONSTRAINT measurement_12_013_category_id_check CHECK (category_id = 7),
  CONSTRAINT measurement_12_013_taken_check CHECK (date_part('month'::text, taken)::integer = 12)
)
INHERITS (climate.measurement);

CREATE INDEX measurement_12_013_s_idx
  ON climate.measurement_12_013
  USING btree
  (station_id);
CREATE INDEX measurement_12_013_y_idx
  ON climate.measurement_12_013
  USING btree
  (date_part('year'::text, taken));

(The foreign key constraints will be added later.)

The following query runs very slowly because of a full table scan:

SELECT
  count(1) AS measurements,
  avg(m.amount) AS amount
FROM
  climate.measurement m
WHERE
  m.station_id IN (
    SELECT
      s.id
    FROM
      climate.station s,
      climate.city c
    WHERE
        /* For one city... */
        c.id = 5182 AND

        /* Where stations are within an elevation range... */
        s.elevation BETWEEN 0 AND 3000 AND

        /* and within a specific radius... */
        6371.009 * SQRT( 
          POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
            (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
              POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2))
        ) <= 50
    ) AND

  /* Data before 1900 is shaky; insufficient after 2009. */
  extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND

  /* Whittled down by category... */
  m.category_id = 1 AND

  /* Between the selected days and years... */
  m.taken BETWEEN
   /* Start date. */
   (extract( YEAR FROM m.taken )||'-01-01')::date AND
    /* End date. Calculated by checking to see if the end date wraps
       into the next year. If it does, then add 1 to the current year.
    */
    (cast(extract( YEAR FROM m.taken ) + greatest( -1 *
      sign(
        (extract( YEAR FROM m.taken )||'-12-31')::date -
        (extract( YEAR FROM m.taken )||'-01-01')::date ), 0
    ) AS text)||'-12-31')::date
GROUP BY
  extract( YEAR FROM m.taken )

The slowness comes from this part of the query:

  m.taken BETWEEN
    /* Start date. */
  (extract( YEAR FROM m.taken )||'-01-01')::date AND
    /* End date. Calculated by checking to see if the end date wraps
      into the next year. If it does, then add 1 to the current year.
    */
    (cast(extract( YEAR FROM m.taken ) + greatest( -1 *
      sign(
        (extract( YEAR FROM m.taken )||'-12-31')::date -
        (extract( YEAR FROM m.taken )||'-01-01')::date ), 0
    ) AS text)||'-12-31')::date

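This part of the query matches a selection of days. For example, if the user wants to see data between June 1 and July 1 for every year that has data, the clause above matches those days. If the user wants to see data between December 22 and March 22, again for every year that has data, the clause above calculates that March 22 falls in the year after December 22 and matches the dates accordingly.

For now the dates are hard-coded as January 1 to December 31, but they will be parameterized, as shown above.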
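To make the wrap-around case concrete, here is a minimal sketch of the bounds the clause is meant to produce for a December 22 to March 22 selection. The year range 2000 to 2002 and the aliases y, yr, start_date, and end_date are arbitrary illustrations and are not part of the real query.

```sql
-- Sketch only: per-year bounds for a selection that wraps into the next year.
-- The end date belongs to the year after the start date.
SELECT
  y.yr,
  (y.yr::text || '-12-22')::date       AS start_date,
  ((y.yr + 1)::text || '-03-22')::date AS end_date
FROM
  generate_series(2000, 2002) AS y(yr);
```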

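The HashAggregate in the plan shows a cost of 10006220141.11, which I suspect is on the astronomically large side.

A full table scan is performed on the measurement table, which itself holds neither data nor indexes. That table aggregates 273 million rows from its child tables.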

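Question

What is the proper way to index the dates so that full table scans are avoided?

Options I have considered:

GIN
GiST
Rewrite the WHERE clause
Split the date into separate year_taken, month_taken, and day_taken columns on the tables

What are your thoughts?

Thank you!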

score 2 · Accepted Answer

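Your problem is that the WHERE clause depends on a calculation over the date. The database cannot use an index if it has to fetch every row and run a calculation on it before it knows whether the date matches.

Unless you rewrite the clause so that the database checks a fixed range that does not depend on the data it retrieves, it will always have to scan the table.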
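As a rough illustration of a "fixed range" form, the year filter from the question can be written with constant bounds on the column itself. This is only a sketch: the index name measurement_12_013_taken_idx is an assumption (the question defines no plain index on taken), and the station, category, and day-range filters are left out for brevity.

```sql
-- Assumption: a plain btree index on the date column of each child table.
-- The index name is hypothetical.
CREATE INDEX measurement_12_013_taken_idx
  ON climate.measurement_12_013
  USING btree
  (taken);

-- Instead of: extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009
-- compare the column against constants that do not depend on the row.
SELECT
  count(1) AS measurements,
  avg(m.amount) AS amount
FROM
  climate.measurement m
WHERE
  m.taken >= DATE '1900-01-01' AND
  m.taken <  DATE '2010-01-01'
GROUP BY
  extract( YEAR FROM m.taken );
```

Whether the planner actually picks the index still depends on how selective the range is, so it is worth checking with EXPLAIN.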

score 1 · Accepted Answer

Try something like this:

create temporary table test (d date);

insert into test select '1970-01-01'::date+generate_series(1,50*365);

analyze test;

create function month_day(d date) returns int as $$
  select extract(month from $1)::int*100+extract(day from $1)::int $$
language sql immutable strict;

create index test_d_month_day_idx on test (month_day(d));

explain analyze select * from test
  where month_day(d)>=month_day('2000-04-01')
  and month_day(d)<=month_day('2000-04-05');
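The same function can probably cover the two cases described in the question. The sketch below reuses the test table and month_day function defined above; the literal values 601, 701, 1222, and 322 are simply month*100+day for June 1, July 1, December 22, and March 22. A range that wraps past December 31 needs OR instead of AND, because no value is both at least 1222 and at most 322.

```sql
-- Range inside one calendar year: June 1 through July 1.
explain analyze select * from test
  where month_day(d) >= 601
  and month_day(d) <= 701;

-- Range that wraps into the next year: December 22 through March 22.
explain analyze select * from test
  where month_day(d) >= 1222
  or month_day(d) <= 322;
```

For a wide range the planner may still prefer a sequential scan, so compare the plans before relying on the index.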

score 0 · Accepted Answer

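To do this efficiently across those partitions, I would make your app a lot smarter about the date ranges. Have it generate the actual list of dates to check for each partition, then build a single query that uses a UNION between the partitions. Your data set sounds fairly static, so a CLUSTER on the date index may also improve performance significantly.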
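A minimal sketch of that idea for a December 22 to January 31 selection follows. Several names are assumptions: climate.measurement_01_013 stands in for a sibling January partition, measurement_12_013_taken_idx is a hypothetical index on taken, and season_start is an illustrative alias. The station and category filters from the question are omitted, and UNION ALL is used so that duplicate rows are not removed.

```sql
-- Assumption: an index on the date column (skip if one already exists).
CREATE INDEX measurement_12_013_taken_idx
  ON climate.measurement_12_013 (taken);

-- Physically order the partition by date, then refresh statistics.
CLUSTER climate.measurement_12_013 USING measurement_12_013_taken_idx;
ANALYZE climate.measurement_12_013;

-- One branch per partition, each checked against an explicit list of dates.
SELECT
  t.season_start,
  count(1) AS measurements,
  avg(t.amount) AS amount
FROM (
  -- December 22..31 of each year, from the December partition.
  SELECT extract(YEAR FROM m.taken)::int AS season_start, m.amount
  FROM climate.measurement_12_013 m
  WHERE m.taken IN (
    SELECT (y.yr::text || '-12-22')::date + d.n
    FROM generate_series(1900, 2009) AS y(yr),
         generate_series(0, 9) AS d(n))
  UNION ALL
  -- January 1..31 of the following year, from the hypothetical January partition.
  SELECT extract(YEAR FROM m.taken)::int - 1, m.amount
  FROM climate.measurement_01_013 m
  WHERE m.taken IN (
    SELECT (y.yr::text || '-01-01')::date + d.n
    FROM generate_series(1901, 2010) AS y(yr),
         generate_series(0, 30) AS d(n))
) AS t
GROUP BY
  t.season_start;
```

In practice the app would generate the date lists and the set of branches from the user's chosen start and end days.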
