postgresql - Forcing a GIN index scan in PostgreSQL 9.4

Question

I have a locations table (about 29 million rows):

Table "public.locations"
 Column |  Type   | Modifiers
--------+---------+--------------------------------------------------------
 id     | integer | not null default nextval('locations_id_seq'::regclass)
 dl     | text    |
Indexes:
    "locations_pkey" PRIMARY KEY, btree (id)
    "locations_test_idx" gin (to_tsvector('english'::regconfig, dl))

I want the following query to perform well:

EXPLAIN (ANALYZE,BUFFERS) SELECT id  FROM locations WHERE  to_tsvector('english'::regconfig, dl)  @@ to_tsquery('Lymps') LIMIT 10;

However, the resulting query plan shows that a sequential scan is used:

                                                          QUERY PLAN                                                           

-------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.00..65.18 rows=10 width=4) (actual time=62217.569..62217.569 rows=0 loops=1)
  Buffers: shared hit=262 read=447808
  I/O Timings: read=861.370
  ->  Seq Scan on locations  (cost=0.00..967615.99 rows=148442 width=2) (actual time=62217.567..62217.567 rows=0 loops=1)
         Filter: (to_tsvector('english'::regconfig, dl) @@ to_tsquery('Lymps'::text))
         Rows Removed by Filter: 29688342
         Buffers: shared hit=262 read=447808
         I/O Timings: read=861.370
Planning time: 0.109 ms
Execution time: 62217.584 ms

With sequential scans forcibly disabled:

set enable_seqscan to off;

the query plan now uses the GIN index:

                                                                  QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1382.43..1403.20 rows=10 width=2) (actual time=0.043..0.043 rows=0 loops=1)
   Buffers: shared hit=1 read=3
   ->  Bitmap Heap Scan on locations  (cost=1382.43..309697.73 rows=148442 width=2) (actual time=0.043..0.043 rows=0 loops=1)
         Recheck Cond: (to_tsvector('english'::regconfig, dl) @@ to_tsquery('Lymps'::text))
         Buffers: shared hit=1 read=3
         ->  Bitmap Index Scan on locations_test_idx  (cost=0.00..1345.32 rows=148442 width=0) (actual time=0.041..0.041 rows=0 loops=1)
               Index Cond: (to_tsvector('english'::regconfig, dl) @@ to_tsquery('Lymps'::text))
               Buffers: shared hit=1 read=3
 Planning time: 0.089 ms
 Execution time: 0.069 ms
(10 rows)

My cost settings are pasted below:

select name,setting from pg_settings where name like '%cost';                       
         name         | setting 
----------------------+---------
 cpu_index_tuple_cost | 0.005
 cpu_operator_cost    | 0.0025
 cpu_tuple_cost       | 0.01
 random_page_cost     | 4
 seq_page_cost        | 1
(5 rows)

I am looking for a solution that avoids the sequential scan for this query without resorting to tricks such as turning sequential scans off.

I tried raising seq_page_cost to 20, but the query plan stayed the same.

score 1 · Accepted Answer

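The problem here is that PostgreSQL believes there are plenty of rows that satisfy the condition, so it expects that reading rows sequentially until it has found 10 matches will be the cheapest approach.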

But since not a single row satisfies the condition, the query ends up scanning the entire table, whereas an index scan would have been much faster.
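
To make that arithmetic concrete, here is a rough check that reproduces the numbers in the two plans from the question. It assumes the planner prices a LIMIT by scaling the underlying scan's run cost by the fraction of the estimated rows it expects to need, and that the 148442-row estimate is simply 0.5% of the table (which, as far as I know, is the default selectivity used for @@ when the statistics offer nothing better). Treat it as a sketch, not a description of the planner's exact code path.

```sql
-- Back-of-the-envelope check of the estimates shown in the plans above.
SELECT
    -- Estimated matches: 0.5% of the 29688342 rows (assumed default selectivity)
    round(29688342 * 0.005) AS estimated_matches,
    -- Seq scan under LIMIT 10: total cost scaled by 10 / estimated matches
    round(967615.99 * 10 / 148442, 2) AS seqscan_limit_cost,
    -- Bitmap scan under LIMIT 10: startup cost is paid in full, the rest is scaled
    round(1382.43 + (309697.73 - 1382.43) * 10 / 148442, 2) AS bitmap_limit_cost;
```

This comes out to 148442, 65.18 and 1403.20, matching the plans. With 148442 expected matches, ten of them look cheap to find by reading the table from the start, and the bitmap scan loses because of its 1382.43 startup cost.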

You can improve the quality of the statistics collected for that column like this:

ALTER TABLE locations_test_idx
   ALTER to_tsvector SET STATISTICS 10000;

Then run ANALYZE; PostgreSQL will collect better statistics for that column, and hopefully the query plan will improve.
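
A minimal sketch of those follow-up steps, assuming the table and index names from the question. The pg_stats lookup by index name is just one way to confirm that statistics for the indexed expression exist; it is not required for the fix. Expect ANALYZE to take noticeably longer with a statistics target this high.

```sql
-- Recompute statistics, including those for the indexed expression
ANALYZE locations;

-- Statistics for an expression index are listed under the index name
SELECT tablename, attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'locations_test_idx';

-- Re-check the plan with sequential scans left enabled
RESET enable_seqscan;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM locations
WHERE to_tsvector('english'::regconfig, dl) @@ to_tsquery('Lymps')
LIMIT 10;
```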
