postgresql - PostgreSQL full-text search does not use the index after importing data with COPY

Question

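I'm trying to get PostgreSQL to use an index for a prefix search with full-text search. It generally works, but only if I create the index after importing the data. This may be some kind of intended behavior, but I don't understand it.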

First I create the index, then I import the data with the COPY command:

CREATE INDEX account_fts_idx ON account
    USING gin(to_tsvector('german', remote_id || ' ' || name || ' ' || street || ' ' || zip || ' ' || city ));
COPY account (id, remote_id, name, street, zip, city ...) FROM '/path/account.csv' WITH DELIMITER ',' CSV;

Then I run a PREFIX (probably important) search with the following SELECT statement:

EXPLAIN ANALYZE SELECT a.id, a.remote_id, a.name, a.street, a.zip, a.city, al.latitude, al.longitude 
FROM account a 
LEFT JOIN account_location al ON al.id = a.id 
WHERE (to_tsvector('german', a.remote_id || ' ' || a.name || ' ' || a.street || ' ' || a.zip || ' ' || a.city) 
@@ (to_tsquery('german', 'hambu:*')))

Performance is poor because the index is not used:

Hash Left Join  (cost=28.00..3389.97 rows=319 width=94) (actual time=1.685..1237.674 rows=1336 loops=1)
  Hash Cond: (a.id = al.id)
  ->  Seq Scan on account a  (cost=0.00..3360.73 rows=319 width=78) (actual time=1.665..1236.589 rows=1336 loops=1)
        Filter: (to_tsvector('german'::regconfig, (((((((((remote_id)::text || ' '::text) || (name)::text) || ' '::text) || (street)::text) || ' '::text) || (zip)::text) || ' '::text) || (city)::text)) @@ '''hambu'':*'::tsquery)
  ->  Hash  (cost=18.00..18.00 rows=800 width=24) (actual time=0.001..0.001 rows=0 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 0kB
        ->  Seq Scan on account_location al  (cost=0.00..18.00 rows=800 width=24) (actual time=0.001..0.001 rows=0 loops=1)
Total runtime: 1237.928 ms

Here is the weird part: if I drop the index and re-create it with the exact same CREATE INDEX command, the same SELECT query uses the index and is very fast:

Hash Left Join  (cost=61.92..1290.73 rows=1278 width=94) (actual time=0.561..1.918 rows=1336 loops=1)
  Hash Cond: (a.id = al.id)
  ->  Bitmap Heap Scan on account a  (cost=33.92..1257.78 rows=1278 width=78) (actual time=0.551..1.442 rows=1336 loops=1)
        Recheck Cond: (to_tsvector('german'::regconfig, (((((((((remote_id)::text || ' '::text) || (name)::text) || ' '::text) || (street)::text) || ' '::text) || (zip)::text) || ' '::text) || (city)::text)) @@ '''hambu'':*'::tsquery)
        ->  Bitmap Index Scan on account_fts_idx  (cost=0.00..33.60 rows=1278 width=0) (actual time=0.490..0.490 rows=1336 loops=1)
              Index Cond: (to_tsvector('german'::regconfig, (((((((((remote_id)::text || ' '::text) || (name)::text) || ' '::text) || (street)::text) || ' '::text) || (zip)::text) || ' '::text) || (city)::text)) @@ '''hambu'':*'::tsquery)
  ->  Hash  (cost=18.00..18.00 rows=800 width=24) (actual time=0.001..0.001 rows=0 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 0kB
        ->  Seq Scan on account_location al  (cost=0.00..18.00 rows=800 width=24) (actual time=0.001..0.001 rows=0 loops=1)
Total runtime: 2.054 ms

So why do I have to create the index after the import?

And, more importantly for me: will new rows (typically added via INSERT INTO) be added to the index?

score 2 · Accepted Answer

@Denis pointed me in the right direction. I looked into the VACUUM and ANALYZE commands and found the solution:

"For tables with GIN indexes, VACUUM (in any form) also completes any pending index insertions, by moving pending index entries to the appropriate places in the main GIN index structure." (PostgreSQL documentation: VACUUM)

After running VACUUM account, the SELECT query uses the index as expected.
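
For anyone following along, here is a minimal sketch of the post-import steps, assuming the table and index names from the question. The ANALYZE part and the fastupdate setting are my additions, not something I verified on the original data: ANALYZE only refreshes planner statistics, and fastupdate = off is an optional GIN storage parameter that stops future inserts from going through the pending list. The SELECT is shortened to a single column to keep the plan readable.

```sql
-- Run after the COPY has finished. VACUUM cannot run inside a transaction block.
-- VACUUM moves pending GIN entries into the main index structure;
-- ANALYZE (an addition, not part of the fix described above) refreshes planner statistics.
VACUUM ANALYZE account;

-- Re-check the plan. The hoped-for result is a Bitmap Index Scan on account_fts_idx.
EXPLAIN ANALYZE
SELECT a.id
FROM account a
WHERE to_tsvector('german', a.remote_id || ' ' || a.name || ' ' || a.street || ' ' || a.zip || ' ' || a.city)
      @@ to_tsquery('german', 'hambu:*');

-- Optional (assumption: slower individual inserts are acceptable).
-- Disables the pending list for future inserts; it does not flush
-- entries that are already pending, so run it before or together with a VACUUM.
ALTER INDEX account_fts_idx SET (fastupdate = off);
```

As far as I understand the GIN documentation, rows added later with INSERT are still covered by the index even while they sit in the pending list, because searches scan that list in addition to the main structure. A long pending list only makes those searches slower until VACUUM cleans it up.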
