performance - Why do I get a "Hash Join" and full table scans (FTS) with this PostgreSQL query?

Question

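I am trying to optimize the following scenario.

In words: I have two tables, alerts and user_devices. In user_devices we track whether a device tied to a user_id wants to receive notifications, and in the alerts table we track the user-to-notifier relationship. Basically, the task is to select every user_id that has at least one alert and allows notifications on any device registered to it.

Table "alerts", about 900k records: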

               Table "public.alerts"
   Column    |           Type           | Modifiers 
-------------+--------------------------+-----------
 id          | uuid                     | not null
 user_id     | uuid                     | 
 target_id   | uuid                     | 
 target_type | text                     | 
 added_on    | timestamp with time zone | 
 old_id      | text                     | 
Indexes:
    "alerts_pkey" PRIMARY KEY, btree (id)
    "one_alert_per_business_per_user" UNIQUE CONSTRAINT, btree (user_id, target_id)
    "addedon" btree (added_on)
    "targetid" btree (target_id)
    "userid" btree (user_id)
    "userid_targetid" btree (user_id, target_id)
Foreign-key constraints:
    "alerts_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)

Table "user_devices", about 12k records:

                Table "public.user_devices"
       Column        |           Type           | Modifiers 
---------------------+--------------------------+-----------
 id                  | uuid                     | not null
 user_id             | uuid                     | 
 device_id           | text                     | 
 device_token        | text                     | 
 push_notify_enabled | boolean                  | 
 device_type         | integer                  | 
 device_name         | text                     | 
 badge_count         | integer                  | 
 added_on            | timestamp with time zone | 
Indexes:
    "user_devices_pkey" PRIMARY KEY, btree (id)
    "push_notification" btree (push_notify_enabled)
    "user_id" btree (user_id)
    "user_id_push_notification" btree (user_id, push_notify_enabled)
Foreign-key constraints:
    "user_devices_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)

The following query:

select COUNT(DISTINCT a.user_id) 
from alerts a 
  inner join user_devices ud on a.user_id = ud.user_id 
WHERE ud.push_notify_enabled = true;

takes about 3 seconds and produces the following plan:

explain select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id WHERE ud.push_notify_enabled = true;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Aggregate  (cost=49777.32..49777.33 rows=1 width=16)
   ->  Hash Join  (cost=34508.97..48239.63 rows=615074 width=16)
         Hash Cond: (ud.user_id = a.user_id)
         ->  Seq Scan on user_devices ud  (cost=0.00..480.75 rows=9202 width=16)
               Filter: push_notify_enabled
         ->  Hash  (cost=20572.32..20572.32 rows=801732 width=16)
               ->  Seq Scan on alerts a  (cost=0.00..20572.32 rows=801732 width=16)

Is there any way to speed this up?

Thank you.

== Edit ==

As suggested, I tried moving the condition into the join, but it makes no difference:

=> explain select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id and ud.push_notify_enabled;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Aggregate  (cost=49777.32..49777.33 rows=1 width=16)
   ->  Hash Join  (cost=34508.97..48239.63 rows=615074 width=16)
         Hash Cond: (ud.user_id = a.user_id)
         ->  Seq Scan on user_devices ud  (cost=0.00..480.75 rows=9202 width=16)
               Filter: push_notify_enabled
         ->  Hash  (cost=20572.32..20572.32 rows=801732 width=16)
               ->  Seq Scan on alerts a  (cost=0.00..20572.32 rows=801732 width=16)

So there is no way to avoid the 2 FTS? If I could at least use an index on the "alerts" table somehow, that would be great.

== Edit ==

Adding "EXPLAIN ANALYZE".

=> explain ANALYZE select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id and ud.push_notify_enabled;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=49777.32..49777.33 rows=1 width=16) (actual time=5254.355..5254.356 rows=1 loops=1)
   ->  Hash Join  (cost=34508.97..48239.63 rows=615074 width=16) (actual time=1824.607..2863.635 rows=614768 loops=1)
         Hash Cond: (ud.user_id = a.user_id)
         ->  Seq Scan on user_devices ud  (cost=0.00..480.75 rows=9202 width=16) (actual time=0.048..16.784 rows=9186 loops=1)
               Filter: push_notify_enabled
         ->  Hash  (cost=20572.32..20572.32 rows=801732 width=16) (actual time=1824.229..1824.229 rows=801765 loops=1)
               Buckets: 4096  Batches: 32  Memory Usage: 990kB
               ->  Seq Scan on alerts a  (cost=0.00..20572.32 rows=801732 width=16) (actual time=0.047..878.429 rows=801765 loops=1)
 Total runtime: 5255.427 ms
(9 rows)

=== Edit ===

Adding the requested configuration. Most of it is the Ubuntu PG 9.1 defaults:

/etc/postgresql/9.1/main# cat postgresql.conf | grep -e "work_mem" -e "effective_cache" -e "shared_buff" -e "random_page_c"
shared_buffers = 24MB           # min 128kB
#work_mem = 1MB             # min 64kB
#maintenance_work_mem = 16MB        # min 1MB
#wal_buffers = -1           # min 32kB, -1 sets based on shared_buffers
#random_page_cost = 4.0         # same scale as above
#effective_cache_size = 128MB

score 1 · Accepted Answer

Replace the index with a partial index:

DROP INDEX user_id_push_notification;
CREATE INDEX user_id_push_notification ON user_devices (user_id)
  WHERE push_notify_enabled = true;

and set random_page_cost to a lower value:

SET random_page_cost = 1.1;

For me, this resulted in an Index Scan using push_notification on user_devices ud (<300 ms). YMMV.

The seqscan on alerts seems more or less unavoidable, since 800K of 900K rows (roughly 89%) are expected to be read. An index scan would only pay off here if the rows were very wide.
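Since the scan itself is hard to avoid, the memory available to the hash and sort steps may be worth a look. The EXPLAIN ANALYZE output in the question reports "Batches: 32  Memory Usage: 990kB" for the hash, which is consistent with the default work_mem of 1MB shown in the posted configuration. A minimal sketch, assuming the server has RAM to spare; the 64MB value is an illustrative guess, not a recommendation:

```sql
-- Session-local change; postgresql.conf is not modified.
-- '64MB' is an assumed value: tune it to the memory actually available.
SET work_mem = '64MB';

-- Re-run the original query and compare the Hash node's "Batches" figure
-- and the total runtime against the plan posted in the question.
EXPLAIN ANALYZE
SELECT COUNT(DISTINCT a.user_id)
FROM alerts a
  INNER JOIN user_devices ud ON a.user_id = ud.user_id
WHERE ud.push_notify_enabled = true;

-- Return to the configured default afterwards.
RESET work_mem;
```

If the hash then fits in a single batch, the join should no longer spill to temporary files. The sort that COUNT(DISTINCT ...) performs over the roughly 615k joined rows may benefit as well, though how much depends on the data and hardware.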

Update: adding the users table to the query seems to force a triple index scan. (The runtime is about the same, though.)

explain  ANALYZE
select COUNT(DISTINCT a.user_id)
from alerts a
join user_devices ud on a.user_id = ud.user_id
join users us on a.user_id = us.id
WHERE ud.push_notify_enabled = true;

score 1 · Accepted Answer

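As mentioned in the comments, the real hog is the full scan of the alerts table. Logically, for any given user ID, any record in alerts could match that user ID.

There is one condition that could limit the scan: push_notify_enabled. You don't need the rows where it is false. However, there is no index on this column that helps with alerts, so the fastest way to join the two tables is a full scan.

Try a bitmap index on push_notify_enabled, if your version of Postgres supports it. (Obviously, a btree index on a two-valued column is not appropriate.)

To speed up the query, you need to limit the number of rows scanned in alerts. That means adding a condition on an indexed column of alerts. If the index is selective enough, an index scan may become possible instead of a full scan.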
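One way to read fewer rows from alerts without changing the result is to drive the query from the much smaller user_devices table and only check whether a matching alert exists. This is a sketch of a rewrite that does not appear in the original thread; whether the planner actually probes the existing "userid" index on alerts depends on the PostgreSQL version, the statistics, and the cost settings:

```sql
-- Count users that have at least one push-enabled device
-- and at least one alert. A user with several devices or
-- several alerts is still counted once because of DISTINCT.
EXPLAIN ANALYZE
SELECT COUNT(DISTINCT ud.user_id)
FROM user_devices ud
WHERE ud.push_notify_enabled
  AND EXISTS (
        SELECT 1
        FROM alerts a
        WHERE a.user_id = ud.user_id   -- can be answered from the "userid" index
      );
```

The idea is that roughly 9k enabled device rows each need only one index probe into alerts, instead of joining against all 800k alert rows and deduplicating afterwards.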

For example, you could filter on the target ID or on a date-related column, if that makes sense for your data.
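As an illustration of such a filter, the sketch below restricts alerts by added_on, which already has the "addedon" btree index. The 30-day window is a hypothetical business rule, not something stated in the question, and it changes what the query counts:

```sql
-- Hypothetical: only alerts added during the last 30 days are relevant.
EXPLAIN ANALYZE
SELECT COUNT(DISTINCT a.user_id)
FROM alerts a
  INNER JOIN user_devices ud ON a.user_id = ud.user_id
WHERE ud.push_notify_enabled = true
  AND a.added_on >= now() - interval '30 days';  -- assumed cutoff
```

The index is only likely to be chosen if the window selects a small fraction of the table; a filter that still matches most rows will usually end in a sequential scan again.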

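If you have 900k alerts that are all active and can be shared arbitrarily among users, you have few options. Perhaps adding RAM so that the alerts table always stays cached would help. (Adding hardware is often the easiest and most cost-effective solution.)

AFAICT you are only interested in alerts associated with push notifications. If users who use push notifications never share alerts with users who don't, you can effectively split alerts by this condition.

If you have bitmap indexes, you could move the push_notify_enabled column to alerts. Otherwise, you could try partitioning to physically split the table on that column. If the number of alerts with push notifications is significantly smaller than the total number of alerts, far fewer rows of alerts will be scanned for the join.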
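A rough sketch of the "move the column to alerts" idea, using a partial btree index in place of a bitmap index. The new column and index name are hypothetical, and the flag would have to be kept in sync with user_devices (for example by a trigger or a periodic job), which is not shown here:

```sql
-- Hypothetical denormalized flag on alerts.
ALTER TABLE alerts
  ADD COLUMN push_notify_enabled boolean NOT NULL DEFAULT false;

-- Mark alerts whose user has at least one push-enabled device.
UPDATE alerts a
SET push_notify_enabled = true
WHERE EXISTS (
        SELECT 1
        FROM user_devices ud
        WHERE ud.user_id = a.user_id
          AND ud.push_notify_enabled
      );

-- Partial index covering only the flagged rows (index name is an assumption).
CREATE INDEX alerts_push_user_id ON alerts (user_id)
  WHERE push_notify_enabled;

-- The count no longer needs the join at all.
SELECT COUNT(DISTINCT user_id)
FROM alerts
WHERE push_notify_enabled;
```

With the figures in the question (about 615k of 800k joined rows match), the flagged subset is still most of the table, so the gain may be modest unless push-enabled alerts are a much smaller share in practice.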
