postgresql-9.4 - PostgreSQL 9.4 query tuning

Question

I have a query that runs far too slowly.

select c.vm_name,
       round(sum(bytes_sent)*1.8/power(10,9)) gb_sent,
       round(sum(bytes_received)*1.8/power(10,9)) gb_received
  from groups b, 
       vms c, 
       vm_ip_address_histories d, 
       ip_address_usage_histories e
 where b.group_id = c.group_id
   and c.vm_id = d.vm_id
   and d.ip_address = e.ip_address
   and e.datetime >= firstday()
   and d.allocation_date <= last_day(sysdate()) and (d.deallocation_date is null or d.deallocation_date > last_day(sysdate()))
   and b.customer_id = 29
 group by c.vm_name
 order by 1;

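The function sysdate() returns the current system timestamp without time zone, and last_day() returns a timestamp representing the last day of the month. I created these because Hibernate does not like Postgres' cast notation.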

The problem is that the planner performs full table scans even where indexes are available. The execution plan for the query above is:

    QUERY PLAN                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1326387.13..1326391.38 rows=1698 width=24) (actual time=13221.041..13221.042 rows=7 loops=1)
   Sort Key: c.vm_name
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=1326236.61..1326296.04 rows=1698 width=24) (actual time=13221.008..13221.026 rows=7 loops=1)
         Group Key: c.vm_name
         ->  Hash Join  (cost=1309056.97..1325972.10 rows=35268 width=24) (actual time=13131.323..13211.612 rows=13631 loops=1)
               Hash Cond: (d.ip_address = e.ip_address)
               ->  Nested Loop  (cost=2.97..6942.24 rows=79 width=15) (actual time=0.249..56.904 rows=192 loops=1)
                     ->  Hash Join  (cost=2.69..41.02 rows=98 width=16) (actual time=0.066..0.638 rows=61 loops=1)
                           Hash Cond: (c.group_id = b.group_id)
                           ->  Seq Scan on vms c  (cost=0.00..30.98 rows=1698 width=24) (actual time=0.009..0.281 rows=1698 loops=1)
                           ->  Hash  (cost=2.65..2.65 rows=3 width=8) (actual time=0.014..0.014 rows=4 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                 ->  Seq Scan on groups b  (cost=0.00..2.65 rows=3 width=8) (actual time=0.004..0.011 rows=4 loops=1)
                                       Filter: (customer_id = 29)
                                       Rows Removed by Filter: 49
                     ->  Index Scan using xif1vm_ip_address_histories on vm_ip_address_histories d  (cost=0.29..70.34 rows=8 width=15) (actual time=0.011..0.921 rows=3 loops=61)
                           Index Cond: (vm_id = c.vm_id)
                           Filter: ((allocation_date <= last_day(sysdate())) AND ((deallocation_date IS NULL) OR (deallocation_date > last_day(sysdate()))))
                           Rows Removed by Filter: 84
               ->  Hash  (cost=1280129.06..1280129.06 rows=1575435 width=23) (actual time=13130.223..13130.223 rows=203702 loops=1)
                     Buckets: 8192  Batches: 32  Memory Usage: 422kB
                     ->  Seq Scan on ip_address_usage_histories e  (cost=0.00..1280129.06 rows=1575435 width=23) (actual time=0.205..13002.776 rows=203702 loops=1)
                           Filter: (datetime >= firstday())
                           Rows Removed by Filter: 4522813
 Planning time: 0.804 ms
 Execution time: 13221.155 ms
(27 rows)

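Note that the planner chooses very costly full table scans on the largest tables, ip_address_usage_histories and vm_ip_address_histories. I tried turning the configuration parameter enable_seqscan off, but that made things worse: total execution time rose to 63 seconds.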
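For reference, a minimal sketch of how such an experiment can be run for a single session only, so the setting does not leak into other connections. The two-table query here is a simplified stand-in for the full query, not the asker's actual test. Note that enable_seqscan = off does not forbid sequential scans; it only makes the planner treat them as very expensive.

```sql
-- Diagnostic only: discourage sequential scans in this session.
set enable_seqscan = off;

-- EXPLAIN ANALYZE really executes the statement, so this can be slow.
explain (analyze, buffers)
select d.vm_id, sum(e.bytes_sent)
  from vm_ip_address_histories d
  join ip_address_usage_histories e on e.ip_address = d.ip_address
 where e.datetime >= firstday()
 group by d.vm_id;

-- Restore the configured default for the rest of the session.
reset enable_seqscan;
```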

The descriptions of the tables mentioned above are:

                             Table "ip_address_usage_histories"
           Column            |            Type             | Modifiers 
-----------------------------+-----------------------------+-----------
 ip_address_usage_history_id | bigint                      | not null
 datetime                    | timestamp without time zone | not null
 ip_address                  | inet                        | not null
 bytes_sent                  | bigint                      | not null
 bytes_received              | bigint                      | not null
Indexes:
    "ip_address_usage_histories_pkey" PRIMARY KEY, btree (ip_address_usage_history_id)
    "ip_address_usage_histories_datetime_ip_address_key" UNIQUE CONSTRAINT, btree (datetime, ip_address)
    "uk_mit6tbiu8k62vdae4tmtnwb3f" UNIQUE CONSTRAINT, btree (datetime, ip_address)

                          Table "vm_ip_address_histories"
          Column          |            Type             |                                         Modifiers                                          
--------------------------+-----------------------------+--------------------------------------------------------------------------------------------
 vm_ip_address_history_id | bigint                      | not null default nextval('vm_ip_address_histories_vm_ip_address_history_id_seq'::regclass)
 ip_address               | inet                        | not null
 allocation_date          | timestamp without time zone | not null
 deallocation_date        | timestamp without time zone | 
 vm_id                    | bigint                      | not null
Indexes:
    "vm_ip_address_histories_pkey" PRIMARY KEY, btree (vm_ip_address_history_id)
    "xie1vm_ip_address_histories" btree (replicate_date)
    "xif1vm_ip_address_histories" btree (vm_id)
Foreign-key constraints:
    "vm_ip_address_histories_vm_id_fkey" FOREIGN KEY (vm_id) REFERENCES vms(vm_id) ON DELETE RESTRICT

Postgres does not seem to have query hints for directing the planner. I also tried the inner join ... on ... syntax in the from clause, but it made no difference.
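The rewritten query is not shown in the question; the following is a sketch of what the explicit-join form might look like. With only four relations, which is below the default join_collapse_limit of 8, the planner is free to reorder explicit inner joins, so it would be expected to produce the same plan as the comma-separated form.

```sql
select c.vm_name,
       round(sum(bytes_sent)*1.8/power(10,9)) gb_sent,
       round(sum(bytes_received)*1.8/power(10,9)) gb_received
  from groups b
  join vms c on c.group_id = b.group_id
  join vm_ip_address_histories d on d.vm_id = c.vm_id
  join ip_address_usage_histories e on e.ip_address = d.ip_address
 where b.customer_id = 29
   and e.datetime >= firstday()
   and d.allocation_date <= last_day(sysdate())
   and (d.deallocation_date is null or d.deallocation_date > last_day(sysdate()))
 group by c.vm_name
 order by 1;
```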

Update 1

create or replace function firstday() returns timestamp without time zone as $$
begin
   return date_trunc('month',now()::timestamp without time zone)::timestamp without time zone;
end; $$
language plpgsql;

I have not tried replacing this function with a standard one, because Postgres has no function that returns the first day of the month.
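For comparison, the body of firstday() is itself built on the standard date_trunc function, so the same value can be written inline without the :: cast notation. The sketch below uses localtimestamp, which already has type timestamp without time zone. The last-day expression is an assumption about what last_day() computes, inferred from the '2015-03-31 00:00:00' literal that appears in the answer's plan. Whether Hibernate accepts these expressions as written is not verified here.

```sql
-- First day of the current month at midnight, no casts needed.
select date_trunc('month', localtimestamp) as first_day,
       -- Assumed equivalent of last_day(sysdate()): last day of the month at midnight.
       date_trunc('month', localtimestamp) + interval '1 month' - interval '1 day' as last_day;
```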

score 0 · Accepted Answer

The following was embedded in the question, but it reads as an answer.

After changing all the functions to immutable, the query now runs in 200 ms. All the right things are happening.
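The exact statements are not given; a sketch of how the change might be applied follows. The argument type of last_day() is an assumption, since its definition is not shown. Functions created without a volatility keyword default to VOLATILE. The PostgreSQL documentation classifies functions whose result depends on the current time as STABLE rather than IMMUTABLE, because an immutable call may be folded into a constant at plan time and go stale in a cached plan. A STABLE function can still be used in an index scan condition, so it may be worth testing as the safer choice.

```sql
-- Inspect current volatility: 'v' = volatile (default), 's' = stable, 'i' = immutable.
select proname, provolatile
  from pg_proc
 where proname in ('firstday', 'sysdate', 'last_day');

-- The change described in the answer.
alter function firstday() immutable;
alter function sysdate() immutable;
-- Assumed signature: last_day(timestamp without time zone).
alter function last_day(timestamp without time zone) immutable;
```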

                              QUERY PLAN                                                                                                         
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=51865.24..51914.88 rows=1103 width=24) (actual time=178.793..188.223 rows=7 loops=1)
   Group Key: c.vm_name
   ->  Sort  (cost=51865.24..51868.00 rows=1103 width=24) (actual time=178.517..180.541 rows=13823 loops=1)
         Sort Key: c.vm_name
         Sort Method: quicksort  Memory: 1464kB
         ->  Hash Join  (cost=50289.49..51809.50 rows=1103 width=24) (actual time=131.278..155.971 rows=13823 loops=1)
               Hash Cond: (d.ip_address = e.ip_address)
               ->  Nested Loop  (cost=2.97..272.36 rows=23 width=15) (actual time=0.149..2.310 rows=192 loops=1)
                     ->  Hash Join  (cost=2.69..41.02 rows=98 width=16) (actual time=0.046..0.590 rows=61 loops=1)
                           Hash Cond: (c.group_id = b.group_id)
                           ->  Seq Scan on vms c  (cost=0.00..30.98 rows=1698 width=24) (actual time=0.006..0.250 rows=1698 loops=1)
                           ->  Hash  (cost=2.65..2.65 rows=3 width=8) (actual time=0.014..0.014 rows=4 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                 ->  Seq Scan on groups b  (cost=0.00..2.65 rows=3 width=8) (actual time=0.004..0.012 rows=4 loops=1)
                                       Filter: (customer_id = 29)
                                       Rows Removed by Filter: 49
                     ->  Index Scan using xif1vm_ip_address_histories on vm_ip_address_histories d  (cost=0.29..2.34 rows=2 width=15) (actual time=0.002..0.027 rows=3 loops=61)
                           Index Cond: (vm_id = c.vm_id)
                           Filter: ((allocation_date <= '2015-03-31 00:00:00'::timestamp without time zone) AND ((deallocation_date IS NULL) OR (deallocation_date > '2015-03-31 00:00:00'::timestamp without time zone)))
                           Rows Removed by Filter: 84
               ->  Hash  (cost=46621.83..46621.83 rows=199575 width=23) (actual time=130.762..130.762 rows=206266 loops=1)
                     Buckets: 8192  Batches: 4  Memory Usage: 2818kB
                     ->  Bitmap Heap Scan on ip_address_usage_histories e  (cost=4627.14..46621.83 rows=199575 width=23) (actual time=18.335..69.763 rows=206266 loops=1)
                           Recheck Cond: (datetime >= '2015-03-01 00:00:00'::timestamp without time zone)
                           Heap Blocks: exact=3684
                           ->  Bitmap Index Scan on uk_mit6tbiu8k62vdae4tmtnwb3f  (cost=0.00..4577.24 rows=199575 width=0) (actual time=17.797..17.797 rows=206935 loops=1)
                                 Index Cond: (datetime >= '2015-03-01 00:00:00'::timestamp without time zone)
 Planning time: 0.837 ms
 Execution time: 188.301 ms
(29 rows)

You can see that the planner now executes the functions and substitutes their values into the where clause, which allows the indexes to be used.
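One way to observe this substitution in isolation is a plain EXPLAIN on the date predicate alone. This is a sketch for illustration, not part of the original answer. With the function marked immutable, the condition should appear with a literal timestamp, as in the plan above; with the volatile version it appears as a filter on firstday(), as in the question's plan.

```sql
-- Plain EXPLAIN only plans the statement; it does not execute it.
explain
select count(*)
  from ip_address_usage_histories e
 where e.datetime >= firstday();
```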
