sql - Optimizing the SQL "Where" clause of a query that contains subqueries

Question

Suppose we have the following fictional data structure.

create table "country"
(
  country_id integer,  
  country_name varchar(50),
  continent varchar(50),
  constraint country_pkey primary key (country_id)
);

create table "person"
(
  person_id integer,
  person_name varchar(100),
  country_id integer,
  constraint person_pkey primary key (person_id)
);

create table "event"
(
  event_id integer,
  event_desc varchar(100),
  country_id integer,
  constraint event_pkey primary key (event_id)
);

I want to query the number of person rows and event rows per country. I decided to use subqueries.

select c.country_name, sum(sub1.person_count) as person_count, sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
group by c.country_name

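I know I can do this with select statements in the field list, but the advantage of using subqueries is that they give me more flexibility when I change the SQL to summarize by a different field. Changing the query to report by continent is as simple as replacing the field "c.country_name" with "c.continent".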
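As a sketch of that swap, this is the same query with only the grouping column changed. It assumes nothing beyond the three tables defined above; each derived table still returns one row per country_id, so the sums are not inflated by the joins.

```sql
-- Same query as above, summarized per continent instead of per country.
-- Only the selected column and the GROUP BY column differ.
select c.continent,
  sum(sub1.person_count) as person_count,
  sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
group by c.continent;
```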

My problem is with filtering. If I add a where clause like this:

select c.country_name, 
  sum(sub1.person_count) as person_count, 
  sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_name='UNITED STATES'
group by c.country_name

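The subqueries still seem to run the counts for every country. Assume that the person and event tables are huge and that every table already has an index on country_id. It is really slow. Shouldn't the database run the subqueries only for the filtered country? Do I have to repeat the country filter in each subquery (which is very tedious and makes the code hard to change)? By the way, I am using both PostgreSQL 8.3 and 9.0, but I suspect the same thing happens in other databases.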

score 2 · Accepted Answer

Shouldn't the database run the subqueries only for the filtered country?

No. The first step in a query like yours is to appear to build a working table from all the table constructors in the FROM clause. The WHERE clause is evaluated after that.

Imagine how you would do this if sub1 and sub2 were both base tables instead of subselects. Each would have two columns and one row per country_id. If you wanted to join all the rows, you would write this:

from
  "country" c
  left join sub1 on (c.country_id=sub1.country_id)
  left join sub2 on (c.country_id=sub2.country_id)

But if you wanted to join on just one row, you would write something equivalent to this:

from
  "country" c
  left join (select * from sub1 where country_id = ?) sub1
    on (c.country_id=sub1.country_id)
  left join (select * from sub2 where country_id = ?) sub2
    on (c.country_id=sub2.country_id)
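Applied to the query in the question, that idea might look like the sketch below. It is only one possible shape, not a definitive rewrite: it assumes the filter stays on country_name, so each derived table looks up the matching country_id values itself through an IN subquery, and it repeats the literal 'UNITED STATES' in three places, which is exactly the duplication the question hoped to avoid.

```sql
-- Sketch: repeat the country filter inside each derived table so that
-- only rows for the matching country_id values are counted.
select c.country_name,
  sum(sub1.person_count) as person_count,
  sum(sub2.event_count) as event_count
from
  "country" c
  left join (
    select country_id, count(*) as person_count
    from "person"
    where country_id in
      (select country_id from "country" where country_name = 'UNITED STATES')
    group by country_id
  ) sub1 on (c.country_id = sub1.country_id)
  left join (
    select country_id, count(*) as event_count
    from "event"
    where country_id in
      (select country_id from "country" where country_name = 'UNITED STATES')
    group by country_id
  ) sub2 on (c.country_id = sub2.country_id)
where c.country_name = 'UNITED STATES'
group by c.country_name;
```

With the filter inside the derived tables, the existing indexes on country_id become usable for the person and event lookups, although whether the planner actually chooses them depends on the data.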

Joe Celko, who helped develop the early SQL standards, has written often on Usenet about how SQL's order of evaluation appears to work.

score 0 · Accepted Answer

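Could you filter and group the rows by country_id rather than country_name? I suspect you don't have an index on the name.
The subqueries don't use the index because they scan the whole tables. If you want to reduce the scanning, you have to filter the data.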
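A rough way to check both points, assuming PostgreSQL: the index name country_name_idx below is made up for illustration, and the plan you get back depends on your data and server version, so treat this as a starting point rather than a guaranteed fix.

```sql
-- Hypothetical index on the name column, in case none exists yet.
create index country_name_idx on "country" (country_name);

-- EXPLAIN ANALYZE runs the query and reports the plan actually used,
-- which shows whether "person" and "event" are scanned in full.
explain analyze
select c.country_name,
  sum(sub1.person_count) as person_count,
  sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_name='UNITED STATES'
group by c.country_name;
```

If the plan shows sequential scans on "person" and "event", the subqueries are counting every country, and the filter has to be moved into them.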
