database - How to use GROUP BY in PostgreSQL

Question

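The goal is to write a query using two different tables: country and city. Country contains name (of the country) and country_code (primary key), and city contains name (of the city), population, and country_code (primary key). I want to use GROUP BY with an aggregate function, but the query I have below doesn't work.

For each country, list the largest population of any of its cities and the name of that city. So I need to list the city in each country that has the largest population.

So what should be displayed is the country, the city (with the largest population), and the population of that city. There should be only one city per country.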

$query6 = "SELECT c.name AS country, ci.name AS city,
GREATEST(ci.population) AS max_pop
FROM lab6.country c INNER JOIN lab6.city ci
ON(c.country_code = ci.country_code)
GROUP BY c.name
ORDER BY country ASC";

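I have also tried GROUP BY country and DISTINCT c.name.

I am new to aggregate functions, so if there are specific situations where GROUP BY is supposed to be used and this is not one of them, please let me know.

I am using PHP to run the query like this: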

$result = pg_query($connection, $query);
if(!$result)
{
       die("Failed to connect to database");
}

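The error is: column "ci.name" must appear in the GROUP BY clause or be used in an aggregate function. LINE 1: SELECT DISTINCT c.name AS country, ci.name AS city

The tables were given to us; we did not create them. I can't include a screenshot of the tables because I don't have enough reputation.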

score 4 · Accepted Answer

Some DDL to play with.

create table country (
  country_code char(2) primary key, -- ISO country code
  country_name varchar(35) not null unique
);

insert into country values 
('US', 'United States of America'),
('IT', 'Italy'),
('IN', 'India');

-- The full name of a city is more than city name plus country name.
-- In the US, there are a couple of dozen cities named Springfield,
-- each in a different state. I'd be surprised if this weren't true
-- in most countries.
create table city (
  country_code char(2) not null references country (country_code),
  name varchar(35) not null,
  population integer not null check (population > 0),
  primary key (country_code, name)
);

insert into city values 
('US', 'Rome, GA', 36303),
('US', 'Washington, DC', 632323),
('US', 'Springfield, VA', 30484),
('IT', 'Rome', 277979),
('IT', 'Milan', 1324110),
('IT', 'Bari', 320475),
('IN', 'Mumbai', 12478447),
('IN', 'Patna', 1683200),
('IN', 'Cuttack', 606007);

The largest city population in each country.

select country.country_code, max(city.population) as max_population
from country
inner join city on country.country_code = city.country_code
group by country.country_code;

There are several ways to use that to get the result you want. One way is to use an inner join on a common table expression.

with max_population as (
  select country.country_code, max(city.population) as max_population
  from country
  inner join city on country.country_code = city.country_code
  group by country.country_code
)
select city.country_code, city.name, city.population
from city
inner join max_population 
        on max_population.country_code = city.country_code
       and max_population.max_population = city.population;
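The question asks for the country's name rather than its code. As a minimal sketch, assuming the DDL above (where the column is country_name; the asker's own table calls it name), the result can be joined back to country. Because city.country_code is declared not null and references country, the CTE here reads from city alone, which is a simplification of the query above rather than something the original answer wrote.

```sql
-- Largest city population per country, computed from city alone.
with max_population as (
  select country_code, max(population) as max_population
  from city
  group by country_code
)
select country.country_name as country,
       city.name            as city,
       city.population      as max_pop
from city
inner join max_population
        on max_population.country_code = city.country_code
       and max_population.max_population = city.population
-- Join back to country only to pick up the display name.
inner join country
        on country.country_code = city.country_code
order by country.country_name;
```

With the sample rows above this should list Mumbai for India, Milan for Italy, and Washington, DC for the United States of America. If two cities tie for the largest population, both rows appear.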

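Another way is to use an inner join on a subquery. (The text of the common table expression "goes into" the main query. Using the alias "max_population" means the rest of the query doesn't need to change.)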

select city.country_code, city.name, city.population
from city
inner join (select country.country_code, max(city.population) as max_population
            from country
            inner join city on country.country_code = city.country_code
            group by country.country_code
           ) max_population 
        on max_population.country_code = city.country_code
       and max_population.max_population = city.population;

Yet another way is to use a window function in a subquery. You need to select from a subquery, because you can't use the result of rank() directly in a WHERE clause. That is, this works.

select country_code, name, population
from (select country_code, name, population,
      rank() over (partition by country_code 
                   order by population desc) as city_population_rank
      from city
     ) city_population_rankings
where city_population_rank = 1;

But this doesn't.

select country_code, name, population,
       rank() over (partition by country_code 
                    order by population desc) as city_population_rank
from city
where city_population_rank = 1;

ERROR:  column "city_population_rank" does not exist
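PostgreSQL also offers DISTINCT ON, which is not part of the original answer and is not standard SQL. As a sketch against the same city table, it keeps the first row of each country_code group in ORDER BY order, so it returns exactly one city per country. Ties are broken here by name, which is an arbitrary choice made for this illustration.

```sql
-- One row per country_code: the first row in ORDER BY order,
-- i.e. the city with the largest population (name breaks ties).
select distinct on (country_code)
       country_code, name, population
from city
order by country_code, population desc, name;
```

The DISTINCT ON expressions have to match the leftmost ORDER BY expressions, which is why country_code comes first in the sort.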

score 0 · Accepted Answer

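The best way to do this, in recent versions of PostgreSQL, is with windowing. (Docs.) Before that, you had to do ugly things whenever you wanted to bring other columns of a special row, such as the row with the maximum population, into the final output.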

WITH preliminary AS 
     (SELECT country_code, city_name, population,
      rank() OVER (PARTITION BY country_code ORDER BY population DESC) AS r
      FROM country
      NATURAL JOIN city) -- NATURAL JOIN collapses 2 country_code columns into 1
SELECT * FROM preliminary WHERE r=1;

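This should also do the sensible thing in the unlikely case where two or more of the largest cities in a country have exactly the same population: rank() gives tied rows the same rank, so every tied city is returned.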
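To see how ties behave, here is a small self-contained sketch. The rows are hypothetical and supplied inline through VALUES, so it does not depend on either schema in this thread. It compares rank() with row_number() over the same window.

```sql
-- Hypothetical data: two cities in country 'XX' tied for largest population.
with city (country_code, city_name, population) as (
  values ('XX', 'Alpha', 1000),
         ('XX', 'Beta',  1000),
         ('XX', 'Gamma',  500)
)
select country_code, city_name, population,
       rank()       over w as r,   -- tied rows share a rank
       row_number() over w as rn   -- tied rows get distinct numbers
from city
window w as (partition by country_code order by population desc);
```

Alpha and Beta both get r = 1 and Gamma gets r = 3, so filtering on r = 1 keeps both tied cities. row_number() assigns 1 and 2 to the tied rows in an unspecified order, so filtering on rn = 1 would keep only one of them.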

[Edit in response to comment]

Before windowing, my usual approach was

SELECT country_code, city_name, population
FROM country co1 NATURAL JOIN city ci1
WHERE ROW(co1.country_code, ci1.population) =
    (SELECT co2.country_code, ci2.population 
     FROM country co2 NATURAL JOIN city ci2
     WHERE co1.country_code = co2.country_code 
     ORDER BY population DESC LIMIT 1);
-- note for lurkers, some other DBs use TOP 1 instead of LIMIT

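If the DB is intelligently indexed, the performance of this isn't too bad, because Postgres optimizes the subquery. Compare this with the inner-join-on-a-subquery approach in Mike Sherrill's answer.

Please let us know your instructor's answer. With the tools you have been given so far, it is likely to be inefficient, incomplete in the case of ties, or both.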
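The answer does not spell out what "intelligently indexed" means. One plausible reading, offered here as an assumption, is an index on city that matches the subquery's filter and sort order. The index name below is hypothetical.

```sql
-- Hypothetical index: entries are grouped by country_code and, within
-- each country, sorted from largest to smallest population, so the
-- largest city of a country is the first entry for that country_code.
create index city_country_population_idx
    on city (country_code, population desc);
```

Whether the planner actually uses such an index depends on the data and the query, and running EXPLAIN on the query is the way to check.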
