sql - How to structure SQL - select first X rows for each value of a column?

Question

I have a table that contains the following type of data:

create table store (
    n_id             serial not null primary key,
    n_place_id       integer not null references place(n_id),
    dt_modified      timestamp not null,
    t_tag            varchar(4),
    n_status         integer not null default 0
    ...
    (about 50 more fields)
);

There are indexes on n_id, n_place_id, dt_modified, and all the other fields used in the queries below.

The table currently contains about 100,000 rows, but it may grow to a million or more. For now, though, let's assume it stays around the 100K mark.

From this table, I am trying to select the rows that meet one of two conditions:

All rows whose n_place_id is in a particular subset (this part is easy); or
For all other n_place_id values, the first ten rows sorted by dt_modified (this is where it gets more complicated).

Doing this in a single SQL statement seemed too cumbersome, so I am happy to use a stored function for it. I have defined my function as follows:

create or replace function api2.fn_api_mobile_objects()
  returns setof store as
$body$
declare
    maxres_free integer := 10;
    resulter    store%rowtype;
    mcnt        integer := 0;
    previd      integer := 0;
begin
    create temporary table paid on commit drop as
    select n_place_id from payments where t_reference is not null and now()::date between dt_paid and dt_valid;

    for resulter in
        select * from store where n_status > 0 and t_tag is not null order by n_place_id, dt_modified desc
    loop
        if resulter.n_place_id in (select n_place_id from paid) then
            return next resulter;
        else
            if previd <> resulter.n_place_id then
                mcnt := 0;
                previd := resulter.n_place_id;
            end if;

            if mcnt < maxres_free then
                return next resulter;
                mcnt := mcnt + 1;
            end if;
        end if;
    end loop;
end;$body$
  language 'plpgsql' volatile;

The problem is that

select * from api2.fn_api_mobile_objects()

takes about 6 to 7 seconds to execute. Given that I then need to join this result set to three other tables, apply additional conditions, and apply further sorting, this is clearly unacceptable.

Still, I do need to get this data, so either I am missing something in the function or I need to rethink the whole algorithm. Either way, I need help with this.

score 1 · Accepted Answer

CREATE TABLE store
    ( n_id             serial not null primary key
    , n_place_id       integer not null -- references place(n_id)
    , dt_modified      timestamp not null
    , t_tag            varchar(4)
    , n_status         integer not null default 0
        );
INSERT INTO store(n_place_id,dt_modified,n_status)
SELECT n,d,n%4
FROM generate_series(1,100) n
, generate_series('2012-01-01'::date ,'2012-10-01'::date, '1 day'::interval ) d
        ;

WITH zzz AS (
        SELECT n_id AS n_id
        , rank() OVER (partition BY n_place_id ORDER BY dt_modified) AS rnk
        FROM store
        )
SELECT st.*
FROM store st
JOIN zzz ON zzz.n_id = st.n_id
WHERE st.n_place_id IN ( 1,22,333)
OR zzz.rnk <=10
        ;

UPDATE: here is the same self-join construct written as a subquery (CTEs are treated a bit differently by the planner):

SELECT st.*
FROM store st
JOIN ( SELECT sx.n_id AS n_id
        , rank() OVER (partition BY sx.n_place_id ORDER BY sx.dt_modified) AS zrnk
        FROM store sx
        ) xxx ON xxx.n_id = st.n_id
WHERE st.n_place_id IN ( 1,22,333)
OR xxx.zrnk <=10
        ;
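
A possible single-pass variant of the two queries above, offered as a sketch rather than a tested replacement. It filters on the window function's result directly, so store is not joined to itself. Two assumptions are made here that are not in the answer: row_number() is used in place of rank(), so ties in dt_modified can never yield more than ten rows per place, and "first" is taken to mean most recent, as in the question's ORDER BY ... dt_modified DESC. The IN list is the same placeholder as above, and the column list matches the reduced test table.

```sql
-- Sketch: same selection without the self-join.
-- Assumption: "first 10" means the 10 most recently modified rows per place.
SELECT ranked.n_id, ranked.n_place_id, ranked.dt_modified
     , ranked.t_tag, ranked.n_status
FROM (
    SELECT st.*
         , row_number() OVER (PARTITION BY st.n_place_id
                              ORDER BY st.dt_modified DESC) AS rn
    FROM store st
) ranked
WHERE ranked.n_place_id IN (1, 22, 333)   -- every row for the listed places
   OR ranked.rn <= 10                     -- otherwise at most 10 rows per place
;
```

With the real table of about 50 columns, joining back on n_id as in the answer avoids listing every column by hand.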

score 1 · Accepted Answer

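After some struggle, I got the stored function to return its results in just over 1 second (which is a huge improvement). The function now looks like this (I added an extra condition, but it did not affect performance much):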

create or replace function api2.fn_api_mobile_objects(t_search varchar)
  returns setof store as
$body$
declare
    maxres_free integer := 10;
    resulter    store%rowtype;
    mid     integer := 0;
begin
    create temporary table paid on commit drop as
    select n_place_id from payments where t_reference is not null and now()::date between dt_paid and dt_valid
    union
    select n_place_id from store where n_status > 0 and t_tag is not null group by n_place_id having count(1) <= 10;

    for resulter in
        select * from store
        where n_status > 0 and t_tag is not null
        and (t_name ~* t_search or t_description ~* t_search)
        and n_place_id in (select n_place_id from paid)
    loop
        return next resulter;
    end loop;

    for mid in
        select distinct n_place_id from store where n_place_id not in (select n_place_id from paid)
    loop
        for resulter in
            select * from store where n_status > 0 and t_tag is not null and n_place_id = mid order by dt_modified desc limit maxres_free
        loop
            return next resulter;
        end loop;
    end loop;

end;$body$
  language 'plpgsql' volatile;

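This runs in just over 1 second on my local machine and in about 0.8 to 1.0 seconds on the live server. I do not know what will happen as the amount of data grows, but it is good enough for my purposes.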

score 0 · Accepted Answer

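As a quick suggestion, the way I like to troubleshoot this sort of thing is to construct a query that gets me most of the way there, optimize it properly, and then add the necessary pl/pgsql around it. The main advantage of this approach is that you can optimize based on query plans.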
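
As a rough illustration of that workflow, the core query from the question's function can be run on its own under EXPLAIN ANALYZE, so that its plan is visible before any pl/pgsql is wrapped around it. The statement below is simply the question's loop query; nothing else is assumed.

```sql
-- Inspect the plan and actual timings of the query that drives the loop.
EXPLAIN ANALYZE
SELECT *
FROM store
WHERE n_status > 0
  AND t_tag IS NOT NULL
ORDER BY n_place_id, dt_modified DESC;
```

Running EXPLAIN on the function call itself would typically show only a single Function Scan node, which is why the query is examined separately.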

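Also, if you aren't dealing with a lot of rows, you can use array_agg() and unnest() (in Pg 8.4 and later!) to do away with the temporary-table management overhead and simply construct and query an in-memory array of tuples as a relation. It may perform better too, since you are only hitting an array in memory rather than a temp table (with less planning overhead and less query overhead as well).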
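
A minimal sketch of the array idea, assuming the payments and store tables from the question. The function name fn_paid_store_rows_sketch and the variable paid_ids are made up for illustration, and only the "paid" half of the selection is shown.

```sql
create or replace function fn_paid_store_rows_sketch()   -- hypothetical name
  returns setof store as
$body$
declare
    paid_ids integer[];   -- in-memory stand-in for the temporary table
begin
    -- Collect the paid place ids once, instead of creating a temp table.
    select array_agg(distinct n_place_id) into paid_ids
    from payments
    where t_reference is not null
      and now()::date between dt_paid and dt_valid;

    -- Query the array directly; "in (select unnest(paid_ids))" would
    -- treat it as a relation instead.
    return query
        select * from store
        where n_status > 0
          and t_tag is not null
          and n_place_id = any (paid_ids);
end;
$body$
  language plpgsql stable;
```

If no payment matches, paid_ids is NULL and the comparison yields no rows, which mirrors an empty temporary table for this half of the result.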

Also, in your updated query, I would look at replacing that final loop with a subquery or a join. That lets the planner decide when to do a nested-loop lookup and when to look for a better way.
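
One way that replacement might look, as a sketch and not a measured improvement. It assumes the temporary table paid from the updated function already exists in the current transaction, and it hard-codes 10 where the function uses maxres_free. Inside the function, the statement would be prefixed with return query.

```sql
-- Sketch: one set-based query in place of the per-place loop.
-- Assumption: "paid" is the temporary table created earlier in the function.
select st.*
from store st
join (
    select s.n_id
         , row_number() over (partition by s.n_place_id
                              order by s.dt_modified desc) as rn
    from store s
    where s.n_status > 0
      and s.t_tag is not null
      -- not exists instead of not in: unaffected by NULLs in paid.n_place_id
      and not exists (select 1 from paid p
                      where p.n_place_id = s.n_place_id)
) ranked on ranked.n_id = st.n_id
where ranked.rn <= 10;   -- maxres_free in the function
```

Joining back to store on n_id keeps the output shaped exactly like a store row, which a function declared as returns setof store requires.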
