sql - Refactoring a T-SQL view that uses row_number() to return rows with a unique column value

Question

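I have a SQL view that I use to retrieve data. Think of a huge list of products, each linked to the customers who bought it. The view must return only one row per product, no matter how many customers the product is linked to. I am using the row_number function to achieve this. (This example is simplified. The general situation is a query that must return only one row for each unique value of some column X. Which row is returned does not matter.)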

CREATE VIEW productView AS
SELECT * FROM 
    (SELECT 
        Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
        customer.Id
        -- various other columns
    FROM products
    LEFT OUTER JOIN customer ON customer.productId = products.Id
    -- various other joins
    ) as temp
WHERE temp.product_numbering = 1

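Now suppose the view contains up to about 1 million rows in total, and that running select * from productView takes 10 seconds. Running a query such as select * from productView where productID = 10 takes the same amount of time. I believe this is because the query is evaluated as: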

SELECT * FROM 
    (SELECT 
        Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
        customer.Id
        -- various other columns
    FROM products
    LEFT OUTER JOIN customer ON customer.productId = products.Id
    -- various other joins
    ) as temp
WHERE product_numbering = 1 AND productID = 10

I think this causes the inner subquery to be evaluated in full every time. Ideally, I would like to use something like this:

SELECT 
    Row_number() OVER(PARTITION BY products.productID ORDER BY products.productID) AS product_numbering,
    customer.id
    -- various other columns
FROM products
    LEFT OUTER JOIN customer ON customer.productId = products.Id
    -- various other joins
WHERE product_numbering = 1

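But this does not seem to be allowed. Is there a way to do something similar?

EDIT -

After a lot of experimentation, I think the real problem I have is how to force a join to return exactly one row. I tried using OUTER APPLY, as shown below. Some sample code: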

CREATE TABLE Products (id int not null PRIMARY KEY)
CREATE TABLE Customers (
        id int not null PRIMARY KEY,
        productId int not null,
        value varchar(20) NOT NULL)

declare @count int = 1
while @count <= 150000
begin
        insert into Customers (id, productID, value)
        values (@count,@count/2, 'Value ' + cast(@count/2 as varchar))      
        insert into Products (id) 
        values (@count)
        SET @count = @count + 1
end

CREATE NONCLUSTERED INDEX productId ON Customers (productID ASC)

Using the sample set above, the following "get everything" query

select * from Products
outer apply (select top 1 * 
            from Customers
            where Products.id = Customers.productID) Customers

takes about 1000 ms to run. Adding an explicit condition:

select * from Products
outer apply (select top 1 * 
            from Customers
            where Products.id = Customers.productID) Customers
where Customers.value = 'Value 45872'

takes the same amount of time. 1000 ms for a fairly simple query is already too much, and it scales the wrong way (upwards) when similar joins are added.

score 3 · Accepted Answer

Try the following approach, which uses a common table expression (CTE). With the test data you provided, it returns specific ProductIds in under a second.

create view ProductTest as 

with cte as (
select 
    row_number() over (partition by p.id order by p.id) as RN, 
    c.*
from 
    Products p
    inner join Customers c
        on  p.id = c.productid
)

select * 
from cte
where RN = 1
go

select * from ProductTest where ProductId = 25
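
A variation on the same idea, as a rough sketch: the view can expose the PARTITION BY column under its own name, so that a filter on it has a chance of being applied before the rows are numbered rather than after. As far as I know SQL Server can usually push such a predicate below the ROW_NUMBER() step, but that should be confirmed in the actual execution plan. ProductFirstCustomer is a hypothetical view name, and Products and Customers are the test tables from the question; the LEFT OUTER JOIN keeps products that have no customer.

```sql
-- Sketch only: ProductFirstCustomer is a hypothetical name;
-- Products and Customers are the test tables from the question.
CREATE VIEW ProductFirstCustomer AS
WITH numbered AS (
    SELECT
        p.id    AS productId,   -- expose the PARTITION BY column
        c.id    AS customerId,  -- NULL when the product has no customer
        c.value,
        ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY c.id) AS RN
    FROM Products AS p
    LEFT OUTER JOIN Customers AS c
        ON c.productId = p.id
)
SELECT productId, customerId, value
FROM numbered
WHERE RN = 1;
GO

-- Filter on the exposed partition column
SELECT * FROM ProductFirstCustomer WHERE productId = 25;
```

Ordering by c.id inside the window makes the chosen row deterministic; if any row will do, the ORDER BY column can be whatever is cheapest to read.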

score 2 · Accepted Answer

What if you did something like this:

SELECT ...
FROM products
OUTER APPLY (SELECT TOP 1 * from customer where customerid = products.buyerid) as customer
...

Then a filter on productId should help. Without a filter, though, it may well be worse.
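
Applied to the test tables from the question, that could look roughly like the sketch below. It assumes the filter is placed on Products.id (not on a Customers column), so the outer table is narrowed first and the APPLY runs only for the matching products. The index name is hypothetical, and whether the INCLUDE column helps depends on which columns are actually selected.

```sql
-- Sketch only, against the question's Products / Customers test tables.
-- IX_Customers_productId_value is a hypothetical index name.
CREATE NONCLUSTERED INDEX IX_Customers_productId_value
    ON Customers (productId) INCLUDE (value);

SELECT p.id AS productId, c.customerId, c.value
FROM Products AS p
OUTER APPLY (
    -- at most one customer row per product
    SELECT TOP (1) cu.id AS customerId, cu.value
    FROM Customers AS cu
    WHERE cu.productId = p.id
    ORDER BY cu.id
) AS c
WHERE p.id = 25;   -- filter on the outer table's key
```

A filter on a column that comes out of the APPLY, such as Customers.value in the question, cannot narrow Products first, which would be consistent with the unchanged 1000 ms reported there.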

score 1 · Accepted Answer

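The problem is that your data model is flawed. You need three tables:

Customers (customerId, ...)
Products (productId, ...)
ProductSales (customerId, productId)

On top of that, the sales table should probably be split into a one-to-many pair (Sales and SalesDetails). Unless you fix the data model, you will just be chasing your tail after red herrings. If the system does not fit the design, fix it. If your boss won't let you fix it, fix it anyway. If you can't fix it, fix it anyway. There is no easy way to work around the bad data model you are proposing.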

score 0 · Accepted Answer

If you really don't care which customer is brought back, this is probably fast enough:

select p1.*, c1.*
from products p1
left join (
        -- pick one customer per product: the one with the highest id
        select p2.id, max(c2.id) as max_customer_id
        from products p2
        join customer c2 on
        c2.productID = p2.id
        group by p2.id
) product_max_customer on
product_max_customer.id = p1.id
left join customer c1 on
c1.id = product_max_customer.max_customer_id
;
