sql - Performance problem when generating unique names

Question

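I have a table called "Objects" in a SQL Server DB. It contains object names (strings). I also have a list of names of new objects, held in another table "NewObjects", that need to be inserted into the "Objects" table. From here on I'll call this operation "import".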

For each record imported from "NewObjects" into "Objects", I need to generate a unique name if a record with that name already exists in "Objects". The new name is stored in the "NewObjects" table next to the old name.

DECLARE @NewObjects TABLE
(
    ...
    Name varchar(20),
    newName nvarchar(20)
)

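I have implemented a stored procedure that generates a unique name for each record imported from "NewObjects". However, I'm not satisfied with its performance for 1,000 records (in "NewObjects"), and I need help optimizing the code. Here is the implementation: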

PROCEDURE [dbo].[importWithNewNames] @args varchar(MAX)

-- Sample of @args is like 'A,B,C,D' (a CSV string)
...


DECLARE @NewObjects TABLE
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
)

-- 'SplitString' function: a working implementation; its performance is not a concern right now
INSERT INTO @NewObjects (Name)
SELECT * from SplitString(@args, ',')

declare @beg int = 1
declare @end int
DECLARE @oldName varchar(10)

-- get the count of the rows
select @end = MAX(_index) from @NewObjects

while @beg <= @end
BEGIN
    select @oldName = Name from @NewObjects where @beg = _index

    Declare @nameExists int = 0

    -- this is our constant; we cannot change it
    DECLARE @MAX_NAME_WIDTH int = 5

    DECLARE @counter int = 1
    DECLARE @newName varchar(10)
    DECLARE @z varchar(10)

    select @nameExists = count(name) from Objects where name = @oldName
    ...
    IF @nameExists > 0
    BEGIN
        -- create name based on pattern 'Fxxxxx'. Example: 'F00001', 'F00002'.
        select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

        while EXISTS (select top 1 1 from Objects where name = @newName)
         OR EXISTS (select top 1 1 from @NewObjects where newName = @newName)
        BEGIN
            select @counter = @counter + 1
            select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
        END

        select top 1 @z = @newName from Objects

        update @NewObjects
        set newName = @z where @beg = _index
    END

    select @beg = @beg + 1
END

-- finally, show the new names generated
select * from @NewObjects

score 2 · Accepted Answer

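Disclaimer: I'm not in a position to test these recommendations, so there may be syntax errors you'll have to work out yourself when you implement them. They are meant not only as a guide for fixing this procedure but also to help you grow your skill set for future projects.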

Here is one optimization I spotted at a glance; it matters more and more as you iterate over larger sets. You have this code:

select @nameExists = count(name) from Objects where name = @oldName
...
IF @nameExists > 0

Consider changing it to:

IF EXISTS (select name from Objects where name = @oldName)

Also, instead of doing this:

-- create name based on pattern 'Fxxxxx'. Example: 'F00001', 'F00002'.
select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

while EXISTS (select top 1 1 from Objects where name = @newName)
 OR EXISTS (select top 1 1 from @NewObjects where newName = @newName)
BEGIN
    select @counter = @counter + 1
    select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
END

consider this:

DECLARE @maxName VARCHAR(20)
SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

SELECT @maxName = MAX(name) FROM Objects WHERE name LIKE 'F' + REPLICATE('[0-9]', @MAX_NAME_WIDTH)
IF (@maxName IS NOT NULL)
BEGIN
    -- continue after the highest existing Fxxxxx number
    SET @counter = CAST(SUBSTRING(@maxName, 2, @MAX_NAME_WIDTH) AS INT) + 1
    SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
END

This way you avoid looping, and running multiple queries, just to find the highest integer value already used in a generated name.

Moreover, based on the little context I have, you should be able to make one more optimization that ensures the lookup above runs only once:

DECLARE @maxName VARCHAR(20)
SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

IF (@beg = 1)
BEGIN
    SELECT @maxName = MAX(name) FROM Objects WHERE name LIKE 'F' + REPLICATE('[0-9]', @MAX_NAME_WIDTH)
    IF (@maxName IS NOT NULL)
    BEGIN
        SET @counter = CAST(SUBSTRING(@maxName, 2, @MAX_NAME_WIDTH) AS INT) + 1
        SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
    END
END

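The reason I say you can make that optimization is that, as long as you don't have to worry about some other process inserting records that look like yours (Fxxxxx and so on) while this runs, you only need to find the MAX once and can then simply increment @counter as you go through the loop.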

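In fact, you could pull this whole part out of the loop, and you should be able to work out how fairly easily: just move the DECLARE and SET of @counter, together with the code inside IF (@beg = 1), out of the loop. But take it one step at a time.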
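Putting those pieces together, here is a minimal sketch of the loop with the Fxxxxx lookup moved out of it. It assumes, as stated above, that nothing else inserts Fxxxxx names while it runs. @Objects is a hypothetical stand-in for the real Objects table, and the names are inserted directly instead of going through SplitString. Because @counter only ever moves forward, the generated names cannot collide with each other, so the inner WHILE that probed @NewObjects is no longer needed.

```sql
-- Sketch: the question's loop with the Fxxxxx lookup hoisted out of it.
-- @Objects and the literal names below are hypothetical test data;
-- the real procedure would read Objects and fill @NewObjects via SplitString.
DECLARE @MAX_NAME_WIDTH int = 5;
DECLARE @Objects TABLE (name varchar(20) PRIMARY KEY);
INSERT INTO @Objects (name) VALUES ('A'), ('A1'), ('B'), ('F00001');

DECLARE @NewObjects TABLE
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
);
INSERT INTO @NewObjects (Name) VALUES ('A'), ('B'), ('C'), ('D'), ('F00001');

-- Done once, before the loop: start numbering after the highest Fxxxxx name.
DECLARE @counter int = 1;
DECLARE @maxName varchar(20);
SELECT @maxName = MAX(name)
FROM @Objects
WHERE name LIKE 'F' + REPLICATE('[0-9]', @MAX_NAME_WIDTH);
IF @maxName IS NOT NULL
    SET @counter = CAST(SUBSTRING(@maxName, 2, @MAX_NAME_WIDTH) AS int) + 1;

DECLARE @beg int = 1, @end int, @oldName varchar(20);
SELECT @end = MAX(_index) FROM @NewObjects;

WHILE @beg <= @end
BEGIN
    SELECT @oldName = Name FROM @NewObjects WHERE _index = @beg;

    -- EXISTS instead of COUNT: stops at the first match
    IF EXISTS (SELECT 1 FROM @Objects WHERE name = @oldName)
    BEGIN
        UPDATE @NewObjects
        SET newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
        WHERE _index = @beg;

        -- only a consumed number advances the counter
        SET @counter = @counter + 1;
    END

    SET @beg = @beg + 1;
END

SELECT * FROM @NewObjects;
```

With this test data, A, B and F00001 get F00002, F00003 and F00004, while C and D keep NULL. Incrementing @counter on every iteration instead also yields unique names, but leaves unused numbers behind.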

Also, change the following line:

select top 1 @z = @newName from Objects

to this:

SET @z = @newName

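Right now you are literally querying the Objects table just to assign one local variable to another. That could be a big source of your performance problem. Unless you are actually setting a variable from a SELECT statement that reads data, I'd recommend using SET for local-variable operations. There are a few other places in your code where this applies. Consider this line: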

select @beg = @beg + 1

Use this instead:

SET @beg = @beg + 1
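To make the SELECT-versus-SET point concrete, here is a small sketch. @Empty is a hypothetical, empty stand-in for Objects. A SELECT that assigns a variable reads the table, and when the table returns no rows the assignment never happens and the variable keeps its previous value; SET involves no table at all.

```sql
-- Sketch: assigning one local variable to another via SELECT ... FROM a table
-- depends on that table having rows; SET does not touch any table.
-- @Empty is a hypothetical stand-in for an Objects table with no rows.
DECLARE @Empty TABLE (name varchar(20));
DECLARE @newName varchar(20) = 'F00002';
DECLARE @zSelect varchar(20);
DECLARE @zSet varchar(20);

-- Reads @Empty; with zero rows nothing is assigned, so @zSelect stays NULL.
SELECT TOP 1 @zSelect = @newName FROM @Empty;

-- Plain variable assignment; no table access at all.
SET @zSet = @newName;

SELECT @zSelect AS via_select, @zSet AS via_set;  -- NULL, F00002
```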

Finally, as I mentioned above about simply incrementing @counter, you have this line at the end of the loop:

select @beg = @beg + 1

Just add the following line:

SET @counter = @counter + 1

And you're golden!

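To sum up: you can get rid of all those iterations, because you only need to collect the highest conflicting name once. Start using SET to get rid of lines that hurt performance, such as select top 1 @z = @newName from Objects, where you are actually querying a table just to assign one local variable to another. And use EXISTS to do the work instead of setting a variable with the COUNT aggregate.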

Let me know how these optimizations work out for you.

score 1 · Accepted Answer

You should avoid queries inside loops, especially when they run against a table variable...

Try using a temporary table instead, and index it on the newName column. I hope that improves performance a bit...
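A minimal sketch of that suggestion follows; #NewObjects and IX_NewObjects_newName are hypothetical names. The nonclustered index on newName gives the optimizer the option of seeking for the "EXISTS (... WHERE newName = @newName)" check in the question's inner loop rather than scanning. Whether it actually does depends on the plan, and with around 1,000 rows the gain may be modest.

```sql
-- Sketch: a temporary table in place of @NewObjects, indexed on newName.
-- #NewObjects and IX_NewObjects_newName are hypothetical names.
IF OBJECT_ID('tempdb..#NewObjects') IS NOT NULL
    DROP TABLE #NewObjects;

CREATE TABLE #NewObjects
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
);

-- Supports lookups of the form WHERE newName = @newName
CREATE NONCLUSTERED INDEX IX_NewObjects_newName ON #NewObjects (newName);

INSERT INTO #NewObjects (Name) VALUES ('A'), ('B');
UPDATE #NewObjects SET newName = N'F00002' WHERE Name = 'A';

-- The kind of existence check the index is meant to serve
SELECT CASE WHEN EXISTS (SELECT 1 FROM #NewObjects WHERE newName = N'F00002')
            THEN 1 ELSE 0 END AS name_taken;
```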

But it would be better to rewrite the whole thing and avoid loops that contain queries altogether.

Here's how I set up the environment for testing...

    --this stands in for your Objects table; I fill it with some test values
    DECLARE @Objects TABLE
    (
        _index int identity PRIMARY KEY,
        Name varchar(20)

    )
    insert into @Objects(name)
    values('A'),('A1'),('B'),('F00001')

    --the parameter of your procedure
    declare @args varchar(MAX)
    set @args = 'A,B,C,D,F00001'

    --@NewObjects2 is your @NewObjects; I named it with a 2 because I ran your solution alongside it when testing

    DECLARE @NewObjects2 TABLE
    (
        _index int identity PRIMARY KEY,
        Name varchar(20),
        newName nvarchar(20)
    )

    INSERT INTO @NewObjects2 (Name)
    SELECT * from SplitString(@args, ',')

    declare @end int
    select @end = MAX(_index) from @NewObjects2
    DECLARE @MAX_NAME_WIDTH int = 5

Up to this point, our solutions are very similar.

Now, here's what I'd do instead of your loop:

--generate candidate newNames in the format FXXXXX until there are enough free ones for every row in @NewObjects2
--you should alter this to find the greatest FXXXXX name in Objects and start generating newNames from that point, to avoid the overhead of creating newNames that will surely never be used
with N_free as 
(
     select 
         0 as [count],
         'F' + REPLACE(STR(0, @MAX_NAME_WIDTH, 0), ' ', '0') as [newName],
         0 as fl_free,
         0 as count_free

     union all 

     select 
         N.[count] + 1 as [count],
         'F' + REPLACE(STR(N.[count]+1, @MAX_NAME_WIDTH, 0), ' ', '0') as [newName],
         OA.fl_free,
         count_free + OA.fl_free as count_free
     from 
         N_free N
     outer apply 
         (select 
              case 
                 when not exists(select name from @Objects
                                 where Name = 'F' + REPLACE(STR(N.[count]+1, @MAX_NAME_WIDTH, 0), ' ', '0')) 
                    then 1 
                 else 0 
              end as fl_free) OA
    where 
        N.count_free < @end
)
--return only those newNames that are free to be used
    ,newNames as (select  ROW_NUMBER() over (order by [count]) as _index_name
                         ,[newName] 
                  from N_free where fl_free = 1
    )
--update the @NewObjects2 giving newname for the ones that got the name already been used on Objects
    update N2
    set newName = V2.[newName]
    from @NewObjects2 N2
    inner join (select V._index,V.Name,newNames.[newName]
                from(   select row_number() over (partition by case when O.Name is not null 
                                                                        then 1
                                                                        else 0
                                                        end 
                                                        order by N._index) as _index_name
                                  ,N._index
                                  ,N.Name
                                  ,case when O.Name is not null 
                                        then 1
                                        else 0
                                    end as [fl_need_newName]
                            from @NewObjects2 N
                            left outer join @Objects O
                            on O.Name = N.Name
                    )V
                    left outer join newNames 
                    on newNames._index_name = V._index_name
                    and V.fl_need_newName = 1
    )V2
    on V2._index = N2._index
            option(MAXRECURSION 0)

    select * from @NewObjects2

The results I got were the same as when I ran your solution in this environment...

You can check whether it really produces the same results...

The result of this query is:

    _index  Name    newName
        1   A       F00002
        2   B       F00003
        3   C       NULL
        4   D       NULL
        5   F00001  F00004
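The comment at the top of the CTE suggests starting from the greatest FXXXXX name already in Objects. A minimal set-based sketch of that variation, using ROW_NUMBER instead of recursion, could look like this. @Objects and @NewObjects2 hold hypothetical test data mirroring the setup above. Like the recursive version, it only checks candidate names against Objects. Unlike it, it never reuses gaps below the highest number (for example F00003 when only F00001 and F00005 exist).

```sql
-- Sketch: set-based variant that starts numbering after the greatest existing
-- Fxxxxx name, with no recursion. Test data is hypothetical.
DECLARE @MAX_NAME_WIDTH int = 5;
DECLARE @Objects TABLE (Name varchar(20));
INSERT INTO @Objects (Name) VALUES ('A'), ('A1'), ('B'), ('F00001');

DECLARE @NewObjects2 TABLE
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
);
INSERT INTO @NewObjects2 (Name) VALUES ('A'), ('B'), ('C'), ('D'), ('F00001');

-- Fixed-width names sort like their numbers, so MAX on the string is enough;
-- casting afterwards avoids converting non-matching names such as 'A1'.
DECLARE @maxName varchar(20) =
    (SELECT MAX(Name) FROM @Objects
     WHERE Name LIKE 'F' + REPLICATE('[0-9]', @MAX_NAME_WIDTH));
DECLARE @base int = ISNULL(CAST(SUBSTRING(@maxName, 2, @MAX_NAME_WIDTH) AS int), 0);

-- Number the rows whose name is taken, in _index order, offset by @base.
WITH needs_name AS
(
    SELECT N._index,
           ROW_NUMBER() OVER (ORDER BY N._index) AS rn
    FROM @NewObjects2 N
    WHERE EXISTS (SELECT 1 FROM @Objects O WHERE O.Name = N.Name)
)
UPDATE N2
SET newName = 'F' + REPLACE(STR(@base + nn.rn, @MAX_NAME_WIDTH, 0), ' ', '0')
FROM @NewObjects2 N2
INNER JOIN needs_name nn ON nn._index = N2._index;

SELECT * FROM @NewObjects2;
```

With this data it fills in F00002, F00003 and F00004 for A, B and F00001, the same as the result shown above.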
