sql - Optimizing a SQL function to get common elements

Question

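I have a function that takes two delimited strings and returns the number of elements they have in common.

The main code of the function is as follows (@commonCount is the expected return value):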

    SET @commonCount = (select count(*) from (
    select token from dbo.splitString(@userKeywords, ';')
    intersect
    select token from dbo.splitString(@itemKeywords, ';')) as total)

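Here, splitString uses a while loop and charIndex to split the string into tokens at each delimiter and inserts them into a table.

The problem I have is that this only processes about 100 rows per second, and given the size of my data set it would take roughly 8 to 10 days to complete.

Each of the two strings can be up to 1500 characters long.

Is there any way to make this fast enough to be usable?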

score 1 · Accepted Answer

The performance problem is probably the combination of the cursor-like row-by-row processing (the while loop) and the user-defined function.
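As a rough illustration of removing the hand-written loop, the sketch below computes the same intersect count with the built-in STRING_SPLIT function instead of dbo.splitString. It assumes SQL Server 2016 or later (compatibility level 130 or higher), which the question does not state, and the sample keyword values are made up. Whether it is fast enough for the asker's data has not been measured.

```sql
-- Sketch only: assumes SQL Server 2016+ where STRING_SPLIT is available.
-- The sample values below are invented for illustration.
DECLARE @userKeywords varchar(1500) = 'red;green;blue';
DECLARE @itemKeywords varchar(1500) = 'blue;yellow;red';
DECLARE @commonCount int;

SET @commonCount = (
    SELECT COUNT(*)
    FROM (
        -- STRING_SPLIT returns one row per token in a column named "value"
        SELECT value FROM STRING_SPLIT(@userKeywords, ';')
        INTERSECT
        SELECT value FROM STRING_SPLIT(@itemKeywords, ';')
    ) AS total
);

SELECT @commonCount AS commonCount;  -- 2 for these sample values (red, blue)
```

If this still runs inside a scalar function that is called once per row, the per-call overhead remains, so the approaches that follow may matter more than the splitter itself.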

If one of these strings is constant (the item keywords, for example), you can search for each of its keywords separately:

    select *
    from users u
    where charindex(';'+<item1>+';', ';'+u.keywords+';') > 0
    union all
    select *
    from users u
    where charindex(';'+<item2>+';', ';'+u.keywords+';') > 0
    union all
    -- and so on, one select per remaining item keyword
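To see why the keyword is wrapped in delimiters before searching, here is a small check with made-up values. In this sketch the keyword list gets a delimiter on both ends, so that the first and last keywords are matched like any other and a keyword does not match as a prefix of a longer one.

```sql
-- Illustrative values only; 'blue', 'red;bluebird' and 'red;blue' are invented.
SELECT
    -- ';blue;' does not occur in ';red;bluebird;', so there is no false match
    CHARINDEX(';' + 'blue' + ';', ';' + 'red;bluebird' + ';') AS partial_match,  -- 0
    -- ';blue;' occurs in ';red;blue;' starting at position 5
    CHARINDEX(';' + 'blue' + ';', ';' + 'red;blue' + ';')     AS exact_match;    -- 5
```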

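Alternatively, a set-based approach would also work, but you need to normalize the data first (insert the usual advice here about storing data in the right format to begin with). That is, you need a table like this: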

userid
keyword

And another one:

itemid
keyword

(That is, if you have different kinds of items. Otherwise, this is just a list of keywords.)

The query would then look like this:

    select *
    from userkeyword uk join
         itemkeyword ik
         on uk.keyword = ik.keyword

And the SQL engine will work its magic.
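To tie this back to the count the question asks for, the join can be grouped per user and item. The following is a minimal sketch: the column types and the primary keys are assumptions, not something the original post specifies.

```sql
-- Hypothetical normalized tables; column types and keys are assumptions.
-- The primary keys keep each keyword unique per user and per item.
CREATE TABLE userkeyword (
    userid  int          NOT NULL,
    keyword varchar(100) NOT NULL,
    PRIMARY KEY (userid, keyword)
);
CREATE TABLE itemkeyword (
    itemid  int          NOT NULL,
    keyword varchar(100) NOT NULL,
    PRIMARY KEY (itemid, keyword)
);

-- Number of keywords each user shares with each item.
-- Pairs with no keyword in common do not appear in the result.
SELECT uk.userid, ik.itemid, COUNT(*) AS commonCount
FROM userkeyword uk
JOIN itemkeyword ik
  ON uk.keyword = ik.keyword
GROUP BY uk.userid, ik.itemid;
```

Because the keywords are unique within each user and each item, the grouped count corresponds to the size of the intersection computed in the question.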

So, how can you build such a list? If there are only a handful of keywords per user, you can do something like this:

    -- note: a keyword is only picked up if it is followed by ';'
    with keyword1 as (select u.*, charindex(';', keywords) as pos1,
                             left(keywords, charindex(';', keywords)-1) as keyword1
                      from users u
                      where charindex(';', keywords) > 0
                     ),
         -- keyword2 reads from keyword1 so that pos1 is in scope
         keyword2 as (select k.*, charindex(';', keywords, pos1+1) as pos2,
                             substring(keywords, pos1+1,
                                       charindex(';', keywords, pos1+1) - pos1 - 1) as keyword2
                      from keyword1 k
                      where charindex(';', keywords, pos1+1) > 0
                     ),
            ...
    select userid, keyword1
    from keyword1
    union all
    select userid, keyword2
    from keyword2
    ...

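To find the maximum number of elements in a keyword list such as itemKeyWords, you can count the delimiters with the following query (the number of elements is one more than the result when the lists have no trailing delimiter):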

    select max(len(Keywords) - len(replace(Keywords, ';', '')))
    from users
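As a worked example of that length arithmetic on a single made-up string: 'red;green;blue' is 14 characters long and 12 characters once the delimiters are removed, so it contains 2 delimiters and therefore 3 elements. The "+ 1" step assumes a non-empty list without a trailing delimiter.

```sql
-- Illustrative value only.
DECLARE @keywords varchar(1500) = 'red;green;blue';

SELECT
    LEN(@keywords) - LEN(REPLACE(@keywords, ';', ''))     AS delimiterCount,  -- 14 - 12 = 2
    LEN(@keywords) - LEN(REPLACE(@keywords, ';', '')) + 1 AS elementCount;    -- 3
```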
