sql - Removing sentences that match a specific pattern from a paragraph in T-SQL

Question

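I have a large number of descriptions, each anywhere from 5 to 20 sentences long. I am trying to put together a script that finds and removes any sentence containing a word with a number directly before or after it.

Example before: Hello world. Todays department has 345 employees. Have a good day.
Example after: Hello world. Have a good day.

My main problem right now is identifying the violation.
Here "345 employees" is what causes the sentence to be removed. However, each description will have a different number and possibly a different variation of the word "employee". I would like to avoid having to create a table of every variation of "employee".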

JTB

score 3 · Accepted Answer

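This makes for a good SQL puzzle.

Disclaimer: there are probably plenty of edge cases that will blow this up.

This takes your string, splits it into a table with one row per sentence, deletes the rows that match your condition, and finally joins them all back into a single string.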

CREATE FUNCTION dbo.fn_SplitRemoveJoin(@Val VARCHAR(2000), @FilterCond VARCHAR(100))
RETURNS VARCHAR(2000)
AS 
BEGIN
    DECLARE @tbl TABLE (rid INT IDENTITY(1,1), val VARCHAR(2000))
    DECLARE @t VARCHAR(2000)

    -- Split into table @tbl
    WHILE CHARINDEX('.',@Val) > 0
    BEGIN
        SET @t = LEFT(@Val, CHARINDEX('.', @Val))
        INSERT @tbl (val) VALUES (@t)
        SET @Val = RIGHT(@Val, LEN(@Val) - LEN(@t))
    END

    IF (LEN(@Val) > 0)
        INSERT @tbl VALUES (@Val)


    -- Filter out condition 
    DELETE FROM @tbl WHERE val LIKE @FilterCond

    -- Join back into 1 string
    DECLARE @i INT, @rv VARCHAR(2000)
    SET @i = 1
    WHILE @i <= (SELECT MAX(rid) FROM @tbl)
    BEGIN
        SELECT @rv = IsNull(@rv,'') + IsNull(val,'') FROM @tbl WHERE rid = @i
        SET @i = @i + 1
    END
    RETURN @rv

END
go


CREATE TABLE #TMP (rid INT IDENTITY(1,1), sentence VARCHAR(2000))
INSERT #tmp (sentence) VALUES ('Hello world. Todays department has 345 employees. Have a good day.')
INSERT #tmp (sentence) VALUES ('Hello world. Todays department has 15 emps. Have a good day. Oh and by the way there are 12 employees somewhere else')


SELECT 
    rid, sentence, dbo.fn_SplitRemoveJoin(sentence, '%[0-9] Emp%')
FROM #tmp t

Returns

rid | sentence | (No column name)
1 | Hello world. Todays department has 345 employees. Have a good day. | Hello world. Have a good day.
2 | Hello world. Todays department has 15 emps. Have a good day. Oh and by the way there are 12 employees somewhere else | Hello world. Have a good day.
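
To see what the filter does on its own, here is a small sketch that applies the same LIKE pattern to a few already-split sentences. The sample rows and the outcome column are made up for illustration, and it assumes a case-insensitive collation (the common default), which is why 'Emp' in the pattern matches 'employees'.

```sql
-- Sketch: test the filter pattern against individual sentences.
-- Assumes a case-insensitive collation; sample rows are hypothetical.
SELECT s.val,
       CASE WHEN s.val LIKE '%[0-9] Emp%' THEN 'removed' ELSE 'kept' END AS outcome
FROM (VALUES
        ('Hello world.'),                            -- no digit before "emp": kept
        (' Todays department has 345 employees.'),   -- "5 emp" matches: removed
        (' There are 12  employees.')                -- two spaces after the digit: kept
     ) AS s(val);
```

The last row is one of the edge cases the disclaimer warns about: the pattern expects exactly one space between the digit and the word, so extra whitespace slips through.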

score 2 · Accepted Answer

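I also used a split/remove/join technique.

The main points are:

This uses a pair of recursive CTEs rather than a UDF.
This works with all English sentence endings: . or ! or ?
This strips whitespace before doing the "digit + employ" comparison, so you don't have to worry about multiple spaces and the like.

Here's the SqlFiddle demo and the code: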

-- Split descriptions into sentences (could use period, exclamation point, or question mark)
-- Delete any sentences that, without whitespace, are like '%[0-9]employ%'
-- Join sentences back into descriptions
;with Splitter as (
    select ID
        , ltrim(rtrim(Data)) as Data
        , cast(null as varchar(max)) as Sentence
        , 0 as SentenceNumber
    from Descriptions -- Your table here
    union all
    select ID
        , case when Data like '%[.!?]%' then right(Data, len(Data) - patindex('%[.!?]%', Data)) else null end
        , case when Data like '%[.!?]%' then left(Data, patindex('%[.!?]%', Data)) else Data end
        , SentenceNumber + 1
    from Splitter
    where Data is not null
), Joiner as (
    select ID
        , cast('' as varchar(max)) as Data
        , 0 as SentenceNumber
    from Splitter
    group by ID
    union all
    select j.ID
        , j.Data +
            -- Don't want "digit+employ" sentences, remove whitespace to search
            case when replace(replace(replace(replace(s.Sentence, char(9), ''), char(10), ''), char(13), ''), char(32), '') like '%[0-9]employ%' then '' else s.Sentence end
        , s.SentenceNumber
    from Joiner j
        join Splitter s on j.ID = s.ID and s.SentenceNumber = j.SentenceNumber + 1
)
-- Final Select
select a.ID, a.Data
from Joiner a
    join (
        -- Only get max SentenceNumber
        select ID, max(SentenceNumber) as SentenceNumber
        from Joiner
        group by ID
    ) b on a.ID = b.ID and a.SentenceNumber = b.SentenceNumber
order by a.ID, a.SentenceNumber
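
Since the query expects a Descriptions table with ID and Data columns, here is a minimal setup for trying it out. The column types and sample rows are my assumptions for illustration, not something taken from the fiddle.

```sql
-- Hypothetical sample table; only the names Descriptions, ID and Data
-- come from the query above. Types and rows are assumed.
create table Descriptions (
    ID int not null primary key,
    Data varchar(max) not null
);

insert into Descriptions (ID, Data) values
    (1, 'Hello world. Todays department has 345 employees. Have a good day.'),
    (2, 'Welcome! We now have 15   employees on site. Any questions?');
```

Running the CTE query against this table, row 1 should come back as 'Hello world. Have a good day.' and row 2 as 'Welcome! Any questions?'. The second row exercises both the whitespace stripping (three spaces after the number) and the ! and ? sentence endings.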

score 0 · Accepted Answer

Here's one way to do it. Note that it only works if just one of the sentences contains a number.

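declare @d VARCHAR(1000) = 'Hello world. Todays department has 345 employees. Have a good day.'
declare @dr VARCHAR(1000)

-- Work on a reversed copy so we can search backwards from the number
set @dr = REVERSE(@d)

SELECT
    -- Text before the period that precedes the numbered sentence
    REVERSE(RIGHT(@dr, LEN(@dr) - CHARINDEX('.', @dr, PATINDEX('%[0-9]%', @dr))))
    -- Text from the period that ends the numbered sentence to the end of the string
    + RIGHT(@d, LEN(@d) - CHARINDEX('.', @d, PATINDEX('%[0-9]%', @d)) + 1)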
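
To illustrate that limitation, here is the same expression run against a string where two sentences contain numbers. The sample text is made up for this sketch.

```sql
-- Sketch: same expression, but two sentences contain digits (sample text is hypothetical).
DECLARE @d VARCHAR(1000) = 'Hello world. We have 345 employees. Have a good day. Another 12 employees joined.';
DECLARE @dr VARCHAR(1000) = REVERSE(@d);

-- The forward search anchors on the first digit, the reverse search on the last one,
-- so the two halves overlap and neither numbered sentence is dropped.
SELECT REVERSE(RIGHT(@dr, LEN(@dr) - CHARINDEX('.', @dr, PATINDEX('%[0-9]%', @dr))))
     + RIGHT(@d, LEN(@d) - CHARINDEX('.', @d, PATINDEX('%[0-9]%', @d)) + 1) AS result;
```

For descriptions like this, one of the split/remove/join approaches in the other answers is probably the safer choice.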
