sql - Simplest way to delete duplicates / explanation

Question

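First, I'd like to mention that (as a beginner) I searched through several Q&As about duplicates in a table, but unfortunately I couldn't get the code used in the answers to work.

My table is created from a sorted report in SQL Server 2008.

I'd like to know how to delete the duplicate records, with an explanation.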

"MyTable":

Column1   (PK-auto incremental table's record ID) 
Column2   (some TXT) 
Column3   (Some TXT)
Column4   (SmallDateTime)
Column5   is empty

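Column5 should hold SUM(count of deleted duplicates including this survived row), i.e. the total number of rows in the duplicate group, counting the one that is kept.

The key to the solution in some cases: when several records have the same content in [column2 and column3] (and are therefore duplicates), they do not always share the same date (column4).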

From this:

col1  col2   col3  col4         col5
----  -----  ----  -----------  ----
1     [abc]  [4]   [10/1/2012]  null
2     [abc]  [1]   [12/1/2012]  null
3     [ghi]  [6]   [4/1/2012]   null
4     [def]  [5]   [8/1/2012]   null
5     [abc]  [4]   [10/1/2012]  null
6     [def]  [5]   [12/1/2012]  null
7     [ghi]  [6]   [15/1/2012]  null
8     [abc]  [4]   [17/1/2012]  null
9     [ghi]  [6]   [6/1/2012]   null
10    [abc]  [1]   [13/1/2012]  null

To this:

col1  col2   col3  col4         col5
----  -----  ----  -----------  ----
8     [abc]  [4]   [17/1/2012]  3
10    [abc]  [1]   [13/1/2012]  2
6     [def]  [5]   [12/1/2012]  2
7     [ghi]  [6]   [15/1/2012]  3

In other words, keep the most recent record as the single representative of all its duplicates.
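To check what col5 should end up as, the sample rows can simply be grouped by col2 and col3. This is a sketch, not SQL Server: it loads the question's data into an in-memory SQLite database through Python's standard sqlite3 module, and it assumes the dates are day/month/year (15/1/2012 only makes sense that way), storing them as ISO text so they sort correctly.

```python
import sqlite3

# Sample rows from the question. Assumption: dates are day/month/year,
# stored here as ISO 'YYYY-MM-DD' text so that they sort correctly.
ROWS = [
    (1, "abc", "4", "2012-01-10"),
    (2, "abc", "1", "2012-01-12"),
    (3, "ghi", "6", "2012-01-04"),
    (4, "def", "5", "2012-01-08"),
    (5, "abc", "4", "2012-01-10"),
    (6, "def", "5", "2012-01-12"),
    (7, "ghi", "6", "2012-01-15"),
    (8, "abc", "4", "2012-01-17"),
    (9, "ghi", "6", "2012-01-06"),
    (10, "abc", "1", "2012-01-13"),
]


def make_table():
    """Build an in-memory copy of MyTable holding the sample rows (col5 left NULL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (col1 INTEGER PRIMARY KEY, col2 TEXT, "
        "col3 TEXT, col4 TEXT, col5 INTEGER)"
    )
    conn.executemany(
        "INSERT INTO MyTable (col1, col2, col3, col4) VALUES (?, ?, ?, ?)", ROWS
    )
    return conn


def group_summary(conn):
    """One row per duplicate group (col2, col3): group size and latest date."""
    return conn.execute(
        "SELECT col2, col3, COUNT(*), MAX(col4) FROM MyTable "
        "GROUP BY col2, col3 ORDER BY col2, col3"
    ).fetchall()


if __name__ == "__main__":
    for row in group_summary(make_table()):
        print(row)
```

Each group's COUNT(*) is the value col5 should hold on the surviving row, and MAX(col4) is that row's date.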

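++RE-EDIT++

Aaron Bertrand, shawnt00, e2nburner... and the rest of you: I can't tell you how much I appreciate your replies, but I don't understand all that code yet. I'm going to go through it now, but before that:

When I first started programming and needed an SQL query, after using

Select * From MyTable

... my first SQL statement ...

I said "I know SQL!!!" .... well ... look at the depth of your knowledge ... thank you all very much. I know this post on StackOverflow will be even more useful to other beginners.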

score 2 · Accepted Answer

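This answer uses a common table expression (CTE) to apply row_number() and count() to each "slice" of the data (i.e. grouped by col2 + col3). count() identifies how many rows belong to each group, and row_number() applies a "rank" ordered by col4 desc (1 = the most recent in each group, 2 = the second most recent, and so on). It also uses col1 (which looks like a unique column) to break ties. A CTE can be followed by a select, update, or delete, so you can run the first select to verify that these are the rows you want to keep and that the counts are correct. If they are, go ahead with the update and the delete. In every case, it is the output of row_number() that identifies which rows to keep and which to discard.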

To identify the rows to keep:

;WITH n AS 
(
  SELECT col1, col2, col3, col4, 
    c = COUNT(*) OVER (PARTITION BY col2, col3),
    rn = ROW_NUMBER() OVER 
    (
      PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
    )
  FROM dbo.table_name
)
SELECT col1, col2, col3, col4, c
  FROM n WHERE rn = 1;
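The same SELECT can be tried outside SQL Server. This is only a sketch: it runs the query in an in-memory SQLite database (window functions need SQLite 3.25 or later) via Python's sqlite3 module, uses `AS` aliases instead of T-SQL's `c = ...` form, and stores the question's sample dates as ISO text on the assumption that they are day/month/year.

```python
import sqlite3

# Question's sample data; dates assumed day/month/year, stored as ISO text.
ROWS = [
    (1, "abc", "4", "2012-01-10"), (2, "abc", "1", "2012-01-12"),
    (3, "ghi", "6", "2012-01-04"), (4, "def", "5", "2012-01-08"),
    (5, "abc", "4", "2012-01-10"), (6, "def", "5", "2012-01-12"),
    (7, "ghi", "6", "2012-01-15"), (8, "abc", "4", "2012-01-17"),
    (9, "ghi", "6", "2012-01-06"), (10, "abc", "1", "2012-01-13"),
]

# c = size of the (col2, col3) group; rn = 1 marks the newest row,
# with the higher col1 winning a tie on col4.
KEEP_SQL = """
WITH n AS (
  SELECT col1, col2, col3, col4,
         COUNT(*) OVER (PARTITION BY col2, col3) AS c,
         ROW_NUMBER() OVER (
           PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
         ) AS rn
  FROM MyTable
)
SELECT col1, col2, col3, col4, c FROM n WHERE rn = 1 ORDER BY col2, col3
"""


def rows_to_keep():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (col1 INTEGER PRIMARY KEY, col2 TEXT, "
                 "col3 TEXT, col4 TEXT, col5 INTEGER)")
    conn.executemany("INSERT INTO MyTable (col1, col2, col3, col4) "
                     "VALUES (?, ?, ?, ?)", ROWS)
    return conn.execute(KEEP_SQL).fetchall()


if __name__ == "__main__":
    for row in rows_to_keep():
        print(row)
```

Rows 10, 8, 6 and 7 come back, each with its group size in the last column.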

Once you've confirmed those are the rows you want to keep, you can update them like this:

;WITH n AS 
(
  SELECT col1, col2, col3, col4, col5, 
    c = COUNT(*) OVER (PARTITION BY col2, col3),
    rn = ROW_NUMBER() OVER 
    (
      PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
    )
  FROM dbo.table_name
)
UPDATE n SET col5 = c
  WHERE rn = 1;

Then delete the rest like this:

;WITH n AS 
(
  SELECT col1, col2, col3, col4, 
    rn = ROW_NUMBER() OVER 
    (
      PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
    )
  FROM dbo.table_name
)
DELETE n WHERE rn > 1;

Or, even more simply (assuming col5 was entirely NULL before the update):

DELETE dbo.table_name WHERE col5 IS NULL;
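SQLite cannot use a CTE as the target of UPDATE or DELETE the way SQL Server's `UPDATE n` / `DELETE n` does, so this portable sketch (again Python's sqlite3 with an in-memory copy of the sample data, ISO dates assumed, SQLite 3.25+ for window functions) first saves the keepers, ranked exactly as above, into a temporary table. The name `keepers` is chosen here for illustration only.

```python
import sqlite3

# Question's sample data; dates assumed day/month/year, stored as ISO text.
ROWS = [
    (1, "abc", "4", "2012-01-10"), (2, "abc", "1", "2012-01-12"),
    (3, "ghi", "6", "2012-01-04"), (4, "def", "5", "2012-01-08"),
    (5, "abc", "4", "2012-01-10"), (6, "def", "5", "2012-01-12"),
    (7, "ghi", "6", "2012-01-15"), (8, "abc", "4", "2012-01-17"),
    (9, "ghi", "6", "2012-01-06"), (10, "abc", "1", "2012-01-13"),
]

DEDUPE_SQL = """
-- Keepers: newest row per (col2, col3), ties broken by the higher col1.
CREATE TEMP TABLE keepers AS
  SELECT col1, c FROM (
    SELECT col1,
           COUNT(*) OVER (PARTITION BY col2, col3) AS c,
           ROW_NUMBER() OVER (
             PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC
           ) AS rn
    FROM MyTable
  ) WHERE rn = 1;

-- Step 1: store each group's size on its surviving row.
UPDATE MyTable
   SET col5 = (SELECT c FROM keepers WHERE keepers.col1 = MyTable.col1)
 WHERE col1 IN (SELECT col1 FROM keepers);

-- Step 2: delete every row that is not a keeper.
DELETE FROM MyTable WHERE col1 NOT IN (SELECT col1 FROM keepers);
"""


def dedupe():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (col1 INTEGER PRIMARY KEY, col2 TEXT, "
                 "col3 TEXT, col4 TEXT, col5 INTEGER)")
    conn.executemany("INSERT INTO MyTable (col1, col2, col3, col4) "
                     "VALUES (?, ?, ?, ?)", ROWS)
    conn.executescript(DEDUPE_SQL)
    return conn.execute("SELECT col1, col2, col3, col4, col5 FROM MyTable "
                        "ORDER BY col2, col3").fetchall()


if __name__ == "__main__":
    for row in dedupe():
        print(row)
```

Because only the keepers get a non-NULL col5 in step 1, step 2 removes the same rows that `DELETE ... WHERE col5 IS NULL` would, provided col5 started out entirely NULL.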

score 1 · Accepted Answer

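This is a simple approach. You may find merge better. These versions keep the row with the highest col1 and change its date (col4) to the group's maxdate; Aaron's keeps the row that already has the maxdate. I don't think that difference matters, but you should be aware of it.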

update MyTable
set col4 = (
    select max(col4)
    from MyTable as m2
    where m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
),  col5 = (
    select count(*)
    from MyTable as m2
    where m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
)
where not exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and m2.col1 > MyTable.col1
);

delete from MyTable
where exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and m2.col1 > MyTable.col1
);
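The difference mentioned above (keeping the highest col1 and moving the latest date onto it, instead of keeping the row that already has the latest date) shows up in the ghi group of the sample data. This sketch runs the update and delete in an in-memory SQLite database through Python's sqlite3, with the update's NOT EXISTS using the same `m2.col1 > MyTable.col1` test as the delete; dates are assumed day/month/year and stored as ISO text.

```python
import sqlite3

# Question's sample data; dates assumed day/month/year, stored as ISO text.
ROWS = [
    (1, "abc", "4", "2012-01-10"), (2, "abc", "1", "2012-01-12"),
    (3, "ghi", "6", "2012-01-04"), (4, "def", "5", "2012-01-08"),
    (5, "abc", "4", "2012-01-10"), (6, "def", "5", "2012-01-12"),
    (7, "ghi", "6", "2012-01-15"), (8, "abc", "4", "2012-01-17"),
    (9, "ghi", "6", "2012-01-06"), (10, "abc", "1", "2012-01-13"),
]

# Only the highest col1 of each group is updated: it gets the group's
# latest date and the group's row count.
UPDATE_SQL = """
UPDATE MyTable
SET col4 = (SELECT MAX(col4) FROM MyTable AS m2
            WHERE m2.col2 = MyTable.col2 AND m2.col3 = MyTable.col3),
    col5 = (SELECT COUNT(*) FROM MyTable AS m2
            WHERE m2.col2 = MyTable.col2 AND m2.col3 = MyTable.col3)
WHERE NOT EXISTS (SELECT * FROM MyTable AS m2
                  WHERE m2.col2 = MyTable.col2 AND m2.col3 = MyTable.col3
                    AND m2.col1 > MyTable.col1)
"""

# Every row that has a higher col1 in its group is removed.
DELETE_SQL = """
DELETE FROM MyTable
WHERE EXISTS (SELECT * FROM MyTable AS m2
              WHERE m2.col2 = MyTable.col2 AND m2.col3 = MyTable.col3
                AND m2.col1 > MyTable.col1)
"""


def keep_highest_id():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (col1 INTEGER PRIMARY KEY, col2 TEXT, "
                 "col3 TEXT, col4 TEXT, col5 INTEGER)")
    conn.executemany("INSERT INTO MyTable (col1, col2, col3, col4) "
                     "VALUES (?, ?, ?, ?)", ROWS)
    conn.execute(UPDATE_SQL)
    conn.execute(DELETE_SQL)
    return conn.execute("SELECT col1, col2, col3, col4, col5 FROM MyTable "
                        "ORDER BY col2, col3").fetchall()


if __name__ == "__main__":
    for row in keep_highest_id():
        print(row)
```

In the ghi group, row 9 survives and inherits row 7's date (2012-01-15), whereas the row_number() approach keeps row 7 itself.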

EDIT 2: Here's my shot at a merge query

merge MyTable as target
using (
    -- one row per (col2, col3) group, identified by its highest col1
    select max(col1), col2, col3, max(col4), count(*)
    from MyTable
    group by col2, col3
) as source(maxid, col2, col3, maxdate, cnt)
on (target.col1 = source.maxid)
when matched then
    update set col4 = source.maxdate, col5 = source.cnt
-- every row that is not the highest col1 of its group has no match in source
when not matched by source then delete;

EDIT 3: Keep the row with the original maxdate, breaking ties on col1

-- option #1
update MyTable
set col5 = (
    select count(*)
    from MyTable as m2
    where m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
)
where not exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and (m2.col4 > MyTable.col4
             or (m2.col4 = MyTable.col4 and m2.col1 > MyTable.col1))
);

delete from MyTable
where exists (
    select *
    from MyTable as m2
    where
        m2.col2 = MyTable.col2 and m2.col3 = MyTable.col3
        and (m2.col4 > MyTable.col4
             or (m2.col4 = MyTable.col4 and m2.col1 > MyTable.col1))
);
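Because AND binds more tightly than OR in SQL, a condition written as `same group AND newer OR same date AND higher id` lets the second branch match rows from any group. The sketch below (Python's sqlite3, in-memory copy of the sample data with ISO dates assumed) lists which col1 values each form of the EXISTS test would delete. It adds one hypothetical row, 11, in a group of its own that shares row 8's date; that row is not in the question and exists only to expose the difference.

```python
import sqlite3

# Question's sample data (dates assumed day/month/year, stored as ISO text)
# plus hypothetical row 11, which shares row 8's date but not its group.
ROWS = [
    (1, "abc", "4", "2012-01-10"), (2, "abc", "1", "2012-01-12"),
    (3, "ghi", "6", "2012-01-04"), (4, "def", "5", "2012-01-08"),
    (5, "abc", "4", "2012-01-10"), (6, "def", "5", "2012-01-12"),
    (7, "ghi", "6", "2012-01-15"), (8, "abc", "4", "2012-01-17"),
    (9, "ghi", "6", "2012-01-06"), (10, "abc", "1", "2012-01-13"),
    (11, "xyz", "9", "2012-01-17"),
]

SAME_GROUP = "m2.col2 = MyTable.col2 AND m2.col3 = MyTable.col3"
# Without parentheses the "same date, higher id" branch skips the group check.
UNGROUPED_OR = (SAME_GROUP + " AND m2.col4 > MyTable.col4"
                " OR m2.col4 = MyTable.col4 AND m2.col1 > MyTable.col1")
# With parentheses both branches are limited to the same (col2, col3) group.
GROUPED_OR = (SAME_GROUP + " AND (m2.col4 > MyTable.col4"
              " OR (m2.col4 = MyTable.col4 AND m2.col1 > MyTable.col1))")


def rows_deleted(condition):
    """Return the col1 values that DELETE ... WHERE EXISTS (condition) would remove."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (col1 INTEGER PRIMARY KEY, col2 TEXT, "
                 "col3 TEXT, col4 TEXT, col5 INTEGER)")
    conn.executemany("INSERT INTO MyTable (col1, col2, col3, col4) "
                     "VALUES (?, ?, ?, ?)", ROWS)
    sql = ("SELECT col1 FROM MyTable WHERE EXISTS "
           "(SELECT * FROM MyTable AS m2 WHERE " + condition + ") ORDER BY col1")
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print(rows_deleted(UNGROUPED_OR))
    print(rows_deleted(GROUPED_OR))
```

With the parentheses, row 8 survives as the newest row of its group; without them it would be deleted because an unrelated row shares its date.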

-- option #2
merge MyTable as target
using (
    -- the newest row of each (col2, col3) group, ties broken by the higher col1
    select col1, cnt
    from (
        select col1,
               count(*) over (partition by col2, col3) as cnt,
               row_number() over (partition by col2, col3
                                  order by col4 desc, col1 desc) as rn
        from MyTable
    ) as ranked
    where rn = 1
) as source(keepid, cnt)
on (target.col1 = source.keepid)
when matched then
    update set col5 = source.cnt
when not matched by source then delete;

score 0 · Accepted Answer

;WITH a AS (
    SELECT  *,
            ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY col4 DESC, col1 DESC) AS RowNum
    FROM    MyTable
)
-- rows with RowNum <> 1 are the duplicates that will be deleted
DELETE FROM MyTable
WHERE col1 IN (
    SELECT col1
    FROM   a
    WHERE  a.RowNum <> 1
);
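What counts as a duplicate is decided entirely by the PARTITION BY list. This sketch (Python's sqlite3, in-memory copy of the sample data, ISO dates assumed, SQLite 3.25+ for window functions) runs the same ROW_NUMBER delete twice, once partitioned by col2 alone and once by col2 and col3, and returns the col1 values that survive.

```python
import sqlite3

# Question's sample data; dates assumed day/month/year, stored as ISO text.
ROWS = [
    (1, "abc", "4", "2012-01-10"), (2, "abc", "1", "2012-01-12"),
    (3, "ghi", "6", "2012-01-04"), (4, "def", "5", "2012-01-08"),
    (5, "abc", "4", "2012-01-10"), (6, "def", "5", "2012-01-12"),
    (7, "ghi", "6", "2012-01-15"), (8, "abc", "4", "2012-01-17"),
    (9, "ghi", "6", "2012-01-06"), (10, "abc", "1", "2012-01-13"),
]


def survivors(partition_by):
    """Delete every row whose RowNum <> 1 and return the remaining col1 values.

    partition_by is pasted into the SQL text, so only pass trusted column lists.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (col1 INTEGER PRIMARY KEY, col2 TEXT, "
                 "col3 TEXT, col4 TEXT, col5 INTEGER)")
    conn.executemany("INSERT INTO MyTable (col1, col2, col3, col4) "
                     "VALUES (?, ?, ?, ?)", ROWS)
    conn.execute(f"""
        WITH a AS (
          SELECT col1,
                 ROW_NUMBER() OVER (
                   PARTITION BY {partition_by} ORDER BY col4 DESC, col1 DESC
                 ) AS RowNum
          FROM MyTable
        )
        DELETE FROM MyTable WHERE col1 IN (SELECT col1 FROM a WHERE RowNum <> 1)
    """)
    return [row[0] for row in conn.execute("SELECT col1 FROM MyTable ORDER BY col1")]


if __name__ == "__main__":
    print(survivors("col2"))
    print(survivors("col2, col3"))
```

Partitioning by col2 alone treats [abc][4] and [abc][1] as one group, so row 10 is lost.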
