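My company is receiving duplicate records in our database that are created up to 4 minutes after the first record. Logically, a group of records consists of the original record plus any subsequent records created within that 4-minute window. The first record gets a TO_DELETE value of 'N', and each duplicate record gets a TO_DELETE value of 'Y'. Each new group starts over with an 'N' value.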
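With the help of "Deleting Invalid Duplicate Rows in SQL", I put together a query to select them, but it has been running for over 2 hours and still hasn't returned a result set, so I can't tell whether it is stuck in an infinite loop. I'd appreciate any help identifying the problem!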
with LEAD_CTE as
(
select *, ROW_NUMBER() over (partition by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE order by CREATEDDATE) as ROWNUMBER
from LEAD
where DELETE_FLAG <> 'Y'
and CREATEDDATE >= (GETDATE() - 7)
),
CTE as
(
select ROWNUMBER, 'N' as TO_DELETE, CREATEDDATE, 0 as TOTAL_MINUTES, LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE
from LEAD_CTE
where ROWNUMBER = 1
union all
select l.ROWNUMBER,
case when ((c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) > 4) then 'N' else 'Y' end as TO_DELETE,
l.CREATEDDATE,
case when ((c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) > 4) then 0 else (c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) end as TOTAL_MINUTES,
l.EMAIL, l.FIRSTNAME, l.LASTNAME, l.PRIMARY_PHONE, l.PROGRAMX, l.TERM_CODE, l.INQ_TYPE, l.LEADSOURCE
from LEAD_CTE l inner join CTE c on l.ROWNUMBER = (c.ROWNUMBER + 1)
)
select ROWNUMBER, TO_DELETE, CREATEDDATE, TOTAL_MINUTES, LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE
from CTE
order by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE, CREATEDDATE
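One thing that may explain the runtime: the recursive member joins only on ROWNUMBER, so every row numbered n is paired with every row numbered n + 1 across all partitions, and the row count multiplies at each level. The recursive member also lists EMAIL, FIRSTNAME, LASTNAME in a different order than the anchor, and UNION ALL matches columns by position. Below is a minimal sketch of the same logic with those two points changed. It assumes TERM_CODE is the only nullable partition column (as in the sample data in this post), and the maxrecursion hint is an assumption for groups longer than the default limit of 100 levels. It has not been run against the real LEAD table and may still be slow on a large one.

```sql
with LEAD_CTE as
(
    select *,
           ROW_NUMBER() over (partition by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE,
                                           PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE
                              order by CREATEDDATE) as ROWNUMBER
    from LEAD
    where DELETE_FLAG <> 'Y'
      and CREATEDDATE >= (GETDATE() - 7)
),
CTE as
(
    -- Anchor: the first row of each partition starts a group.
    select ROWNUMBER, 'N' as TO_DELETE, CREATEDDATE, 0 as TOTAL_MINUTES,
           LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE
    from LEAD_CTE
    where ROWNUMBER = 1
    union all
    -- Recursive member: same column order as the anchor.
    select l.ROWNUMBER,
           case when (c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) > 4
                then 'N' else 'Y' end as TO_DELETE,
           l.CREATEDDATE,
           case when (c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) > 4
                then 0
                else (c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) end as TOTAL_MINUTES,
           l.LASTNAME, l.FIRSTNAME, l.EMAIL, l.PRIMARY_PHONE, l.PROGRAMX, l.TERM_CODE, l.INQ_TYPE, l.LEADSOURCE
    from LEAD_CTE l
    inner join CTE c
        on l.ROWNUMBER = (c.ROWNUMBER + 1)
       -- Stay inside the same partition as the previous row.
       and l.LASTNAME = c.LASTNAME
       and l.FIRSTNAME = c.FIRSTNAME
       and l.EMAIL = c.EMAIL
       and l.PRIMARY_PHONE = c.PRIMARY_PHONE
       and l.PROGRAMX = c.PROGRAMX
       -- Assumption: TERM_CODE can be NULL, so compare it NULL-safely.
       and (l.TERM_CODE = c.TERM_CODE or (l.TERM_CODE is null and c.TERM_CODE is null))
       and l.INQ_TYPE = c.INQ_TYPE
       and l.LEADSOURCE = c.LEADSOURCE
)
select ROWNUMBER, TO_DELETE, CREATEDDATE, TOTAL_MINUTES,
       LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE
from CTE
order by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE, CREATEDDATE
option (maxrecursion 0); -- assumption: a group may exceed 100 rows
```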
Sample data:
CREATEDDATE | LASTNAME | FIRSTNAME | EMAIL | PRIMARY_PHONE | PROGRAMX | TERM_CODE | INQ_TYPE | LEADSOURCE
---------------------------------------------------------------------------------------------------------------------------------------------
2013-09-24 00:06:01.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
2013-09-24 00:18:47.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
2013-09-24 00:18:50.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
2013-09-24 00:18:52.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
2013-09-24 00:18:52.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
2013-09-24 00:18:54.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
2013-09-24 00:18:55.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
2013-09-24 00:18:56.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
2013-09-24 00:18:56.000 | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
New CTE with a self-join:
with LEAD_CTE as
(
select *, ROW_NUMBER() over (partition by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE order by CREATEDDATE) as ROWNUMBER
from LEAD
where DELETE_FLAG <> 'Y'
and CREATEDDATE >= (GETDATE() - 7)
)
select l1.ROWNUMBER, l1.CREATEDDATE, l2.CREATEDDATE, DATEDIFF(MINUTE, l1.CREATEDDATE, l2.CREATEDDATE), l1.LASTNAME, l1.FIRSTNAME, l1.EMAIL, l1.PRIMARY_PHONE, l1.PROGRAMX, l1.TERM_CODE, l1.INQ_TYPE, l1.LEADSOURCE
from LEAD_CTE l1 left join LEAD_CTE l2
on l1.ROWNUMBER = (l2.ROWNUMBER + 1)
and l1.LASTNAME = l2.LASTNAME
and l1.FIRSTNAME = l2.FIRSTNAME
and l1.EMAIL = l2.EMAIL
and l1.PRIMARY_PHONE = l2.PRIMARY_PHONE
and l1.PROGRAMX = l2.PROGRAMX
and l1.TERM_CODE = l2.TERM_CODE
and l1.INQ_TYPE = l2.INQ_TYPE
and l1.LEADSOURCE = l2.LEADSOURCE
order by l1.ROWNUMBER
Actual output:
ROWNUMBER | CREATEDDATE | CREATEDDATE | (no column name) | LASTNAME | FIRSTNAME | EMAIL | PRIMARY_PHONE | PROGRAMX | TERM_CODE | INQ_TYPE | LEADSOURCE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | 2013-09-24 00:06:01.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
2 | 2013-09-24 00:18:47.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
3 | 2013-09-24 00:18:50.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
4 | 2013-09-24 00:18:52.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
5 | 2013-09-24 00:18:52.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
6 | 2013-09-24 00:18:54.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
7 | 2013-09-24 00:18:55.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
8 | 2013-09-24 00:18:56.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
9 | 2013-09-24 00:18:56.000 | NULL | NULL | Testerson | Testy | test@test.com | (123) 867-5309 | MS in Higher Education | NULL | inquiry | Webform
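What's interesting is that all l2 fields on all records come back as NULL, which I also found causes the DATEDIFF() calculation to return NULL. My expected output is that all l2 fields would have the value of the next l1 record, with the exception of the last record's l2 fields, which would be NULL.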
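A likely cause, given that TERM_CODE is NULL in every sample row: an equality comparison between two NULLs evaluates to UNKNOWN rather than true, so the join condition on TERM_CODE never matches and the left join fills every l2 column with NULL. The following self-contained sketch illustrates this with made-up one-row inputs (the aliases a and b and the ID column are hypothetical, not part of the LEAD table). The first query leaves MATCHED_ID as NULL; the second, with a NULL-safe predicate, finds the match.

```sql
-- Plain equality: NULL = NULL is not true, so the left join finds no match.
select a.ID, b.ID as MATCHED_ID
from (values (1, cast(null as varchar(10)))) as a (ID, TERM_CODE)
left join (values (1, cast(null as varchar(10)))) as b (ID, TERM_CODE)
    on a.ID = b.ID
   and a.TERM_CODE = b.TERM_CODE;

-- NULL-safe comparison: two NULLs are treated as matching.
select a.ID, b.ID as MATCHED_ID
from (values (1, cast(null as varchar(10)))) as a (ID, TERM_CODE)
left join (values (1, cast(null as varchar(10)))) as b (ID, TERM_CODE)
    on a.ID = b.ID
   and (a.TERM_CODE = b.TERM_CODE
        or (a.TERM_CODE is null and b.TERM_CODE is null));
```

Applying the same pattern to the TERM_CODE line of the self-join should populate the l2 columns. Note that with l1.ROWNUMBER = (l2.ROWNUMBER + 1), l2 is the previous row in the group, so the row left without a match would be the first one rather than the last.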