
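My company is receiving duplicate records in our database that are created up to 4 minutes after the initial record. Logically, a group of records consists of the original record plus any subsequent records created within that 4-minute window. The first record gets a TO_DELETE value of 'N', and each duplicate record gets a TO_DELETE value of 'Y'. Each new group starts over with an 'N' value.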

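With the help of "Deleting Invalid Duplicate Rows in SQL", I put together a query to select them, but it has been running for over 2 hours and still has not returned a result set, so I can't tell whether it is stuck in an infinite loop. Any help identifying the problem would be appreciated!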

with LEAD_CTE as
(
    select *, ROW_NUMBER() over (partition by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE order by CREATEDDATE) as ROWNUMBER
      from LEAD
     where DELETE_FLAG <> 'Y'
       and CREATEDDATE >= (GETDATE() - 7)
),
CTE as
(
    select ROWNUMBER, 'N' as TO_DELETE, CREATEDDATE, 0 as TOTAL_MINUTES, LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE
      from LEAD_CTE
     where ROWNUMBER = 1

     union all

    select l.ROWNUMBER,
           case when ((c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) > 4) then 'N' else 'Y' end as TO_DELETE,
           l.CREATEDDATE,
           case when ((c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) > 4) then 0 else (c.TOTAL_MINUTES + DATEDIFF(MINUTE, c.CREATEDDATE, l.CREATEDDATE)) end as TOTAL_MINUTES,
           l.EMAIL, l.FIRSTNAME, l.LASTNAME, l.PRIMARY_PHONE, l.PROGRAMX, l.TERM_CODE, l.INQ_TYPE, l.LEADSOURCE
      from LEAD_CTE l inner join CTE c on l.ROWNUMBER = (c.ROWNUMBER + 1)
)

  select ROWNUMBER, TO_DELETE, CREATEDDATE, TOTAL_MINUTES, LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE
    from CTE
order by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE, CREATEDDATE

Sample data:

CREATEDDATE             | LASTNAME  | FIRSTNAME | EMAIL         | PRIMARY_PHONE  | PROGRAMX               | TERM_CODE | INQ_TYPE | LEADSOURCE
---------------------------------------------------------------------------------------------------------------------------------------------
2013-09-24 00:06:01.000 | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
2013-09-24 00:18:47.000 | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
2013-09-24 00:18:50.000 | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
2013-09-24 00:18:52.000 | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
2013-09-24 00:18:52.000 | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
2013-09-24 00:18:54.000 | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
2013-09-24 00:18:55.000 | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
2013-09-24 00:18:56.000 | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
2013-09-24 00:18:56.000 | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform

New CTE with a self-join:

with LEAD_CTE as
(
    select *, ROW_NUMBER() over (partition by LASTNAME, FIRSTNAME, EMAIL, PRIMARY_PHONE, PROGRAMX, TERM_CODE, INQ_TYPE, LEADSOURCE order by CREATEDDATE) as ROWNUMBER
      from LEAD
     where DELETE_FLAG <> 'Y'
       and CREATEDDATE >= (GETDATE() - 7)
)

  select l1.ROWNUMBER, l1.CREATEDDATE, l2.CREATEDDATE, DATEDIFF(MINUTE, l1.CREATEDDATE, l2.CREATEDDATE), l1.LASTNAME, l1.FIRSTNAME, l1.EMAIL, l1.PRIMARY_PHONE, l1.PROGRAMX, l1.TERM_CODE, l1.INQ_TYPE, l1.LEADSOURCE
    from LEAD_CTE l1 left join LEAD_CTE l2
      on l1.ROWNUMBER = (l2.ROWNUMBER + 1)
     and l1.LASTNAME = l2.LASTNAME
     and l1.FIRSTNAME = l2.FIRSTNAME
     and l1.EMAIL = l2.EMAIL
     and l1.PRIMARY_PHONE = l2.PRIMARY_PHONE
     and l1.PROGRAMX = l2.PROGRAMX
     and l1.TERM_CODE = l2.TERM_CODE
     and l1.INQ_TYPE = l2.INQ_TYPE
     and l1.LEADSOURCE = l2.LEADSOURCE
order by l1.ROWNUMBER

Actual output:

ROWNUMBER | CREATEDDATE             | CREATEDDATE | (no column name) | LASTNAME  | FIRSTNAME | EMAIL         | PRIMARY_PHONE  | PROGRAMX               | TERM_CODE | INQ_TYPE | LEADSOURCE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1         | 2013-09-24 00:06:01.000 | NULL        | NULL             | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
2         | 2013-09-24 00:18:47.000 | NULL        | NULL             | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
3         | 2013-09-24 00:18:50.000 | NULL        | NULL             | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
4         | 2013-09-24 00:18:52.000 | NULL        | NULL             | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
5         | 2013-09-24 00:18:52.000 | NULL        | NULL             | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
6         | 2013-09-24 00:18:54.000 | NULL        | NULL             | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
7         | 2013-09-24 00:18:55.000 | NULL        | NULL             | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
8         | 2013-09-24 00:18:56.000 | NULL        | NULL             | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform
9         | 2013-09-24 00:18:56.000 | NULL        | NULL             | Testerson | Testy     | test@test.com | (123) 867-5309 | MS in Higher Education | NULL      | inquiry  | Webform

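What's interesting is that all of the l2 fields come back as NULL for every record, which in turn makes the DATEDIFF() calculation return NULL. My expected output is for all l2 fields to hold the values of the next l1 record, except for the last record, whose l2 fields should be NULL.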


1 Answer


I think you're very close:

   CASE WHEN Datediff(minute, l2.createddate, l1.createddate ) > 4
                  OR l2.createddate is null
                  THEN 'Y' ELSE 'N' END,

As I mentioned in the comments, you also need to deal with the fact that joining on nullable fields is a pain:
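To see why the plain equality join finds no match, here is a minimal sketch using two hypothetical variables (@a and @b exist only for illustration and are not columns of the LEAD table). It assumes SQL Server with the default ANSI_NULLS ON setting, where NULL = NULL evaluates to UNKNOWN rather than TRUE:

```sql
-- @a and @b are hypothetical stand-ins for l1.term_code and l2.term_code.
DECLARE @a varchar(10) = NULL,
        @b varchar(10) = NULL;

SELECT
    -- Plain equality: NULL = NULL is UNKNOWN, so the ELSE branch is taken.
    CASE WHEN @a = @b
         THEN 'match' ELSE 'no match' END AS plain_equals,
    -- Null-safe form: treat two NULLs as equal explicitly.
    CASE WHEN @a = @b OR (@a IS NULL AND @b IS NULL)
         THEN 'match' ELSE 'no match' END AS null_safe_equals;
```

Because TERM_CODE is NULL on every sample row, the predicate l1.TERM_CODE = l2.TERM_CODE is never true, so the left join finds no partner row and every l2 column comes back NULL.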

WITH lead_cte 
     AS (SELECT *, 
                Row_number() 
                  OVER ( 
                    PARTITION BY lastname, firstname, email, primary_phone, 
                                 programx, term_code, inq_type, leadsource 
                    ORDER BY createddate) AS ROWNUMBER 
         FROM   lead 
         WHERE  delete_flag <> 'Y' 
                AND createddate >= ( Getdate() - 7 )) 
SELECT l1.rownumber, 
       l1.createddate, 
       l2.createddate, 
       Datediff(minute, l2.createddate, l1.createddate), 
       CASE WHEN Datediff(minute, l2.createddate, l1.createddate) > 4 
                 OR l2.createddate IS NULL 
            THEN 'Y' ELSE 'N' END, 
       l1.lastname, 
       l1.firstname, 
       l1.email, 
       l1.primary_phone, 
       l1.programx, 
       l1.term_code, 
       l1.inq_type, 
       l1.leadsource 
FROM   lead_cte l1 
       LEFT JOIN lead_cte l2 
              ON l1.rownumber = l2.rownumber +1
                 AND l1.lastname = l2.lastname 
                 AND l1.firstname = l2.firstname 
                 AND l1.email = l2.email 
                 AND l1.primary_phone = l2.primary_phone 
                 AND l1.programx = l2.programx 
                 AND (l1.term_code = l2.term_code 
                       or ( l1.term_code is null and l2.term_code is null))
                 AND l1.inq_type = l2.inq_type 
                 AND l1.leadsource = l2.leadsource 
ORDER  BY l1.rownumber 
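If you are on SQL Server 2012 or later, LAG() may let you skip the self-join entirely, since PARTITION BY puts NULL values in the same partition. The following is only a sketch under that version assumption and is untested against the actual table; prev_createddate, minutes_since_prev, and to_delete are names made up for illustration. The flag here follows the question's TO_DELETE convention ('N' for the first row of a group, 'Y' for a duplicate):

```sql
WITH lead_cte AS (
    SELECT *,
           -- Previous CREATEDDATE within the same duplicate-candidate group;
           -- NULL for the first row of each partition.
           LAG(createddate) OVER (
               PARTITION BY lastname, firstname, email, primary_phone,
                            programx, term_code, inq_type, leadsource
               ORDER BY createddate) AS prev_createddate
    FROM   lead
    WHERE  delete_flag <> 'Y'
           AND createddate >= DATEADD(DAY, -7, GETDATE())
)
SELECT createddate,
       prev_createddate,
       DATEDIFF(MINUTE, prev_createddate, createddate) AS minutes_since_prev,
       -- First row in a partition, or a gap of more than 4 minutes: keep it.
       CASE WHEN prev_createddate IS NULL
                 OR DATEDIFF(MINUTE, prev_createddate, createddate) > 4
            THEN 'N' ELSE 'Y' END AS to_delete,
       lastname, firstname, email, primary_phone,
       programx, term_code, inq_type, leadsource
FROM   lead_cte
ORDER  BY lastname, firstname, email, primary_phone,
          programx, term_code, inq_type, leadsource, createddate;
```

Like the self-join, this compares each row with the one immediately before it, so a long chain of rows spaced less than 4 minutes apart is treated as one group even if the chain spans more than 4 minutes overall.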

Demo

Answered 2013-09-25T21:42:49.197