sql - SQL - transpose intervals

Question

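I'm trying to solve a SQL problem (I'm not even sure it's possible). Let me try to explain.

I want to transpose "ranges" of records, based on the dates (intervals) in one table, into another table, keeping each range as a FROM/TO structure.

As an example, I have the following starting table structure: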

ID   DATE
100  11-08-2012
100  12-08-2012
100  13-08-2012
100  17-08-2012
100  18-08-2012
101  01-09-2012
...

As a result, I need the following table:

ID   FROM_DATE   TO_DATE
100  11-08-2012  13-08-2012
100  17-08-2012  18-08-2012
...

Each interval is kept in the FROM/TO fields; for a single-date interval, both fields hold the same date.

Is there a way to do this using SQL?

score 2 · Accepted Answer

This is very doable in pure SQL (no procedures or user-defined functions) on any database that supports ROW_NUMBER(). Here is a SQL Server 2008 implementation, demonstrated on SQLFiddle.

-- Create a virtual table with 2 rows that is used to convert a single row
-- into 2 rows when the range is only a single day
with events as (
  select 'start' event 
  union all 
  select 'stop' event
),
-- Sort the data by date, partitioning by ID, and assign a row number
sorted_dates as ( 
  select id, 
         dt, 
         row_number() over(partition by id order by dt) sorted_rownum
    from t
),
-- Find the dates that begin and end the ranges. Assign new row numbers
-- so that the START and STOP row numbers are always consecutive.
-- Convert a date that both starts and ends the range into two rows.
pruned_dates as (
  select d1.id, 
         e.event, 
         d1.dt,
         row_number() over(partition by d1.id order by d1.sorted_rownum, e.event) pruned_rownum
    from sorted_dates d1
    -- Look for a previous date that is the same day or 1 day earlier
    left outer join sorted_dates d0
      on d1.id=d0.id
     and d1.sorted_rownum  = d0.sorted_rownum+1
     and datediff(d, d0.dt, d1.dt)<=1
    -- Look for a next date that is the same day or 1 day later.
    left outer join sorted_dates d2
      on d1.id=d2.id
     and d1.sorted_rownum = d2.sorted_rownum-1
     and datediff(d, d1.dt, d2.dt)<=1
    -- Identify the record as a START date if there does not exist a prior date
    -- that is the same date or 1 day earlier.
    -- Identify the record as a STOP date if there does not exist a subsequent
    -- date that is the same date or 1 day later.
    left outer join events e
      on (d0.id is null and e.event='start')
      or (d2.id is null and e.event='stop')
   -- Ignore records that have not been identified as START or STOP records.
   where e.event is not null
)
-- Pair the START and STOP records and report the results
select d1.id,
       d1.dt from_date,
       d2.dt to_date
  from pruned_dates d1
  join pruned_dates d2
    on d1.id=d2.id
   and d1.pruned_rownum = d2.pruned_rownum-1
 where d1.event='start'
;

With a database that supports LEAD() and LAG(), the solution becomes simpler and more efficient. Here is a SQL Server 2012 implementation, demonstrated on SQLFiddle.

-- Create a virtual table with 2 rows that is used to convert a single row
-- into 2 rows when the range is only a single day
with events as(
  select 'start' event
  union all
  select 'stop' event
),
-- Use LAG() to get the previous date and LEAD() to get the next date.
-- The previous and/or next date may not exist, or it may be more than 
-- one day away.
dates as(
  select id,
         dt,
         lag(dt,1,'01/01/1900')  over(partition by id order by dt) prev_dt,
         lead(dt,1,'12/31/9999') over(partition by id order by dt) next_dt
    from t
),
-- Discard rows where both the previous and next dates are <= 1 day away.
-- Identify the remaining rows as either START or STOP.
-- Convert any date that both starts and stops a range into 2 rows.
-- For each remaining row, use LEAD() to get the subsequent remaining row.
-- At this point there are valid rows that have START in FROM and STOP in TO,
-- but also invalid rows that have STOP in FROM and NULL or START in TO. But
-- the invalid rows are required for LEAD() to give the correct value.
pruned_dates as(
  select id,
         event,
         dt from_date,
         lead(dt,1) over(partition by id order by dt, event) to_date
    from dates d
    join events e
      on (e.event='start' and datediff(d,prev_dt,dt)>1)
      or (e.event='stop'  and datediff(d,dt,next_dt)>1)
)
-- Filter out the unwanted rows, preserving the rows with START in FROM
-- and STOP in TO.
select id,
       from_date,
       to_date
  from pruned_dates
 where event='start'
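
The START/STOP logic of the two queries above can also be traced outside the database. The following is only a minimal Python sketch, not part of the original answer: the function name `ranges_via_neighbours` and the `(id, date)` tuple layout are assumptions made for illustration, and duplicate rows are dropped up front rather than handled by the `<= 1` day comparison.

```python
from datetime import date, timedelta
from itertools import groupby

ONE_DAY = timedelta(days=1)

def ranges_via_neighbours(rows):
    """rows: iterable of (id, date). Returns a list of (id, from_date, to_date)."""
    result = []
    ordered = sorted(set(rows))  # assumption: duplicate rows carry no meaning
    for row_id, group in groupby(ordered, key=lambda r: r[0]):
        dates = [d for _, d in group]
        starts, stops = [], []
        for i, d in enumerate(dates):
            # Equivalent of LAG() / LEAD(): the neighbouring dates, if any.
            prev_d = dates[i - 1] if i > 0 else None
            next_d = dates[i + 1] if i + 1 < len(dates) else None
            if prev_d is None or d - prev_d > ONE_DAY:
                starts.append(d)  # nothing on the previous day: a range starts
            if next_d is None or next_d - d > ONE_DAY:
                stops.append(d)   # nothing on the next day: a range stops
        # The n-th START always pairs with the n-th STOP for the same id.
        result.extend((row_id, s, e) for s, e in zip(starts, stops))
    return result
```

A date that both starts and stops a range lands in both lists, which plays the same role as the two-row `events` table in the SQL.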

score 0 · Accepted Answer

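I don't think this is possible directly in a query. You would need to write code in a high-level language, or write a procedure for it.

In that case, you would fetch the rows for a particular ID, sort them by date(?), and take the first and last rows of the result. With that logic you can then populate FROM_DATE and TO_DATE.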
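
A procedural version might look like the sketch below. It is an illustration under assumptions, not this answer's actual code: the name `collapse_dates` and the `(id, date)` tuples are hypothetical. Taking only the first and last row per ID would merge separate intervals into one, so the sketch instead opens a new range whenever it meets a gap of more than one day.

```python
from datetime import date, timedelta

def collapse_dates(rows):
    """Collapse (id, date) rows into (id, from_date, to_date) ranges in one pass."""
    ranges = []
    for row_id, day in sorted(rows):  # sort by id, then by date
        if ranges:
            last_id, start, end = ranges[-1]
            # Same id and at most one day after the current end: extend the range.
            if last_id == row_id and day - end <= timedelta(days=1):
                ranges[-1] = (last_id, start, day)
                continue
        # First row, a new id, or a gap: open a new single-day range.
        ranges.append((row_id, day, day))
    return ranges
```

Each resulting tuple maps directly onto one row of the target table (ID, FROM_DATE, TO_DATE).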

score 0 · Accepted Answer

It is possible. You need to use a nested query.

If you describe the logic for selecting the dates, I can give a better example.

For example:

SELECT A.ID, MIN(date) as FROM_DATE, max(date) as TO_DATE FROM ( select ID, DATE FROM sourceTable) A group by A.id
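
Grouping by ID alone returns a single range per ID, so gaps between intervals are lost. One common way to keep a MIN/MAX grouping is to add a second grouping key that stays constant within a run of consecutive days: the day number minus the row's position within its ID. The Python sketch below illustrates that idea only; the name `ranges_by_group_key` and the `(id, date)` tuples are assumptions, not something from this answer.

```python
from datetime import date
from itertools import groupby

def ranges_by_group_key(rows):
    """Group (id, date) rows by id and by (day number - position within id)."""
    ordered = sorted(set(rows))  # assumption: duplicate rows carry no meaning
    keyed = []
    for row_id, group in groupby(ordered, key=lambda r: r[0]):
        for position, (_, day) in enumerate(group):
            # Within a run of consecutive days this difference does not change;
            # every gap increases it, which starts a new group.
            keyed.append((row_id, day.toordinal() - position, day))
    result = []
    for (row_id, _), group in groupby(keyed, key=lambda k: (k[0], k[1])):
        days = [day for _, _, day in group]
        result.append((row_id, min(days), max(days)))  # MIN / MAX per group
    return result
```

In SQL the same key would typically come from subtracting a ROW_NUMBER() partitioned by ID from the date, then grouping by the ID and that key.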

score 0 · Accepted Answer

Well, this works, but it's a bit messy.

SELECT id, date1 AS 'StartDate', 
    MAX(CASE WHEN date < ISNULL(date2,'1/1/2050') THEN date END) AS 'EndDate'
FROM table1
JOIN (
    SELECT *
    FROM (
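        -- Number the start dates per id, so that rn1 = rn2 - 1 pairs each start
        -- with the next start of the same id even when ids interleave by date.
        SELECT ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t1.date) AS rn1, t1.id AS 'id1', t1.date AS 'date1'
        FROM table1 t1
        LEFT OUTER JOIN table1 t2
            ON t1.id = t2.id AND DATEDIFF(dd,t1.date,t2.date) = -1
        WHERE t2.date IS NULL
        ) AS sub1

    LEFT OUTER JOIN (
        SELECT ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t1.date) AS rn2, t1.id AS 'id2', t1.date AS 'date2'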
        FROM table1 t1
        LEFT OUTER JOIN table1 t2
            ON t1.id = t2.id AND DATEDIFF(dd,t1.date,t2.date) = -1
        WHERE t2.date IS NULL
        ) AS sub2 ON id1 = id2 AND rn1 = rn2 - 1

    ) AS sub ON id=id1
GROUP BY id, date1

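Basically, I join the table to itself and keep only the dates that have no corresponding previous consecutive date. That gives the start date of each range. Then I join that query to itself, but on row number - 1, so the second date is offset by one and each start date ends up on the same row as the next start date. Finally, for each start date I find the maximum date that is less than the next start date.

Here is the code to create the test table; you'll need to put some data in it: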
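
The three steps can be followed more easily in a small Python sketch. It is written under assumptions (the name `ranges_from_start_dates` and the `(id, date)` tuples are hypothetical) and is meant to mirror the reasoning, not to reproduce the query exactly.

```python
from datetime import date, timedelta

def ranges_from_start_dates(rows):
    """Find range starts first, then the last date before the next start."""
    by_id = {}
    for row_id, day in rows:
        by_id.setdefault(row_id, set()).add(day)
    result = []
    for row_id in sorted(by_id):
        days = by_id[row_id]
        # Step 1: a start date has no row for the previous day (the anti-join).
        starts = sorted(d for d in days if d - timedelta(days=1) not in days)
        for i, start in enumerate(starts):
            # Step 2: put each start next to the following start, if there is one.
            next_start = starts[i + 1] if i + 1 < len(starts) else None
            # Step 3: the end is the latest date before the next start.
            end = max(d for d in days if next_start is None or d < next_start)
            result.append((row_id, start, end))
    return result
```

The `next_start is None` check plays the part of the `ISNULL(date2, '1/1/2050')` fallback for the last range of each ID.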

CREATE TABLE [dbo].[table1](
    [pk] [int] IDENTITY(1,1) NOT NULL,
    [id] [varchar](10) NULL,
    [date] [datetime] NULL
) ON [PRIMARY]

score 0 · Accepted Answer

Assuming SQL Server.

with 
    pairs(id,start,finish) as (
        select
            id1.Id as ID,id1.[Date] as start,id2.date as finish 
        from
            IdDate as id1 
            inner join IdDate id2 
            on id1.id=id2.id and DATEADD(DAY,1,id1.Date)=id2.date),
    starters(id,start) as (
        select
            pair1.id,pair1.start
        from
            pairs as pair1
        where
            pair1.start not in (select finish from pairs)),
    finishers(id,finish) as (
        select
            pair1.id,pair1.finish
        from 
            pairs as pair1
        where
            pair1.finish not in (select start from pairs))
select 
    s.id,s.start,finishers.finish 
from 
    starters as s, finishers 
where 
    finishers.finish > s.start and 
    (finishers.finish < (select MIN(start) from starters where start>s.start) or 
     (s.start=(select max(start) from starters) and 
      finishers.finish > (select MAX(start) from starters where start=s.start)))

Input

100 2012-08-11 00:00:00.000
100 2012-08-12 00:00:00.000
100 2012-08-13 00:00:00.000
100 2012-08-17 00:00:00.000
100 2012-08-18 00:00:00.000
100 2012-09-01 00:00:00.000
100 2012-09-02 00:00:00.000
100 2012-09-03 00:00:00.000
100 2012-09-04 00:00:00.000
100 2012-09-05 00:00:00.000

Output

id  start   finish
100 2012-08-11 00:00:00.000 2012-08-13 00:00:00.000
100 2012-08-17 00:00:00.000 2012-08-18 00:00:00.000
100 2012-09-01 00:00:00.000 2012-09-05 00:00:00.000
