5

I have a table with a series of (IP varchar(15), DateTime datetime2) values. Each row corresponds to an HTTP request made by a user. I want to assign session numbers to these rows. Different IP-addresses have different session numbers. The same IP should be assigned a new session number if the last request is older than 30min. Here is a sample output:

IP,      DateTime,         SessionNumber, RequestNumber
1.1.1.1, 2012-01-01 00:01, 1,             1
1.1.1.1, 2012-01-01 00:02, 1,             2
1.1.1.1, 2012-01-01 00:03, 1,             3
1.1.1.2, 2012-01-01 00:04, 2,             1 --different IP => new session number
1.1.1.2, 2012-01-01 00:05, 2,             2
1.1.1.2, 2012-01-01 00:40, 3,             1 --same IP, but last request 35min ago (> 30min)

Columns 1 and 2 are inputs, 3 and 4 are the desired outputs. The table shows two users.

As the underlying is table is truely large, how can this be solved efficiently? I'd prefer a small constant amount of passes over the data (one or two).

4

2 に答える 2

8

Here's a couple of tries at it.

;WITH CTE1 AS
(
SELECT *,
IIF(DATEDIFF(MINUTE,
       LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
       DateTime) < 30,0,1) AS SessionFlag
FROM Sessions
), CTE2 AS
(
SELECT *,
       SUM(SessionFlag) OVER (PARTITION BY IP 
                                  ORDER BY DateTime) AS IPSessionNumber
FROM CTE1
)
SELECT IP,
       DateTime,
       DENSE_RANK() OVER (ORDER BY IP, IPSessionNumber) AS SessionNumber,
       ROW_NUMBER() OVER (PARTITION BY IP, IPSessionNumber 
                              ORDER BY DateTime) AS RequestNumber
FROM CTE2

This has two sort operations (by IP, DateTime then by IP, IPSessionNumber) but does assume that the SessionNumber can be assigned arbitrarily as long as a different unique session number is assigned to each new session per the ip address/ 30 minute rule.

To assign the SessionNumbers sequentially in chronological order. I used the following.

;WITH CTE1 AS
(
SELECT *,
IIF(DATEDIFF(MINUTE,
       LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
       DateTime) < 30,0,1) AS SessionFlag
FROM Sessions
), CTE2 AS(
SELECT *,
       SUM(SessionFlag) OVER (ORDER BY DateTime) AS GlobalSessionNo
FROM CTE1
), CTE3 AS(
SELECT *,
       MAX(CASE WHEN SessionFlag = 1 THEN GlobalSessionNo END) 
               OVER (PARTITION BY IP ORDER BY DateTime) AS SessionNumber
FROM CTE2)
SELECT IP,
       DateTime,
       SessionNumber,
       ROW_NUMBER() OVER (PARTITION BY SessionNumber 
                              ORDER BY DateTime) AS RequestNumber
FROM CTE3

This increases the number of sort operations to 4 however.

于 2012-04-08T19:13:41.263 に答える
2

Here is a version using a table variable and row_number to create a ID that can be used in a recursive CTE. It might be worthwhile to compare performance against the cursor and one query (provided by Martin) versions.

CREATE TABLE #T
(
  IP varchar(15),
  DateTime datetime,
  ID int,
  primary key (IP, ID)
)

insert into #T(IP, DateTime, ID)
select IP, DateTime, row_number() over(partition by IP order by DateTime) 
from #sessionRequests

;with C as
(
  select IP,
         ID,
         DateTime,
         1 as Session
  from #T
  where ID = 1
  union all 
  select T.IP,
         T.ID,
         T.DateTime,
         C.Session + case when datediff(minute, C.DateTime, T.DateTime) >= 30 then 1 else 0 end
  from #T as T
    inner join C 
      on T.IP = C.IP and
         T.ID = C.ID + 1
)
SELECT IP,
       DateTime,
       dense_rank() over(order by IP, Session) as SessionNumber,
       row_number() over(partition by IP, Session order by DateTime) as RequestNumber
from C
order by IP, DateTime, SessionNumber, RequestNumber
option (maxrecursion 0)
于 2012-04-09T07:28:05.953 に答える