sql - SQL: Track changes in the MAX value over a time frame

Question

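We have a fairly complex so-called evaluation process for new products. Products are evaluated in different regions and applications. After each evaluation step, the product gets a new result score. This score (between 0 and 10) indicates how far the product has progressed in the process. With each step it stays the same or increases, but never decreases, and odd numbers indicate products that failed an evaluation. The highest score is called the product status.

I want to select all products that have an even status (2,4,8,10) at startDate (that date included), together with their status at the endDate of the time frame.

(I also want to select all new products that entered the process in that time frame, but I think that can easily be done with a second statement.)

The problem I have is how to include the initial status in the output alongside the new status. This is my SQL statement: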

SELECT 
  MyTable.product_id, 
  MyTable.REGION, 
  MyTable.SEGMENT,
  Max(MyTable.result) AS NEW_STATUS
  FROM 
     MyTable INNER JOIN (
     SELECT
      product_id, 
      REGION, 
      SEGMENT, 
      Max(result) AS INITIAL_STATUS
    FROM
      MyTable
    WHERE
      DATE <= to_date(:startDate)
    GROUP BY
      product_id, REGION, SEGMENT
    HAVING
      Max(result) IN(2,4,8,10)
   ) initial_status ON MyTable.product_id = initial_status.product_id    
  WHERE
    MyTable.DATE <= to_date(:endDate)
  GROUP BY
    MyTable.product_id, 
    MyTable.REGION, 
    MyTable.SEGMENT;

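How can I include initial_status in the output without affecting the max / group by? (This is Oracle, but I am not an expert, so Oracle-specific suggestions are welcome.)

Edit:

The data is in a one-to-many relationship: one product, many evaluations. Each evaluation has a Region, segment, result, and evaluation_date (and other data that is not relevant here). I have denormalized some sample data here: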

product_id    Region    Segment    Result    date
    1           US         AB         2    20.05.2012
    1           EU         TS         4    13.06.2012
    1           US         AB         4    01.09.2012
  234           US         AB         2    09.09.2012

Expected output for the sample above with a date range from 26 August 2012 to 21 September 2012:

product_id    Region    Segment    Initial_Status    New_Status
    1            US        AB             2              4
    1            EU        TS             4              4 (this did not change)
  234            US        AB           (null)           2 ( new entry)

I know that my current SQL cannot achieve that, in particular showing the new entries.

score 0 · Accepted Answer

Just for documentation, I came up with the following query. I know it covers some requirements that were not asked for in the original question. Some of them are there to handle faulty data.

SELECT 
  product_id, 
  REGION, 
  SEGMENT,
  initial_status,
  NEW_STATUS,
  "Comment",
  Count("Comment")  OVER (PARTITION BY 
      "Comment"
    ) "Counter"
from(
SELECT DISTINCT
  myTable.product_id, 
  myTable.REGION, 
  myTable.SEGMENT,
  initial_status.initial_status,
  Max(myTable.result) 
    OVER (PARTITION BY 
      myTable.product_id, 
      myTable.REGION, 
      myTable.SEGMENT
    ) NEW_STATUS,
  CASE WHEN initial_status.initial_status <> Max(myTable.result) 
    OVER (PARTITION BY 
      myTable.product_id, 
      myTable.REGION, 
      myTable.SEGMENT
    ) THEN 'Changed' ELSE 'Same' END as "Comment"  
  FROM 
    myTable INNER JOIN (
     SELECT
      product_id, 
      REGION, 
      SEGMENT, 
      Max(result) AS INITIAL_STATUS
    FROM
      myTable
    WHERE
      DATE <= to_date(:startDate)
      OR DATE is null
    GROUP BY
      product_id, REGION, SEGMENT
    HAVING
      Max(result) IN(2,4,8,10)
   ) initial_status 
    ON 
      myTable.product_id = initial_status.product_id
      AND myTable.REGION = initial_status.REGION
      AND (
        myTable.SEGMENT = initial_status.SEGMENT
        OR (myTable.SEGMENT is null AND initial_status.SEGMENT is null)
      )
  WHERE
    myTable.DATE <= to_date(:endDate)
UNION ALL
SELECT 
  myTable.product_id, 
  myTable.REGION, 
  myTable.SEGMENT,
  null AS initial_status,
  Max(myTable.result) 
    OVER (PARTITION BY 
      myTable.product_id, 
      myTable.REGION, 
      myTable.SEGMENT
    ) NEW_STATUS,
'New' As "Comment"
FROM myTable
WHERE evaluation_date BETWEEN to_date(:startDate) + 1 AND to_date(:endDate)
AND stage <> 'Stage 0')
ORDER BY
    product_id ASC;
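
For comparison, here is a shorter sketch of the same idea using conditional aggregation, so that the initial and new status come out of a single GROUP BY. It is not the query above, only an illustration. The CTE name evaluations, the column name evaluation_date, and the hard-coded date literals (standing in for :startDate and :endDate) are assumptions made for this example. The sample rows are copied from the question.

```sql
-- Sample rows from the question; "evaluations" and "evaluation_date"
-- are assumed names used only in this sketch.
with evaluations as (
  select 1 as product_id, 'US' as region, 'AB' as segment, 2 as result,
         date '2012-05-20' as evaluation_date from dual union all
  select 1, 'EU', 'TS', 4, date '2012-06-13' from dual union all
  select 1, 'US', 'AB', 4, date '2012-09-01' from dual union all
  select 234, 'US', 'AB', 2, date '2012-09-09' from dual
)
select product_id, region, segment,
       -- Highest result on or before the start date; NULL for new entries
       max(case when evaluation_date <= date '2012-08-26'
                then result end) as initial_status,
       -- Highest result on or before the end date
       max(result) as new_status
  from evaluations
 where evaluation_date <= date '2012-09-21'
 group by product_id, region, segment
       -- Keep new entries (no initial status) and even initial statuses
having max(case when evaluation_date <= date '2012-08-26'
                then result end) is null
    or max(case when evaluation_date <= date '2012-08-26'
                then result end) in (2, 4, 8, 10)
 order by product_id;
```

With the sample data this should yield one row per product, region and segment, matching the shape of the expected output in the question. It does not include the "Comment" and "Counter" columns or the faulty-data handling of the query above.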

score 0 · Accepted Answer

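It looks like you need analytic functions, together with a sub-query and the UNION set operation. The benefit of analytic functions is that you only need to do one table scan.

"I want to select all products with an even status (2,4,8,10) at startDate"

That would be: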
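
As a minimal sketch of what an analytic function does differently from GROUP BY, the query below keeps every input row and adds the per-group maximum next to it. The CTE name evaluations and its literal rows are assumptions for illustration, loosely based on the sample data in the question.

```sql
-- "evaluations" is an assumed inline data set, not a real table.
with evaluations as (
  select 1 as product_id, 'US' as region, 'AB' as segment, 2 as result from dual union all
  select 1, 'US', 'AB', 4 from dual union all
  select 1, 'EU', 'TS', 4 from dual
)
select product_id, region, segment, result,
       -- Analytic MAX: computed per partition, but rows are not collapsed
       max(result) over ( partition by product_id, region, segment ) as new_status
  from evaluations;
```

All three rows are returned. Each one keeps its own result and also carries the maximum for its product, region and segment, which is what lets the outer query compare the two values row by row.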

select product_id, region, segment, initial_status, new_status
  from ( select product_id, region, segment, initial_status, date
                -- The maximum status over all time per product_id,
                -- region and segment
              , max(initial_status) over 
                   ( partition by product_id, region, segment ) as new_status
           from my_table
                )
       -- Restrict on where 
 where ( date <= to_date(:start_date, <format model>)
          -- If you only want even you can use mod
         and mod(initial_status, 2) = 0
             )
    or new_status = initial_status

You can then get all the new ones:

select product_id, region, segment, initial_status, new_status
  from ( select product_id, region, segment, initial_status
              , initial_status as new_status, date
                -- Minimum date this product_id, region, segment
                -- combination was entered
              , min(date) over 
                   ( partition by product_id, region, segment ) as min_date
                -- Find the most recent record for this combination
              , rank() over ( partition by product_id, region, segment
                                  order by date desc ) as rnk
           from my_table
                )
       -- By putting this condition in the outer-select
       -- you ensure you only get completely new records
 where min_date >= to_date(:startdate, <format_model>)
       -- If you have multiple records that were entered for a single pk
       -- between startdate and enddate you only want the most recent one.
   and rnk = 1

Finally, you can combine these with UNION. If you can guarantee that there is no overlap, use UNION ALL instead; it does not perform a DISTINCT operation, so the query will perform better.

select query1
 union
select query2
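
To make the difference between the two set operators concrete, here is a small sketch against Oracle's DUAL table. The literal values are made up for the example.

```sql
-- UNION removes duplicate rows: this returns a single row (1, 'US').
select 1 as product_id, 'US' as region from dual
 union
select 1 as product_id, 'US' as region from dual;

-- UNION ALL keeps every row: this returns two identical rows.
select 1 as product_id, 'US' as region from dual
 union all
select 1 as product_id, 'US' as region from dual;
```

The duplicate elimination in UNION is the extra work that UNION ALL avoids, which is why UNION ALL is only safe when the two queries cannot return the same row.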

Note that it is probably possible to combine these into a single query. It does not look pretty, but it is probably more efficient:

select product_id, region, segment, initial_status, new_status
  from ( select product_id, region, segment, initial_status, date
              , min(date) over 
                   ( partition by product_id, region, segment ) as min_date
              , rank() over ( partition by product_id, region, segment
                                  order by date desc ) as rnk
              , max(initial_status) over 
                   ( partition by product_id, region, segment ) as new_status
           from my_table
                )
 where ( min_date >= to_date(:startdate, <format_model>)
         and rnk = 1
             )
    or ( ( date <= to_date(:startdate, <format_model>)
            and mod(initial_status, 2) = 0
                )
        or new_status = initial_status
           )
