mysql - Optimization for adding cumulative (running total?) columns?

Question

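I am new to SQL, and this forum has been my lifeline so far. Thank you for creating and sharing on this wonderful platform.

I am currently working on a large dataset and would appreciate some guidance.

The data table (existing_table) has 4 million rows and looks like this: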

id  date   sales_a   sales_b   sales_c   sales_d   sales_e

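Note that there are multiple rows with the same date.

What I want to do is add five more columns to this table (cumulative_sales_a, cumulative_sales_b, and so on). These columns would hold the cumulative sales of a, b, c, etc. up to a given date (grouped by date). I used the following code to do this: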

create table new_cumulative  
select t.id, t.date, t.sales_a, t.sales_b, t.sales_c, t.sales_d, t.sales_e,   
(select sum(x.sales_a) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_a,  
(select sum(x.sales_b) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_b,  
(select sum(x.sales_c) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_c,  
(select sum(x.sales_d) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_d,  
(select sum(x.sales_e) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_e  
from existing_table t  
group by t.id, t.date;

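Before running this query, I created an index on the column 'id'.

I got the desired output, but the query took about 11 hours to complete.

I was wondering whether I am doing something wrong here, and whether there is a better (and faster) way to run this kind of query.

Thanks for your help.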

score 0 · Accepted Answer

Some queries are inherently expensive and take a long time to run. In this particular case, you can avoid the five subqueries:

SELECT a.*, b.cumulative_sales_a, b.cumulative_sales_b, ...
FROM 
(
 select t.id, t.`date`, t.sales_a, t.sales_b, t.sales_c, t.sales_d, t.sales_e
 from existing_table t  
 GROUP BY t.id,t.`date`
)a
LEFT JOIN 
(
  select x.id, x.date, sum(x.sales_a) as  cumulative_sales_a,
  sum(x.sales_b) as cumulative_sales_b, ...
  FROM existing_table x 
  GROUP BY x.id, x.`date`
)b ON (b.id = a.id AND b.`date` <=a.`date`)

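This is also an expensive query, but it should get a better execution plan than the original. Also, check whether

select t.id, t.`date`, t.sales_a, t.sales_b, t.sales_c, t.sales_d, t.sales_e
 from existing_table t  
 GROUP BY t.id,t.`date`

gives you what you need. For example, if there are 5 records with the same id and date, the values of the other fields (sales_a, sales_b, etc.) will be taken from any one of those 5 records...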
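
As written, the LEFT JOIN query in this answer returns one output row for every earlier date that matches, not a single running total per (id, date). A minimal sketch of one way to finish it is to aggregate the joined rows in the outer query. It assumes that rows sharing an id and date should be summed into one daily figure, which is an assumption and not something the answer states. Only sales_a and sales_b are shown; the other columns would follow the same pattern.

```sql
-- Sketch only: one row per (id, date), then sum every day up to and including it.
-- Assumption: duplicate (id, date) rows are summed into a single daily figure.
SELECT a.id,
       a.`date`,
       a.sales_a,
       a.sales_b,
       SUM(b.sales_a) AS cumulative_sales_a,
       SUM(b.sales_b) AS cumulative_sales_b
FROM (
    SELECT id, `date`, SUM(sales_a) AS sales_a, SUM(sales_b) AS sales_b
    FROM existing_table
    GROUP BY id, `date`
) AS a
JOIN (
    SELECT id, `date`, SUM(sales_a) AS sales_a, SUM(sales_b) AS sales_b
    FROM existing_table
    GROUP BY id, `date`
) AS b
  ON b.id = a.id AND b.`date` <= a.`date`   -- every day up to the current one
GROUP BY a.id, a.`date`, a.sales_a, a.sales_b;
```

An inner join is enough here because each row of a always matches itself in b.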

score 0 · Accepted Answer

You can combine all the mini-selects into one query with several sums. That is, change

(select sum(x.sales_a) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_a,  
(select sum(x.sales_b) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_b,  
(select sum(x.sales_c) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_c,  
(select sum(x.sales_d) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_d,  
(select sum(x.sales_e) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_e

to

select sum(..),sum(..),sum(...),sum(..),sum(..)
from existing_table x 
where x.id=t.id and x.date<=t.date
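
A subquery in the select list may return only one column in MySQL, so the combined select cannot simply replace the five subqueries where they stand. One possible way to apply the idea, assuming MySQL 8.0.14 or later (which supports LATERAL derived tables), is sketched below. The aliases d and c are illustrative, and the sketch returns only the cumulative columns.

```sql
-- Sketch only, assuming MySQL 8.0.14+ (LATERAL derived tables).
-- d: one row per (id, date); c: all five running totals from one correlated scan.
SELECT d.id,
       d.`date`,
       c.cumulative_sales_a,
       c.cumulative_sales_b,
       c.cumulative_sales_c,
       c.cumulative_sales_d,
       c.cumulative_sales_e
FROM (SELECT DISTINCT id, `date` FROM existing_table) AS d,
     LATERAL (
         SELECT SUM(x.sales_a) AS cumulative_sales_a,
                SUM(x.sales_b) AS cumulative_sales_b,
                SUM(x.sales_c) AS cumulative_sales_c,
                SUM(x.sales_d) AS cumulative_sales_d,
                SUM(x.sales_e) AS cumulative_sales_e
         FROM existing_table x
         WHERE x.id = d.id AND x.`date` <= d.`date`
     ) AS c;
```

This still rescans the earlier rows of an id for every date, so it cuts the work from five correlated scans to one without changing how the cost grows with the number of rows per id.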

score 0 · Accepted Answer

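This looks like a good place for a query that uses MySQL variables. In this case, pre-query all the aggregates by the expected 'ID' and 'Date' to remove the duplicates and produce one entry holding the total for each day. Take that result, ordered by ID and date, so it is ready for the next part, which joins it to the "@sqlvariables" version.

Now process the rows in order, accumulating for each ID until a new ID appears; at that point reset the counters to zero and carry on adding the respective "sales". After each "record" is processed, set @lastID to the ID just processed, so that it can be compared when the next row is handled to decide whether to continue with the same person or force a reset to zero.

To help optimize the inner "PreAgg"regate query, make sure there is an index on (ID, Date). It should be very fast for you.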
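
The composite index mentioned above could be created along these lines. The index name idx_id_date is hypothetical, and the column names are taken from the question.

```sql
-- Sketch only: idx_id_date is a hypothetical index name.
-- A composite index on (id, date) matches the GROUP BY / ORDER BY of the pre-aggregation.
CREATE INDEX idx_id_date ON existing_table (id, `date`);
```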

SELECT
      PreAgg.ID,
      PreAgg.`Date`,
      PreAgg.SalesA,
      PreAgg.SalesB,
      PreAgg.SalesC,
      PreAgg.SalesD,
      PreAgg.SalesE,
      -- compare with "=" (not ":=") so the totals reset when the ID changes
      @CumulativeA := if( @lastID = PreAgg.ID, @CumulativeA, 0 ) + PreAgg.SalesA as CumulativeA,
      @CumulativeB := if( @lastID = PreAgg.ID, @CumulativeB, 0 ) + PreAgg.SalesB as CumulativeB,
      @CumulativeC := if( @lastID = PreAgg.ID, @CumulativeC, 0 ) + PreAgg.SalesC as CumulativeC,
      @CumulativeD := if( @lastID = PreAgg.ID, @CumulativeD, 0 ) + PreAgg.SalesD as CumulativeD,
      @CumulativeE := if( @lastID = PreAgg.ID, @CumulativeE, 0 ) + PreAgg.SalesE as CumulativeE,
      -- remember the ID just processed for the next row's comparison
      @lastID := PreAgg.ID as dummyPlaceholder
   from 
      ( select 
              t.id, 
              t.`date`, 
              SUM( t.sales_a ) SalesA, 
              SUM( t.sales_b ) SalesB, 
              SUM( t.sales_c ) SalesC,
              SUM( t.sales_d ) SalesD,
              SUM( t.sales_e ) SalesE
           from
              existing_table t
           group by
              t.id,
              t.`date`
           order by
              t.id,
              t.`date` ) PreAgg,
      ( select 
              @lastID := 0,
              @CumulativeA := 0,
              @CumulativeB := 0,
              @CumulativeC := 0,
              @CumulativeD := 0,
              @CumulativeE := 0 ) sqlvars
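
On MySQL 8.0 or later, the same per-ID running totals can also be written with window functions. This is an alternative to the user-variable technique in this answer, not part of it. MySQL 8.0 deprecates assigning user variables inside expressions, and the order in which such expressions are evaluated is not guaranteed. A minimal sketch, assuming MySQL 8.0+ and reusing the table and column names from the question; the aliases p and w are illustrative.

```sql
-- Sketch only, assuming MySQL 8.0+ (window functions).
-- p: daily totals per (id, date); w: running frame per id, ordered by date.
CREATE TABLE new_cumulative AS
SELECT p.id,
       p.`date`,
       p.sales_a, p.sales_b, p.sales_c, p.sales_d, p.sales_e,
       SUM(p.sales_a) OVER w AS cumulative_sales_a,
       SUM(p.sales_b) OVER w AS cumulative_sales_b,
       SUM(p.sales_c) OVER w AS cumulative_sales_c,
       SUM(p.sales_d) OVER w AS cumulative_sales_d,
       SUM(p.sales_e) OVER w AS cumulative_sales_e
FROM (
    SELECT id,
           `date`,
           SUM(sales_a) AS sales_a,
           SUM(sales_b) AS sales_b,
           SUM(sales_c) AS sales_c,
           SUM(sales_d) AS sales_d,
           SUM(sales_e) AS sales_e
    FROM existing_table
    GROUP BY id, `date`
) AS p
WINDOW w AS (PARTITION BY p.id ORDER BY p.`date` ROWS UNBOUNDED PRECEDING);
```

Because the pre-aggregation leaves one row per (id, date), the ROWS frame accumulates exactly the days up to and including the current one for each id.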
