sql - Inefficient SQL query with DATETIME calculations. How can it be optimized?

Question

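The problem comes from a real environment where a production_plan table stores the order ID and other details in each row. Each row is updated when production of a product starts and again after it ends, capturing the UTC time of the event.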

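There is another table, temperatures, that periodically collects some temperatures on the production line, independently of anything else, stored with their UTC time.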

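The goal is to extract the sequence of temperatures measured during the production of each product. (The temperatures then have to be processed: a chart of the values is created and attached to the product item's documentation for auditing purposes.)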

Updated after marc_s's comment. The original question did not take indexes into account. The updated text does; the original measurements are kept in the code comments.

The tables and indexes were created as follows:

CREATE TABLE production_plan (
        order_id nvarchar(50) NOT NULL,
        production_line uniqueidentifier NULL,
        prod_start DATETIME NULL,
        prod_end DATETIME NULL
);

-- About 31 000 rows inserted, ordered by order_id.
...

-- Clustered index ind_order_id on order_id.
CREATE CLUSTERED INDEX ind_order_id
ON production_plan (order_id ASC);

-- Non-clustered indices on the other columns.
CREATE INDEX ind_times
ON production_plan (production_line ASC, prod_start ASC, prod_end ASC);

------------------------------------------------------

-- There are actually more temperatures for one time (i.e. more
-- sensors). The UTC is the real time of the row insertion, hence
-- the primary key.
CREATE TABLE temperatures (
        UTC datetime PRIMARY KEY NOT NULL,
        production_line uniqueidentifier NULL,
        temperature_1 float NULL  
);

-- About 91 000 rows inserted ordered by UTC.
...

-- The clustered index on UTC is created automatically
-- because of the PRIMARY KEY. Indexes on temperature(s)
-- do not make sense.

-- Non-clustered index for production_line
CREATE INDEX ind_pl
ON temperatures (production_line ASC);

-- Creating the tables, inserting the records, and creating the
-- indexes took less than 1 second (for the sample on my computer).

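The idea is to join the tables first on the production_line identifier, and then to require that the temperature's UTC time falls between the UTC times of the start and the end of the item's production.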

-- About 45 000 rows in about 24 seconds when no indices were used.
-- The same took less than one second with the indices (for my data
-- and my computer).
SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table02
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN pp.prod_start
                          AND pp.prod_end
  ORDER BY t.UTC;

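A time of about 24 seconds was not acceptable, and it was clear that indexes were needed. With them, the same operation took less than one second (the time shown in the yellow status bar below the Results tab in Microsoft SQL Server Management Studio).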

But...

The second problem remains

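The temperature measurements are not very frequent, and the measuring point is located slightly away from where production starts, so a time correction is needed. In other words, two offsets have to be added to the boundaries of the time range. I ended up with a query like this: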

-- About 46 000 rows in about 9 minutes without indices.
-- It took about the same also with indices 
-- (8:50 instead of 9:00 or so).
DECLARE @offset_start INT;
SET @offset_start = -60  -- one minute = one sample before

DECLARE @offset_end INT;
SET @offset_end = +60    -- one minute = one sample after

SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table03
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start)
                          AND DATEADD(second, @offset_end, pp.prod_end)
  ORDER BY t.UTC;

With the DATEADD() calculation, the query takes about 9 minutes, almost regardless of whether the indexes were created or not.
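To make the effect of the two offsets concrete, here is a small self-contained sketch on invented data (the #pp_demo and #temp_demo tables, the order ID and the timestamps are all hypothetical). Production runs from 10:00:30 to 10:02:30 and a sample is taken every full minute; without offsets only the 10:01 and 10:02 samples fall inside the window, while with ±60 s the samples at 10:00 and 10:03 are picked up as well.

```sql
-- Hypothetical sample data: one production line, one order,
-- and one temperature sample per minute.
IF OBJECT_ID(N'tempdb..#pp_demo') IS NOT NULL DROP TABLE #pp_demo;
IF OBJECT_ID(N'tempdb..#temp_demo') IS NOT NULL DROP TABLE #temp_demo;

CREATE TABLE #pp_demo (
    order_id        nvarchar(50)     NOT NULL,
    production_line uniqueidentifier NULL,
    prod_start      datetime         NULL,
    prod_end        datetime         NULL
);

CREATE TABLE #temp_demo (
    UTC             datetime         NOT NULL PRIMARY KEY,
    production_line uniqueidentifier NULL,
    temperature_1   float            NULL
);

DECLARE @line uniqueidentifier = NEWID();

-- Production runs from 10:00:30 to 10:02:30.
INSERT INTO #pp_demo (order_id, production_line, prod_start, prod_end)
VALUES (N'A-1', @line, '2013-05-01T10:00:30', '2013-05-01T10:02:30');

-- Samples every full minute from 09:59 to 10:04.
INSERT INTO #temp_demo (UTC, production_line, temperature_1)
VALUES ('2013-05-01T09:59:00', @line, 20.0),
       ('2013-05-01T10:00:00', @line, 21.0),
       ('2013-05-01T10:01:00', @line, 22.0),
       ('2013-05-01T10:02:00', @line, 23.0),
       ('2013-05-01T10:03:00', @line, 24.0),
       ('2013-05-01T10:04:00', @line, 25.0);

DECLARE @offset_start INT = -60;  -- one minute = one sample before
DECLARE @offset_end   INT = +60;  -- one minute = one sample after

-- Without offsets: only 10:01 and 10:02 fall inside the window.
SELECT t.UTC, t.temperature_1
  FROM #pp_demo AS pp
  JOIN #temp_demo AS t
    ON pp.production_line = t.production_line
   AND t.UTC BETWEEN pp.prod_start AND pp.prod_end
 ORDER BY t.UTC;

-- With offsets the window becomes 09:59:30 .. 10:03:30,
-- so 10:00 and 10:03 are returned as well.
SELECT t.UTC, t.temperature_1
  FROM #pp_demo AS pp
  JOIN #temp_demo AS t
    ON pp.production_line = t.production_line
   AND t.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start)
                 AND DATEADD(second, @offset_end, pp.prod_end)
 ORDER BY t.UTC;
```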

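Thinking more about how to solve the problem, it seems that the corrected time boundaries (UTC with the offsets added) need their own index for efficient processing. Creating a temporary table comes to mind. An index can then be created on the corrected columns, and the JOIN against that table should be fast. Afterwards, the table can be dropped.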

Is the basic idea of the temporary table correct? Are there other techniques for doing this?

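Thank you for your suggestions. I will update the timing results after introducing the indexes you suggest. Please also explain why an improvement can be expected; I am a beginner when it comes to practical experience writing SQL solutions.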

score 2 · Accepted Answer

In general, you can optimize a query by:

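choosing a good clustering key on your tables: a good one is narrow, unique, static, and ever-increasing. INT IDENTITY is the classic good key; a GUID is a horribly bad example (it leads to excessive index fragmentation; for details, read Kim Tripp's "GUIDs as PRIMARY KEYs and/or the clustering key")
making sure all foreign key columns in child tables are indexed, so that JOINs and lookups run faster
selecting only as many columns as you really need (you seem to be doing this fine)
trying to cover the query, i.e. creating indexes on the relevant tables that contain all the columns needed, either directly as index columns or as included columns (SQL Server 2005 and later)
possibly adding indexes to speed up range queries and/or to help with sorting/ordering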
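As a sketch of the "cover the query" point applied to the temperatures side of the join (on a hypothetical stand-in table #temps_cov with the same columns, and a made-up index name ix_line_utc): the key columns serve the equality on production_line and the range on UTC, and temperature_1 is carried as an included column, so the query can in principle be answered from the index alone.

```sql
-- Hypothetical stand-in for the temperatures table (same columns).
IF OBJECT_ID(N'tempdb..#temps_cov') IS NOT NULL DROP TABLE #temps_cov;

CREATE TABLE #temps_cov (
    UTC             datetime         NOT NULL PRIMARY KEY,  -- clustered by default
    production_line uniqueidentifier NULL,
    temperature_1   float            NULL
);

-- Key columns: the equality column first, then the range column.
-- INCLUDE stores temperature_1 in the index leaf level only, so the
-- query below can be answered from this index alone (it "covers" it).
CREATE NONCLUSTERED INDEX ix_line_utc
    ON #temps_cov (production_line ASC, UTC ASC)
    INCLUDE (temperature_1);

DECLARE @line uniqueidentifier = NEWID();

INSERT INTO #temps_cov (UTC, production_line, temperature_1)
VALUES ('2013-05-01T10:00:00', @line,   21.0),
       ('2013-05-01T10:01:00', @line,   22.0),
       ('2013-05-01T10:02:00', NEWID(), 99.0);  -- a different line

SELECT UTC, temperature_1
  FROM #temps_cov
 WHERE production_line = @line
   AND UTC BETWEEN '2013-05-01T09:59:00' AND '2013-05-01T10:05:00'
 ORDER BY UTC;
```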

Looking at your query and table definitions:

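you don't seem to have any primary keys; add them!
you should make sure there are foreign key indexes on pp.production_line and t.production_line (assuming they reference the primary key of another table)
you should check whether you can find a good index to handle the range query on t.UTC
you should check whether it makes sense to create an index on production_plan that includes all the columns (order_id, pp.prod_start, pp.prod_end)
you should check whether it makes sense to create an index on temperatures that includes all the columns (UTC, temperature_1)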
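A minimal sketch of the primary-key and production_plan index points, on a hypothetical stand-in table #pp_keys with the question's columns. It assumes order_id is unique (the question's ind_order_id is not declared unique), and it relies on the fact that SQL Server stores the clustering key in every nonclustered index, so an index on (production_line, prod_start, prod_end) already carries order_id as well. The index name ix_line_times is made up.

```sql
-- Hypothetical stand-in for production_plan (same columns as in the question).
IF OBJECT_ID(N'tempdb..#pp_keys') IS NOT NULL DROP TABLE #pp_keys;

CREATE TABLE #pp_keys (
    order_id        nvarchar(50)     NOT NULL,
    production_line uniqueidentifier NULL,
    prod_start      datetime         NULL,
    prod_end        datetime         NULL
);

-- Assumption: order_id is unique, so it can be the primary key and,
-- as in the question, the clustering key. The constraint is left
-- unnamed because named constraints on #temp tables can clash
-- between sessions.
ALTER TABLE #pp_keys ADD PRIMARY KEY CLUSTERED (order_id);

-- Columns used by the join and the range predicate. order_id need not
-- be listed: the clustering key is stored in every nonclustered index,
-- so this index already contains all four columns.
CREATE NONCLUSTERED INDEX ix_line_times
    ON #pp_keys (production_line ASC, prod_start ASC, prod_end ASC);

-- A duplicate order_id is now rejected (error 2627) and nothing is inserted.
BEGIN TRY
    INSERT INTO #pp_keys (order_id) VALUES (N'A-1'), (N'A-1');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```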

Update: you can capture the actual execution plan by enabling that option on the SSMS toolbar,


or from the menu: Query > Include Actual Execution Plan
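The same kind of information can also be requested from a script, which is handy for comparing the query variants. A sketch using the standard session options SET STATISTICS IO, SET STATISTICS TIME and SET STATISTICS XML (the last one returns the actual plan as an extra XML result set); the #plan_demo table and its rows are hypothetical.

```sql
IF OBJECT_ID(N'tempdb..#plan_demo') IS NOT NULL DROP TABLE #plan_demo;

CREATE TABLE #plan_demo (
    UTC           datetime NOT NULL PRIMARY KEY,
    temperature_1 float    NULL
);

INSERT INTO #plan_demo (UTC, temperature_1)
VALUES ('2013-05-01T10:00:00', 21.0),
       ('2013-05-01T10:01:00', 22.0),
       ('2013-05-01T10:02:00', 23.0);

SET STATISTICS IO ON;    -- logical/physical reads per table (Messages tab)
SET STATISTICS TIME ON;  -- parse/compile and execution times (Messages tab)
SET STATISTICS XML ON;   -- actual execution plan as an extra result set

SELECT UTC, temperature_1
  FROM #plan_demo
 WHERE UTC BETWEEN '2013-05-01T10:00:30' AND '2013-05-01T10:02:30';

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Running the 24-second, 9-minute and faster variants this way and comparing the logical reads reported for temperatures is one way to see where the time goes.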

score 1 · Accepted Answer

Computed columns can help here: http://msdn.microsoft.com/en-us/library/ms189292%28v=sql.105%29.aspx

ALTER TABLE production_plan ADD 
        offset_start int NOT NULL CONSTRAINT DF__production_plan__offset_start DEFAULT 0,
        offset_end int NOT NULL CONSTRAINT DF__production_plan__offset_end DEFAULT 0,
        prod_start_UTC as CAST(DATEADD(second,offset_start,prod_start) as DATETIME) PERSISTED  NOT NULL ,
        prod_end_UTC as CAST(DATEADD(second,offset_end,prod_end) as DATETIME) PERSISTED  NOT NULL

-- or just
--ALTER TABLE production_plan ADD 
--        prod_start_UTC as CAST(DATEADD(second,-60,prod_start) as DATETIME) PERSISTED  NOT NULL ,
--        prod_end_UTC as CAST(DATEADD(second,60,prod_end) as DATETIME) PERSISTED  NOT NULL

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[temperatures]') AND name = N'ind_pl')
    DROP INDEX [ind_pl] ON [dbo].[temperatures] WITH ( ONLINE = OFF )

CREATE INDEX ind_times_UTC
ON production_plan (production_line ASC, prod_start_UTC ASC, prod_end_UTC ASC);

SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table05
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN pp.prod_start_UTC
                          AND pp.prod_end_UTC
ORDER BY t.UTC;
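A small self-contained check of the computed-column idea, using the fixed ±60 s variant on a hypothetical stand-in table #pp_cc. A computed column definition cannot reference a T-SQL variable such as @offset_start, which is why the offsets are either stored in their own columns or hard-coded. Because DATEADD on a datetime column is deterministic, the persisted columns can be indexed like ordinary ones; the SET options at the top are required for indexes on computed columns.

```sql
-- Needed when indexing computed columns (sqlcmd defaults QUOTED_IDENTIFIER to OFF).
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;

IF OBJECT_ID(N'tempdb..#pp_cc') IS NOT NULL DROP TABLE #pp_cc;

-- Hypothetical stand-in for production_plan with the fixed +/-60 s offsets.
CREATE TABLE #pp_cc (
    order_id        nvarchar(50)     NOT NULL,
    production_line uniqueidentifier NULL,
    prod_start      datetime         NULL,
    prod_end        datetime         NULL,
    -- Constants only: a variable such as @offset_start is not allowed here.
    prod_start_UTC  AS DATEADD(second, -60, prod_start) PERSISTED,
    prod_end_UTC    AS DATEADD(second,  60, prod_end)   PERSISTED
);

-- DATEADD is deterministic, so the computed columns can be indexed.
CREATE INDEX ind_times_UTC
    ON #pp_cc (production_line ASC, prod_start_UTC ASC, prod_end_UTC ASC);

INSERT INTO #pp_cc (order_id, production_line, prod_start, prod_end)
VALUES (N'A-1', NEWID(), '2013-05-01T10:00:30', '2013-05-01T10:02:30');

-- prod_start_UTC = 09:59:30, prod_end_UTC = 10:03:30
SELECT order_id, prod_start, prod_start_UTC, prod_end, prod_end_UTC
  FROM #pp_cc;
```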

This is in line with the recommendations by marc_s.

score 1 · Accepted Answer

Something to try:

CREATE INDEX ind_pl
    ON temperatures (production_line ASC, UTC ASC)
    WITH (DROP_EXISTING = ON);  -- ind_pl already exists; rebuild it with the new key

This gives the join a covering index.

Expressing the non-equi join with CROSS APPLY (SQL Server 2005 and later) might be faster:

DECLARE @offset_start INT;
SET @offset_start = -60;  -- one minute = one sample before

DECLARE @offset_end INT;
SET @offset_end = +60;    -- one minute = one sample after

SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table04    -- SELECT ... INTO fails if the table already exists
  FROM production_plan AS pp
 CROSS APPLY
 (
   SELECT t1.UTC, t1.temperature_1
     FROM temperatures AS t1
    WHERE t1.production_line = pp.production_line
      AND t1.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start)
                     AND DATEADD(second, @offset_end, pp.prod_end)
 ) AS t
 ORDER BY t.UTC;

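If that doesn't help, the next option would be to write a stored procedure that declares two cursors, one for pp and one for t, and makes sure each table is read only once by advancing one side at a time while inserting the matches into a temporary table. Because the relationship is n:m, this technique can get quite complicated. But if the above doesn't work, I'd be happy to give it a try.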

score 1 · Accepted Answer

I tried the following solution using a temporary table:

-- UTC range expanded by the offsets -- temporary table used.
-- (Much better -- less than one second.)

DECLARE @offset_start INT;
SET @offset_start = -60  -- one minute = one sample before

DECLARE @offset_end INT;
SET @offset_end = +60    -- one minute = one sample after

-- Temporary table with the production_plan UTC range expanded.
SELECT production_line,
       order_id,
       prod_start,
       prod_end,
       DATEADD(second, @offset_start, prod_start) AS start,
       DATEADD(second, @offset_end, prod_end) AS bend
  INTO #pp     
  FROM production_plan;

CREATE INDEX ind_UTC
  ON #pp (production_line ASC, start ASC, bend ASC);

SELECT order_id,
       prod_start,
       prod_end,
       UTC,
       temperature_1
  INTO result_table06
  FROM #pp JOIN temperatures AS t
             ON #pp.production_line = t.production_line
                AND UTC BETWEEN #pp.start AND #pp.bend
  ORDER BY UTC;

DROP TABLE #pp;

CREATE CLUSTERED INDEX ind_UTC
  ON result_table06 (UTC ASC);

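The result is ready in less than one second (compare that with 9 minutes). But I would like to hear your criticism. One open question is how efficient this will be once the temperatures table grows large.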

score 0 · Accepted Answer

This is for the second problem.

I haven't checked the performance of this, but you could try skipping the DATEADD function by replacing it with the addition or subtraction of a constant float.

For example, if you want to add 60 seconds (one minute), you can use:

select getdate()+1.000/(24.00*60.00)

or, with a constant:

select getdate()+0.000694444

As you can see, adding 1 adds exactly one day. So the constant is not exactly 60 seconds, but that probably doesn't matter in this case.
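To compare the float shortcut with DATEADD side by side, a quick sketch (the @t timestamp is made up). 0.000694444 of a day is 0.000694444 × 86400 = 59.9999616 s, and since datetime values are stored in steps of about 1/300 s, any difference from the DATEADD result is at most a few milliseconds.

```sql
DECLARE @t datetime = '2013-05-01T10:00:00';  -- made-up timestamp

SELECT DATEADD(second, 60, @t)      AS with_dateadd,
       @t + 1.000 / (24.00 * 60.00) AS with_fraction,  -- 1/1440 day = 1 minute
       @t + 0.000694444             AS with_constant,  -- 0.000694444 day = 59.9999616 s
       -- datetime is stored in steps of 1/300 s (.000/.003/.007), so any
       -- difference from the DATEADD result is at most a few milliseconds.
       DATEDIFF(millisecond, DATEADD(second, 60, @t), @t + 0.000694444) AS diff_ms;

-- The "+ number" shortcut works with datetime; with datetime2 it fails
-- with an operand type clash, and DATEADD is needed there.
```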
