sql - 自己結合、相互結合、およびグループ化

Question

いくつかの情報源から経時的な温度サンプルの表を取得しており、設定された時間間隔ですべての情報源の最小、最大、および平均温度を見つけたいと考えています。一見すると、これは次のように簡単に実行できます。

SELECT MIN(temp), MAX(temp), AVG(temp) FROM samples GROUP BY time;

ただし、問題の間隔中に欠落しているソースを無視するのではなく、欠落しているソースの最後の既知の温度を使用したい場合、ソースがドロップインおよびドロップアウトする場合、事態ははるかに複雑になります (私が困惑するところまで!)。サンプル。日時を使用し、時間の経過とともに不均一に分布するサンプル全体で間隔 (たとえば、1 分ごと) を構築すると、事態はさらに複雑になります。

最初のテーブルからの時間が2番目のテーブルの時間以上であるサンプルテーブルで自己結合を実行し、グループ化された行の集計値を計算することで、必要な結果を作成できるはずだと思いますソース。しかし、私は実際にこれを行う方法について困惑しています。

これが私のテストテーブルです：

+------+------+------+
| time   | source  | temp |
+------+------+------+
|    1 | a    |   20 | 
|    1 | b    |   18 | 
|    1 | c    |   23 | 
|    2 | b    |   21 | 
|    2 | c    |   20 | 
|    2 | a    |   18 | 
|    3 | a    |   16 | 
|    3 | c    |   13 | 
|    4 | c    |   15 | 
|    4 | a    |    4 | 
|    4 | b    |   31 | 
|    5 | b    |   10 | 
|    5 | c    |   16 | 
|    5 | a    |   22 | 
|    6 | a    |   18 | 
|    6 | b    |   17 | 
|    7 | a    |   20 | 
|    7 | b    |   19 | 
+------+------+------+
INSERT INTO samples (time, source, temp) VALUES (1, 'a', 20), (1, 'b', 18), (1, 'c', 23), (2, 'b', 21), (2, 'c', 20), (2, 'a', 18), (3, 'a', 16), (3, 'c', 13), (4, 'c', 15), (4, 'a', 4), (4, 'b', 31), (5, 'b', 10), (5, 'c', 16), (5, 'a', 22), (6, 'a', 18), (6, 'b', 17), (7, 'a', 20), (7, 'b', 19);

最小値、最大値、平均値の計算を行うには、次のような中間テーブルが必要です。

+------+------+------+
| time   | source  | temp |
+------+------+------+
|    1 | a    |   20 | 
|    1 | b    |   18 | 
|    1 | c    |   23 | 
|    2 | b    |   21 | 
|    2 | c    |   20 | 
|    2 | a    |   18 | 
|    3 | a    |   16 | 
|    3 | b    |   21 | 
|    3 | c    |   13 | 
|    4 | c    |   15 | 
|    4 | a    |    4 | 
|    4 | b    |   31 | 
|    5 | b    |   10 | 
|    5 | c    |   16 | 
|    5 | a    |   22 | 
|    6 | a    |   18 | 
|    6 | b    |   17 | 
|    6 | c    |   16 | 
|    7 | a    |   20 | 
|    7 | b    |   19 | 
|    7 | c    |   16 | 
+------+------+------+

次のクエリは、私が望むものに近づけていますが、指定された時間間隔での最新の結果ではなく、ソースの最初の結果の温度値を取ります:

SELECT s.dt as sdt, s.mac, ss.temp, MAX(ss.dt) as maxdt FROM (SELECT DISTINCT dt FROM samples) AS s CROSS JOIN samples AS ss WHERE s.dt >= ss.dt GROUP BY sdt, mac HAVING maxdt <= s.dt ORDER BY sdt ASC, maxdt ASC;

+------+------+------+-------+
| sdt  | mac  | temp | maxdt |
+------+------+------+-------+
|    1 | a    |   20 |     1 | 
|    1 | c    |   23 |     1 | 
|    1 | b    |   18 |     1 | 
|    2 | a    |   20 |     2 | 
|    2 | c    |   23 |     2 | 
|    2 | b    |   18 |     2 | 
|    3 | b    |   18 |     2 | 
|    3 | a    |   20 |     3 | 
|    3 | c    |   23 |     3 | 
|    4 | a    |   20 |     4 | 
|    4 | c    |   23 |     4 | 
|    4 | b    |   18 |     4 | 
|    5 | a    |   20 |     5 | 
|    5 | c    |   23 |     5 | 
|    5 | b    |   18 |     5 | 
|    6 | c    |   23 |     5 | 
|    6 | a    |   20 |     6 | 
|    6 | b    |   18 |     6 | 
|    7 | c    |   23 |     5 | 
|    7 | b    |   18 |     7 | 
|    7 | a    |   20 |     7 | 
+------+------+------+-------+

更新: chadhoc (ちなみに、素晴らしい名前です!) は、彼が使用するをサポートしていないため、残念ながら MySQL では機能しない優れたソリューションを提供しFULL JOINます。幸いなことに、単純なUNIONものが効果的な代替品であると私は信じています。

-- Unify the original samples with the missing values that we've calculated
(
  SELECT time, source, temp
  FROM samples
)
UNION
( -- Pull all the time/source combinations that we are missing from the sample set, along with the temp
  -- from the last sampled interval for the same time/source combination if we do not have one
  SELECT  a.time, a.source, (SELECT t2.temp FROM samples AS t2 WHERE t2.time < a.time AND t2.source = a.source ORDER BY t2.time DESC LIMIT 1) AS temp
  FROM    
  ( -- All values we want to get should be a cross of time/temp
    SELECT t1.time, s1.source
    FROM
    (SELECT DISTINCT time FROM samples) AS t1
    CROSS JOIN
    (SELECT DISTINCT source FROM samples) AS s1
  ) AS a
  LEFT JOIN samples s
  ON a.time = s.time
  AND a.source = s.source
  WHERE s.source IS NULL
)
ORDER BY time, source;

更新 2: MySQL はEXPLAINchadhoc のコードに対して次の出力を提供します。

+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+
| id | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows | Extra                       |
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+
|  1 | PRIMARY            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 |                             | 
|  2 | UNION              | <derived4> | ALL  | NULL          | NULL | NULL    | NULL |   21 |                             | 
|  2 | UNION              | s          | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where                 | 
|  4 | DERIVED            | <derived6> | ALL  | NULL          | NULL | NULL    | NULL |    3 |                             | 
|  4 | DERIVED            | <derived5> | ALL  | NULL          | NULL | NULL    | NULL |    7 |                             | 
|  6 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary             | 
|  5 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary             | 
|  3 | DEPENDENT SUBQUERY | t2         | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where; Using filesort | 
| NULL | UNION RESULT       | <union1,2> | ALL  | NULL          | NULL | NULL    | NULL | NULL | Using filesort              | 
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+

Charles のコードを次のように動作させることができました。

SELECT T.time, S.source,
  COALESCE(
    D.temp,
    (
      SELECT temp FROM samples
      WHERE source = S.source AND time = (
        SELECT MAX(time)
        FROM samples
        WHERE
          source = S.source
          AND time < T.time
      )
    )
  ) AS temp
FROM (SELECT DISTINCT time FROM samples) AS T
CROSS JOIN (SELECT DISTINCT source FROM samples) AS S
  LEFT JOIN samples AS D
ON D.source = S.source AND D.time = T.time

その説明は次のとおりです。

+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+
| id | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows | Extra           |
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+
|  1 | PRIMARY            | <derived5> | ALL  | NULL          | NULL | NULL    | NULL |    3 |                 | 
|  1 | PRIMARY            | <derived4> | ALL  | NULL          | NULL | NULL    | NULL |    7 |                 | 
|  1 | PRIMARY            | D          | ALL  | NULL          | NULL | NULL    | NULL |   18 |                 | 
|  5 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary | 
|  4 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary | 
|  2 | DEPENDENT SUBQUERY | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where     | 
|  3 | DEPENDENT SUBQUERY | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where     | 
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+

score 1 · Accepted Answer

mySql のランキング/ウィンドウ関数を使用すると、パフォーマンスが向上すると思いますが、残念ながら、それらと TSQL の実装についてはわかりません。ただし、動作するANSI準拠のソリューションは次のとおりです。

-- Full join across the sample set and anything missing from the sample set, pulling the missing temp first if we do not have one
select  coalesce(c1.[time], c2.[time]) as dt, coalesce(c1.source, c2.source) as source, coalesce(c2.temp, c1.temp) as temp
from    samples c1
full join ( -- Pull all the time/source combinations that we are missing from the sample set, along with the temp
            -- from the last sampled interval for the same time/source combination if we do not have one
            select  a.time, a.source,
                    (select top 1 t2.temp from samples t2 where t2.time < a.time and t2.source = a.source order by t2.time desc) as temp
            from    
                (   -- All values we want to get should be a cross of time/samples
                    select t1.[time], s1.source
                    from
                    (select distinct [time] from samples) as t1
                    cross join
                    (select distinct source from samples) as s1
                ) a
            left join samples s
            on  a.[time] = s.time
            and a.source = s.source
            where s.source is null
        ) c2
on c1.time = c2.time
and c1.source = c2.source
order by dt, source

score 0 · Accepted Answer

これは複雑に見えることは知っていますが、それ自体を説明するようにフォーマットされています...動作するはずです...ソースが3つしかないことを願っています...これよりも任意の数のソースがある場合は動作しません...その場合2番目のクエリを参照してください...編集：最初の試行を削除しました

編集：事前にソースがわからない場合は、欠落している値を「埋める」中間結果セットを作成する何かを行う必要があります。次のようなものです。

2番目の編集：Select句からJoin条件に各ソースの最新の一時読み取り値を取得するロジックを移動することにより、Coalesceの必要性を削除しました。

Select T.Time, Max(Temp) MaxTemp,
  Min(Temp) MinTemp, Avg(Temp) AvgTemp
From
  (Select T.TIme, S.Source, D.Temp
   From (Select Distinct Time From Samples) T
     Cross Join 
        (Select Distinct Source From Samples) S
     Left Join Samples D
        On D.Source = S.Source
           And D.Time = 
               (Select Max(Time)
                From Samples
                Where Source = S.Source
                   And Time <= T.Time)) Z
Group By T.Time

sql - 自己結合、相互結合、およびグループ化

2 に答える 2

Related

Reference