sql - Moving average based on timestamps in PostgreSQL

Question

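I want to compute a moving average of temperature by timestamp. My table has two columns, temperature and timestamp (date and time), and I want each average to cover the consecutive temperature readings within a 15-minute span; that is, the rows that go into each average are selected by a 15-minute time interval. The number of observations can vary: every window has the same size (15 minutes), but each window may contain a different number of readings. For example, the first window might need the average of n observations and the second window the average of n+5 observations.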

Data sample:

| ID |           Timestamp | Temperature |
------------------------------------------
|  1 | 2007-09-14 22:56:12 |        5.39 |
|  2 | 2007-09-14 22:58:12 |        5.34 |
|  3 | 2007-09-14 23:00:12 |        5.16 |
|  4 | 2007-09-14 23:02:12 |        5.54 |
|  5 | 2007-09-14 23:04:12 |        5.30 |
|  6 | 2007-09-14 23:06:12 |        5.20 |
|  7 | 2007-09-14 23:10:12 |        5.39 |
|  8 | 2007-09-14 23:12:12 |        5.34 |
|  9 | 2007-09-14 23:20:12 |        5.16 |
| 10 | 2007-09-14 23:24:12 |        5.54 |
| 11 | 2007-09-14 23:30:12 |        5.30 |
| 12 | 2007-09-14 23:33:12 |        5.20 |
| 13 | 2007-09-14 23:40:12 |        5.39 |
| 14 | 2007-09-14 23:42:12 |        5.34 |
| 15 | 2007-09-14 23:44:12 |        5.16 |
| 16 | 2007-09-14 23:50:12 |        5.54 |
| 17 | 2007-09-14 23:52:12 |        5.30 |
| 18 | 2007-09-14 23:57:12 |        5.20 |
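To try the answers against this sample, the rows can be loaded into a table. A minimal setup sketch: it reuses the `observation(id, timestamps, temperature)` definition from the third answer (so skip that answer's `create table` if you run this first). The first two answers call the table `l` and `readings` instead, and the second names its columns `time_read` and `temp`.

```sql
-- Hypothetical setup script for the sample data above.
-- Table definition copied from the third answer.
create table observation(id int primary key, timestamps timestamp not null unique,
                         temperature numeric(5,2) not null);

insert into observation (id, timestamps, temperature) values
  (1,  '2007-09-14 22:56:12', 5.39),
  (2,  '2007-09-14 22:58:12', 5.34),
  (3,  '2007-09-14 23:00:12', 5.16),
  (4,  '2007-09-14 23:02:12', 5.54),
  (5,  '2007-09-14 23:04:12', 5.30),
  (6,  '2007-09-14 23:06:12', 5.20),
  (7,  '2007-09-14 23:10:12', 5.39),
  (8,  '2007-09-14 23:12:12', 5.34),
  (9,  '2007-09-14 23:20:12', 5.16),
  (10, '2007-09-14 23:24:12', 5.54),
  (11, '2007-09-14 23:30:12', 5.30),
  (12, '2007-09-14 23:33:12', 5.20),
  (13, '2007-09-14 23:40:12', 5.39),
  (14, '2007-09-14 23:42:12', 5.34),
  (15, '2007-09-14 23:44:12', 5.16),
  (16, '2007-09-14 23:50:12', 5.54),
  (17, '2007-09-14 23:52:12', 5.30),
  (18, '2007-09-14 23:57:12', 5.20);
```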

Main challenge:

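Because the sampling frequency varies, the readings don't fall on exact 15-minute intervals. How can I write code that still separates them into 15-minute windows?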
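One way to see the fixed-window reading of the question on the sample data is to assign each reading to a 15-minute bucket and count the readings per bucket. A minimal sketch, assuming the sample has been loaded into the `observation(id, timestamps, temperature)` table used in the third answer; the epoch-aligned bucket expression mirrors the one in the second answer, and `bucket_start`, `n_readings` and `avg_temperature` are names chosen here for illustration.

```sql
-- Assign each reading to a fixed 15-minute (900-second) bucket aligned to the
-- Unix epoch, so buckets start at :00, :15, :30 and :45, then summarize each bucket.
select timestamp 'epoch'
         + interval '15 minutes' * floor(extract(epoch from timestamps) / 900) as bucket_start,
       count(*)         as n_readings,       -- varies from bucket to bucket
       avg(temperature) as avg_temperature
from observation
group by bucket_start
order by bucket_start;
```

With the sample above this gives five buckets (starting at 22:45, 23:00, 23:15, 23:30 and 23:45) holding 2, 6, 2, 5 and 3 readings: the same window size, but a different n in each window.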

score 11 · Accepted Answer

You can join the table to itself:

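select l1.id, avg( l2.Temperature )
from l l1
inner join l l2
   -- pair each reading l1 with itself and every earlier reading less than
   -- 15 minutes older (assumes ids increase with time)
   on l2.id <= l1.id and
      l2.Timestamps + interval '15 minutes' > l1.Timestamps
group by l1.id
order by l1.id
;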

Result:

| ID |            AVG |
-----------------------
|  1 |           5.39 |
|  2 |          5.365 |
|  3 | 5.296666666667 |
|  4 |         5.3575 |
|  5 |          5.346 |
|  6 | 5.321666666667 |
|  7 | 5.331428571429 |

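Note: this only does the "hard work". You still need to join the result back to the original table, or add the extra columns to the query. I wasn't sure whether you needed the full final query; take this solution as a starting point, or ask if you need more help.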
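A sketch of that final query, adding the columns directly instead of joining back. It assumes the sample is in the `observation(id, timestamps, temperature)` table from the third answer (instead of `l`) and, like the query above, that ids increase with time; `avg_last_15_min` and `n_readings` are names chosen here for illustration.

```sql
-- Self-join from this answer, extended with each reading's own columns.
select l1.id,
       l1.timestamps,
       l1.temperature,
       avg(l2.temperature) as avg_last_15_min,
       count(*)            as n_readings       -- readings averaged in this window
from observation l1
join observation l2
  on l2.id <= l1.id
 and l2.timestamps + interval '15 minutes' > l1.timestamps
group by l1.id, l1.timestamps, l1.temperature
order by l1.id;
```

The `n_readings` column makes the varying window population visible; for id 7 it is 7, matching the 5.331428571429 average in the result above.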

score 9 · Accepted Answer

Assuming you want the rolling average to restart at each 15-minute interval:

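select id,
       temp,
       -- running average within each group; restarts when a new 15-minute interval begins
       avg(temp) over (partition by group_nr order by time_read) as rolling_avg
from (
  select id,
         temp,
         time_read,
         interval_group,
         -- gaps-and-islands trick: consecutive ids in the same interval share one group_nr
         -- (assumes ids are consecutive and increase with time_read)
         id - row_number() over (partition by interval_group order by time_read) as group_nr
  from (
    select id,
           time_read,
           -- start of the 15-minute (900-second) interval, counted from the Unix epoch;
           -- the integer division truncates each time down to its interval
           'epoch'::timestamp + '900 seconds'::interval * (extract(epoch from time_read)::int4 / 900) as interval_group,
           temp
    from readings
  ) t1
) t2
order by time_read;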

This is based on Depesz's solution for grouping by "time ranges".

Here is an example on SQLFiddle: http://sqlfiddle.com/#!1/0f3f0/2
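On PostgreSQL 14 or later, the built-in `date_bin` function returns the start of the same epoch-aligned 15-minute interval, and partitioning by it directly gives the restart-every-interval running average without the `id - row_number()` step, so it does not rely on the ids being consecutive. A sketch under those assumptions, written against the `observation(id, timestamps, temperature)` table from the third answer rather than `readings`; `interval_start` is a name chosen here for illustration.

```sql
-- Running average that restarts every 15 minutes (requires PostgreSQL 14+ for date_bin).
select id,
       timestamps,
       temperature,
       -- start of the 15-minute interval, aligned to the Unix epoch
       date_bin('15 minutes', timestamps, timestamp 'epoch') as interval_start,
       avg(temperature) over (
         partition by date_bin('15 minutes', timestamps, timestamp 'epoch')
         order by timestamps  -- default frame: interval's first reading up to this one
       ) as rolling_avg
from observation
order by timestamps;
```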

score 4 · Accepted Answer

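Here is an approach that takes advantage of the ability to use an aggregate function as a window function. The aggregate keeps the observations from the last 15 minutes in an array, together with their running total. The state transition function shifts out of the array any elements that have fallen behind the 15-minute window and pushes the latest observation. The final function simply computes the average temperature of the observations in the array.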

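Whether this pays off depends... It shifts the work from database access to PL/pgSQL execution, and in my experience plpgsql is not fast. If you can easily look up the table to find the rows from the last 15 minutes for each observation, a self-join (as in @danihp's answer) works well. However, this approach can also handle observations that come from a more complex source where such a lookup is impractical. As always, try both on your own system and compare.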

-- based on using this table definition
create table observation(id int primary key, timestamps timestamp not null unique,
                         temperature numeric(5,2) not null);

-- note that I'm reusing the table structure as a type for the state here
-- total is plain numeric: numeric(5,2) would overflow once the running sum exceeds 999.99
create type rollavg_state as (memory observation[], total numeric);

create function rollavg_func(state rollavg_state, next_in observation) returns rollavg_state immutable language plpgsql as $$
declare
  cutoff timestamp;
  i int;
begin
  raise debug 'rollavg_func: state=%, next_in=%', state, next_in;
  cutoff := next_in.timestamps - '15 minutes'::interval;
  i := array_lower(state.memory, 1);
  raise debug 'cutoff is %', cutoff;
  -- drop readings older than the cutoff: subtract each one from the total, then move past it
  while i <= array_upper(state.memory, 1) and state.memory[i].timestamps < cutoff loop
    raise debug 'shifting %', state.memory[i].timestamps;
    state.total := state.total - state.memory[i].temperature;
    i := i + 1;
  end loop;
  -- keep the remaining readings and append the new one
  state.memory := array_append(state.memory[i:array_upper(state.memory, 1)], next_in);
  state.total := coalesce(state.total, 0) + next_in.temperature;
  return state;
end
$$;

create function rollavg_output(state rollavg_state) returns float8 immutable language plpgsql as $$
begin
  raise debug 'rollavg_output: state=% len=%', state, array_length(state.memory, 1);
  if array_length(state.memory, 1) > 0 then
    return state.total / array_length(state.memory, 1);
  else
    return null;
  end if;
end
$$;

create aggregate rollavg(observation) (sfunc = rollavg_func, finalfunc = rollavg_output, stype = rollavg_state);

-- referring to just a table name means a tuple value of the row as a whole, whose type is the table type
-- the aggregate relies on inputs arriving in ascending timestamp order
select rollavg(observation) over (order by timestamps) from observation;
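For comparison, PostgreSQL 11 and later support `RANGE` window frames with an interval offset, which express the trailing 15-minute window directly, without a self-join or a custom aggregate. A minimal sketch against the same `observation` table; `w`, `avg_last_15_min` and `n_readings` are names chosen here for illustration.

```sql
-- Trailing 15-minute moving average using a built-in window frame (PostgreSQL 11+).
select id,
       timestamps,
       temperature,
       avg(temperature) over w as avg_last_15_min,
       count(*)         over w as n_readings      -- readings in this window
from observation
window w as (
  order by timestamps
  range between interval '15 minutes' preceding and current row
)
order by timestamps;
```

The frame includes a reading exactly 15 minutes older than the current one, as the aggregate above does (it only drops readings strictly before the cutoff), whereas the self-join in the first answer excludes it. In the sample this matters once: the 23:57:12 reading is exactly 15 minutes after the 23:42:12 one, so its window holds 5 readings here but 4 in the self-join.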
