matlab - MATLAB: Compute the mean of each 1-minute interval of a time series

Question

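I have a set of time series, each described by two components: a timestamp vector (in seconds) and a vector of measured values. The time vector is non-uniform (that is, sampled at irregular intervals).

I am trying to compute the mean/SD of the values over each 1-minute interval (take an X-minute interval, compute its mean, take the next interval, and so on).

My current implementation uses a loop. Here is a sample of what I have so far: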

t = (100:999)' + rand(900,1);       % non-uniform time
x = 5*rand(900,1) + 10;             % x(i) is the value at time t(i)

interval = 1;         % 1-min interval
tt = ( floor(t(1)):interval*60:ceil(t(end)) )';  % stopping points of each interval
N = length(tt)-1;

mu = zeros(N,1);
sd = zeros(N,1);

for i=1:N
    indices = ( tt(i) <= t & t < tt(i+1) ); % find t between tt(i) and tt(i+1)
    mu(i) = mean( x(indices) );
    sd(i) = std( x(indices) );
end

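I am wondering whether there is a faster, vectorized solution. This matters because I have a large number of time series to process, each much longer than the sample above.

Any help is welcome.

Thank you all for the feedback.

I corrected the way t is generated so that it is always monotonically increasing (sorted), although this was not actually the issue.

Also, I may not have stated this clearly, but my intention was to have a solution for any interval length in minutes (1 minute was just an example).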

score 11 · Accepted Answer

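The only logical solution seems to be...

OK. I find it funny that there is only one logical solution to me, yet many others find other solutions. Regardless, the solution does seem simple. Given the vectors x and t, and a set of equally spaced break points tt,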

t = sort((100:999)' + 3*rand(900,1));     % non-uniform time
x = 5*rand(900,1) + 10;             % x(i) is the value at time t(i)

tt = ( floor(t(1)):1*60:ceil(t(end)) )';

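(Note that I sorted t above.)

I would do this in three fully vectorized lines of code. First, if the breaks were arbitrary and potentially unequally spaced, I would use histc to determine which interval each data point falls in. Given that they are uniform, just do this: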

int = 1 + floor((t - t(1))/60);

Again, if the elements of t were not known to be sorted, I would have used min(t) instead of t(1). Having done that, use accumarray to reduce the results into a mean and a standard deviation.

mu = accumarray(int,x,[],@mean);
sd = accumarray(int,x,[],@std);
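
For the other case mentioned above, where the breaks are arbitrary and unequally spaced, a minimal sketch might look like the following. The `edges` values are made up for illustration and are assumed to cover every sample in t. histc is used because the answer names it; newer MATLAB releases recommend histcounts or discretize for the same binning step.

```matlab
% Hypothetical data: irregular sample times (seconds) and measured values
t = sort((100:999)' + 3*rand(900,1));
x = 5*rand(900,1) + 10;

% Arbitrary, unequally spaced break points (illustrative values only).
% The last edge lies strictly above max(t), so every sample gets a bin.
edges = [floor(t(1)); 150; 400; 420; 800; ceil(t(end)) + 1];

% bin(i) = k means edges(k) <= t(i) < edges(k+1)
[~, bin] = histc(t, edges);

% Reduce each bin to its mean and standard deviation
nBins = numel(edges) - 1;
mu = accumarray(bin, x, [nBins 1], @mean);
sd = accumarray(bin, x, [nBins 1], @std);
```

Passing the output size `[nBins 1]` keeps mu and sd aligned with the intervals even if the trailing intervals are empty; accumarray fills empty bins with 0.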

score 4 · Accepted Answer

You could create a cell array and apply mean and std via cellfun. It is about 10% slower than your solution for 900 entries, but about 10x faster for 90000 entries.

[t,sortIdx]=sort(t); %# we only need to sort in case t is not monotonically increasing
x = x(sortIdx);

tIdx = floor(t/60); %# convert seconds to minutes - can also convert to 5 mins by dividing by 300
tIdx = tIdx - min(tIdx) + 1; %# tIdx now is a vector of indices - i.e. it starts at 1, and should go like your iteration variable.

%# the next few commands are to count how many 1's 2's 3's etc are in tIdx
dt = [tIdx(2:end)-tIdx(1:end-1);1]; 
stepIdx = [0;find(dt>0)];
nIdx = stepIdx(2:end) - stepIdx(1:end-1); %# number of times each index appears

%# convert to cell array
xCell = mat2cell(x,nIdx,1);

%# use cellfun to calculate the mean and sd
%# stepIdx(1:end-1)+1 is the first sample of each group; indexing through tIdx allows for missing steps
mu(tIdx(stepIdx(1:end-1)+1)) = cellfun(@mean,xCell);
sd(tIdx(stepIdx(1:end-1)+1)) = cellfun(@std,xCell);

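Note: my solution does not give exactly the same results as yours, because you skip a few time values at the end (1:60:90 is [1,61]) and because the starts of the intervals are not exactly the same.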
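
The skipped tail can be seen in a small example. This is only an illustrative sketch: the sample times are invented, and extending the end point by one interval is one possible way to cover the last samples, not something the question's code does.

```matlab
% Hypothetical sample times in seconds
t = [1; 30; 61; 75; 90];

% Break points built as in the question: the last edge stops short of t(end)
tt = ( floor(t(1)):60:ceil(t(end)) )';          % [1; 61]
covered = t >= tt(1) & t < tt(end);             % only the samples in [1, 61)

% Extending the end point by one interval covers every sample
ttFull = ( floor(t(1)):60:(ceil(t(end)) + 60) )';   % [1; 61; 121]
coveredFull = t >= ttFull(1) & t < ttFull(end);
```

Here `covered` is true only for the first two samples, while `coveredFull` is true for all five.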

score 3 · Accepted Answer

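Here is a method that uses binary search. It is 6-10x faster for 9900 elements and about 64x faster for 99900 elements. It was hard to get reliable timings with only 900 elements, so I am not sure which is faster at that size. It uses almost no extra memory if you consider building tx directly from the generated data. Other than that, it only has four extra float variables (prevind, first, mid, and last).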

% Sort the data by time so that we can use binary search (sorting is O(N log N)).
tx = sortrows([t x]);

prevind = 1;

for i=1:N
    % Binary search for the first row whose time is >= tt(i+1)
    first = prevind;
    last = size(tx,1) + 1;  % one past the end, so a trailing sample is not dropped
    while first ~= last
        mid = floor((first+last)/2);
        if tt(i+1) > tx(mid,1)
            first = mid+1;
        else
            last = mid;
        end
    end
    % Rows prevind..last-1 fall inside [tt(i), tt(i+1))
    mu(i) = mean( tx(prevind:last-1,2) );
    sd(i) = std( tx(prevind:last-1,2) );
    prevind = last;
end

It uses all of the variables that you had originally. I hope it suits your needs. It is faster because finding the indices takes O(log N) with binary search, but O(N) the way you were doing it.

score 2 · Accepted Answer

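Disclaimer: I worked this out on paper, but have not yet had the opportunity to check it "in silico"...

You may be able to avoid loops and cell arrays by doing some tricky cumulative sums and indexing, and computing the means and standard deviations yourself. Here is some code that I believe will work, although I am unsure how it stacks up speed-wise against the other solutions: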

[t,sortIndex] = sort(t);  %# Sort the time points
x = x(sortIndex);         %# Sort the data values
interval = 60;            %# Interval size, in seconds

intervalIndex = floor((t-t(1))./interval)+1;  %# Collect t into intervals
nIntervals = max(intervalIndex);              %# The number of intervals
mu = zeros(nIntervals,1);                     %# Preallocate mu
sd = zeros(nIntervals,1);                     %# Preallocate sd

%# t and x are column vectors, so concatenate vertically
sumIndex = [find(diff(intervalIndex)); numel(intervalIndex)];  %# Find indices of the interval ends
n = diff([0; sumIndex]);             %# Number of samples per interval
xSum = cumsum(x);                    %# Cumulative sum of x
xSum = diff([0; xSum(sumIndex)]);    %# Sum per interval
xxSum = cumsum(x.^2);                %# Cumulative sum of x^2
xxSum = diff([0; xxSum(sumIndex)]);  %# Squared sum per interval

intervalIndex = intervalIndex(sumIndex);  %# Find index into mu and sd
mu(intervalIndex) = xSum./n;                             %# Compute mean
sd(intervalIndex) = sqrt((xxSum-xSum.*xSum./n)./(n-1));  %# Compute std dev

The above computes the standard deviation using the simplified formula found on this Wikipedia page.
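
For reference, the simplification the code relies on can be written out as follows. This is the standard identity for the sample variance, with $n$ samples $x_1, \dots, x_n$ in one interval and $\bar{x}$ their mean; the symbols are introduced here for illustration and correspond to `n`, `xSum` and `xxSum` in the code.

```latex
\begin{aligned}
s^2 &= \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
     = \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^2 \right) \\
    &= \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Bigl( \sum_{i=1}^{n} x_i \Bigr)^{2} \right),
\qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .
\end{aligned}
```

Taking the square root gives the last line of the code, `sqrt((xxSum-xSum.*xSum./n)./(n-1))`. This one-pass form can lose precision through cancellation when the mean is large compared with the spread, and it is undefined for intervals that contain a single sample.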

score 2 · Accepted Answer

You can compute indices all at once using bsxfun:

indices = ( bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)') );

This is faster than looping, but it requires storing all of the index masks at once (a time vs. space tradeoff).
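
The answer stops at the index matrix. One possible way to get from there to the per-interval statistics, sketched here as an assumption rather than as the answer's own method, is to treat the logical matrix as a 0/1 weight matrix and use matrix products:

```matlab
% Sample data as in the question
t = sort((100:999)' + 3*rand(900,1));
x = 5*rand(900,1) + 10;
tt = ( floor(t(1)):60:ceil(t(end)) )';

% numel(t)-by-N logical matrix: column k marks the samples in [tt(k), tt(k+1))
indices = bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)');

M  = double(indices);     % 0/1 weights
n  = sum(M, 1)';          % number of samples per interval
mu = (M' * x) ./ n;       % per-interval mean
sd = sqrt((M' * x.^2 - n .* mu.^2) ./ (n - 1));   % sample standard deviation
```

Intervals with no samples give NaN for mu, and intervals with fewer than two samples give NaN for sd. The matrix has numel(t)*N entries, which is the memory cost mentioned above.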

score 0 · Accepted Answer

The same answer as above, but with a parametric interval (window_size). The issue with the vector lengths is also resolved.

window_size = 60; % interval length in seconds; it can be any value (60, 5, 0.1), which was not described above

t = sort((100:999)' + 3*rand(900,1));     % non-uniform time
x = 5*rand(900,1) + 10;                   % x(i) is the value at time t(i)

int = 1 + floor((t - t(1))/window_size);
tt = ( floor(t(1)):window_size:ceil(t(end)) )';



% mean val and std dev of the accelerations at speed
mu = accumarray(int,x,[],@mean);
sd = accumarray(int,x,[],@std);

% trim tt while it is longer than mu (happens e.g. for window_size = 1 instead of 60)
while ( sum(size(tt) > size(mu)) > 0 ) 
  tt(end)=[]; 
end

errorbar(tt,mu,sd);
