python - Algorithm for summing/stacking values from time-series graphs whose data points don't line up in time

Question

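I have a graphing/analysis problem I can't quite get my head around. I can brute-force it, but that is too slow. Maybe someone has a better idea, or knows of a speedy library for Python?

I have two or more time-series data sets (x, y) that I want to aggregate (and subsequently plot). The issue is that the x values across the series don't match up, and I really don't want to resort to duplicating values into time bins.

So, given these two series: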

S1: (1;100) (5;100) (10;100)
S2: (4;150) (5;100) (18;150)

Added together, they should result in:

ST: (1;100) (4;250) (5;200) (10;200) (18;250)

Logic:

x=1 s1=100, s2=None, sum=100
x=4 s1=100, s2=150, sum=250 (note s1 value from previous value)
x=5 s1=100, s2=100, sum=200
x=10 s1=100, s2=100, sum=200
x=18 s1=100, s2=150, sum=250

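My current thinking is to iterate over a sorted list of keys (x), keep the previous value for each series, and query each set to see whether it has a new y for that x.

Any ideas would be appreciated!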
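
The idea described above (walk the sorted x keys, remember the previous value of each series, and ask each series whether it has a new y at that x) could be sketched roughly as follows. The function name `sum_series_by_lookup` is made up for illustration, and the sketch assumes a series contributes 0 before its first point, as in the x=1 row of the logic table.

```python
def sum_series_by_lookup(*series):
    """Sum step-wise series given as iterables of (x, y) pairs."""
    lookups = [dict(s) for s in series]        # x -> y for each series
    last_seen = [0] * len(lookups)             # value carried forward per series
    result = []
    for x in sorted(set().union(*lookups)):    # every distinct x, ascending
        for i, lookup in enumerate(lookups):
            if x in lookup:                    # this series has a new y at x
                last_seen[i] = lookup[x]
        result.append((x, sum(last_seen)))
    return result


S1 = ((1, 100), (5, 100), (10, 100))
S2 = ((4, 150), (5, 100), (18, 150))
print(sum_series_by_lookup(S1, S2))
```

Each x costs one dictionary lookup per series, so the work is roughly (number of distinct x values) times (number of series), plus the sort.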

score 1 · Accepted Answer

Something like this:

def join_series(s1, s2):
    S1 = iter(s1)
    S2 = iter(s2)
    value1 = 0
    value2 = 0
    time1, next1 = next(S1)
    time2, next2 = next(S2)
    end1 = False
    end2 = False

    while not (end1 and end2):
        # Next event time: the earliest pending timestamp among live series.
        if end1:
            time = time2
        elif end2:
            time = time1
        else:
            time = min(time1, time2)

        if not end1 and time == time1:
            value1 = next1
            try:
                time1, next1 = next(S1)
            except StopIteration:
                end1 = True

        if not end2 and time == time2:
            value2 = next2
            try:
                time2, next2 = next(S2)
            except StopIteration:
                end2 = True

        yield time, value1 + value2

S1 = ((1, 100), (5, 100), (10, 100))
S2 = ((4, 150), (5, 100), (18, 150))

for result in join_series(S1, S2):
    print(result)

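Basically, it keeps the current value of S1 and S2 together with the next value of each, and steps through them based on whichever has the lowest "upcoming time". It should handle lists of different lengths, and because it uses iterators throughout, it should be able to handle massive data series, etc.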
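
The same "step through by lowest upcoming time" idea can be extended to any number of series. The following is only a sketch under stated assumptions: `join_many` and `_tag` are hypothetical names, every input is assumed to be already sorted by x, and a series is assumed to contribute 0 before its first point. It relies on the standard-library `heapq.merge`, which lazily merges sorted inputs (the `key` argument needs Python 3.5 or later).

```python
import heapq
from itertools import groupby
from operator import itemgetter


def _tag(index, points):
    """Attach the series index to every (x, y) point."""
    for x, y in points:
        yield x, index, y


def join_many(*series):
    """Lazily sum x-sorted (x, y) series, carrying each last value forward."""
    last_seen = [0] * len(series)
    streams = [_tag(i, s) for i, s in enumerate(series)]
    # Merge all points in x order without loading whole series into memory.
    merged = heapq.merge(*streams, key=itemgetter(0))
    # Points sharing the same x are applied together, then one sum is emitted.
    for x, points in groupby(merged, key=itemgetter(0)):
        for _, index, y in points:
            last_seen[index] = y
        yield x, sum(last_seen)


S1 = ((1, 100), (5, 100), (10, 100))
S2 = ((4, 150), (5, 100), (18, 150))
for result in join_many(S1, S2):
    print(result)
```

Because only one pending point per series is held at a time, this keeps the iterator-friendly behaviour of the two-series version.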

score 1 · Accepted Answer

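One possible approach:

1. Format the elements of every series as tuples (x, y, series id), e.g. (4, 150, 1), add them to a single list of tuples, and sort it by x ascending.
2. Declare a list whose length equals the number of series to hold the "last seen" value of each series.
3. Iterate over each tuple of the list from step (1), and:

3.1 Update the "last seen" list according to the series id in the tuple.

3.2 When the x of the previously iterated tuple does not match the x of the current tuple, sum all elements of the "last seen" list and append the result to the final list.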

Now with my quick-and-dirty test:

>>> S1 = ((1, 100), (5, 100), (10, 100))
>>> S2 = ((4, 150), (5, 100), (18, 150))
>>> all = []
>>> for s in S1: all.append((s[0], s[1], 0))
...
>>> for s in S2: all.append((s[0], s[1], 1))
...
>>> all
[(1, 100, 0), (5, 100, 0), (10, 100, 0), (4, 150, 1), (5, 100, 1), (18, 150, 1)]
>>> all.sort()
>>> all
[(1, 100, 0), (4, 150, 1), (5, 100, 0), (5, 100, 1), (10, 100, 0), (18, 150, 1)]
>>> last_val = [0]*2
>>> last_x = all[0][0]
>>> final = []
>>> for e in all:
...     if e[0] != last_x:
...             final.append((last_x, sum(last_val)))
...     last_val[e[2]] = e[1]
...     last_x = e[0]
...
>>> final.append((last_x, sum(last_val)))
>>> final
[(1, 100), (4, 250), (5, 200), (10, 200), (18, 250)]
>>>
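
For reuse outside the interactive session, the same steps might be wrapped in a function along these lines. This is a sketch only: the name `sum_by_sorted_events` is invented here, and the list is called `events` rather than `all` so that the built-in `all()` is not shadowed.

```python
def sum_by_sorted_events(*series):
    """Sum (x, y) series by sorting all points and tracking last-seen values."""
    # Step 1: tag each point with its series id and sort by x.
    events = sorted(
        (x, y, series_id)
        for series_id, points in enumerate(series)
        for x, y in points
    )
    if not events:
        return []
    # Step 2: one "last seen" slot per series (0 until its first point).
    last_val = [0] * len(series)
    last_x = events[0][0]
    final = []
    # Step 3: emit a sum each time x changes, then once more at the end.
    for x, y, series_id in events:
        if x != last_x:
            final.append((last_x, sum(last_val)))
        last_val[series_id] = y
        last_x = x
    final.append((last_x, sum(last_val)))
    return final


S1 = ((1, 100), (5, 100), (10, 100))
S2 = ((4, 150), (5, 100), (18, 150))
print(sum_by_sorted_events(S1, S2))
```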

score 1 · Accepted Answer

Another way to do this is to give the individual data streams more behavior:

class DataStream(object):
    def __init__(self, iterable):
        self.iterable = iter(iterable)
        self.next_item = (None, 0)
        self.next_x = None
        self.current_y = 0
        self.next()

    def next(self):
        # Promote the pending item to "current", then pre-fetch the one after it.
        if self.next_item is None:
            raise StopIteration()
        self.current_y = self.next_item[1]
        try:
            self.next_item = next(self.iterable)
            self.next_x = self.next_item[0]
        except StopIteration:
            self.next_item = None
            self.next_x = None
        return self.next_item

    __next__ = next  # Python 3 iterator protocol

    def __iter__(self):
        return self


class MergedDataStream(object):
    def __init__(self, *iterables):
        self.streams = [DataStream(i) for i in iterables]
        self.outseq = []

    def next(self):
        xs = [stream.next_x for stream in self.streams if stream.next_x is not None]
        if not xs:
            raise StopIteration()
        next_x = min(xs)
        current_y = 0
        for stream in self.streams:
            if stream.next_x == next_x:
                stream.next()
            current_y += stream.current_y
        self.outseq.append((next_x, current_y))
        return self.outseq[-1]

    __next__ = next  # Python 3 iterator protocol

    def __iter__(self):
        return self


if __name__ == '__main__':
    seqs = [
        [(1, 100), (5, 100), (10, 100)],
        [(4, 150), (5, 100), (18, 150)],
        ]

    sm = MergedDataStream(*seqs)
    for x, y in sm:
        print("%02s: %s" % (x, y))

    print(sm.outseq)
