python - Adding heterogeneous time series to a DataFrame

Question

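Goal

I have financial trade data for multiple instruments in CSV format that I would like to analyse using pandas. Trades occur at irregular intervals and are timestamped to 1-second precision, so some trades occur "at the same time", i.e. with identical timestamps.

The goal for the moment is to produce a plot of the cumulative traded quantity for each instrument.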

Current progress

The trade data has been read into a DataFrame using read_csv(), with the parsed datetime as the index.

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 447 entries, 2012-12-07 17:16:46 to 2012-12-10 16:28:29
Data columns:
Account Name    447  non-null values
Exchange        447  non-null values
Instrument      447  non-null values
Fill ID         447  non-null values
Side            447  non-null values
Quantity        447  non-null values
Price           447  non-null values
dtypes: float64(1), int64(1), object(5)
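
For illustration, a minimal sketch of such a load. The CSV content and the timestamp column name "Time" are assumptions (the real file is not shown); only read_csv() with a parsed datetime index comes from the description above.

```python
import io

import pandas as pd

# Hypothetical sample data: the "Time" column name and all values are
# assumptions made for this sketch, not taken from the real file.
csv_text = """Time,Instrument,Side,Quantity,Price
2012-12-07 17:16:46,AAA,Buy,2,100.5
2012-12-07 17:16:46,BBB,Sell,1,50.0
2012-12-07 17:20:03,AAA,Sell,1,100.75
"""

# index_col selects the timestamp column; parse_dates=True parses the index.
trades = pd.read_csv(io.StringIO(csv_text), index_col="Time", parse_dates=True)

print(type(trades.index).__name__)
print(trades.index.is_unique)  # two trades share the first timestamp
```

With 1-second timestamps, is_unique is a quick way to confirm that the index contains repeated values.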

A little work is done to add a "QuantitySigned" column.

I have done a "group-by" so that I can access the data by instrument.
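
The post does not show how that column is built. One possible sketch, assuming Side holds the strings "Buy" and "Sell" (an assumption) and that sells should count as negative quantity:

```python
import numpy as np
import pandas as pd

# Made-up trades; the instrument names and the "Buy"/"Sell" labels are assumptions.
trades = pd.DataFrame(
    {
        "Instrument": ["AAA", "BBB", "AAA"],
        "Side": ["Buy", "Sell", "Sell"],
        "Quantity": [2, 1, 1],
    },
    index=pd.to_datetime(
        ["2012-12-07 17:16:46", "2012-12-07 17:16:46", "2012-12-07 17:20:03"]
    ),
)

# +1 for buys, -1 for everything else, then scale the quantity by that sign.
sign = np.where(trades["Side"] == "Buy", 1, -1)
trades["QuantitySigned"] = trades["Quantity"] * sign

print(trades[["Side", "Quantity", "QuantitySigned"]])
```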

import matplotlib.pyplot as plt

grouped = trades.groupby('Instrument', sort=True)
for name, group in grouped:
    # One cumulative-quantity line per instrument
    group.QuantitySigned.cumsum().plot(label=name)
plt.legend()

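Question

The above works, but I would like to have the TimeSeries (one per instrument) in a single DataFrame, i.e. one column per instrument, so that I can use DataFrame.plot(). The problem is that no two TimeSeries have exactly the same index, so I would need to merge the indexes of all the TimeSeries.

Given the simple example below, I know this should work: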

import pandas as pd
from numpy.random import randn
from pandas import DataFrame, Series

index = pd.date_range('2012-12-21', periods=5)
s1 = Series(randn(3), index=index[:3])
s2 = Series(randn(3), index=index[2:])
df = DataFrame(index=index)
df['s1'] = s1
df['s2'] = s2

However, when I try to aggregate the TimeSeries into a DataFrame, an exception is thrown, which I believe is related to the duplicate index elements:

grouped = trades.groupby('Instrument', sort=True)
df = DataFrame(index=trades.index)
for name, group in grouped:
    df[name] = group.QuantitySigned.cumsum()
df.plot()

Exception: Reindexing only valid with uniquely valued Index objects

Am I going about this "correctly"? Are there any suggestions for how to do this in a better way?

Runnable example

Here is a runnable example that throws the exception:

import pandas as pd
from numpy.random import randn
from pandas import DataFrame, Series

index = pd.DatetimeIndex(['2012-12-22', '2012-12-23', '2012-12-23'])

s1 = Series(randn(2), index[:2]) # No duplicate index elements
df1 = DataFrame(s1, index=index) # This works

s2 = Series(randn(2), index[-2:]) # Duplicate index elements
df2 = DataFrame(s2, index=index) # This throws
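
To see which series is the problem, it can help to test index uniqueness directly. The sketch below uses fixed values in place of randn() and calls reindex() explicitly, on the assumption that this is the step that fails inside the DataFrame constructor; the exception type and message differ between pandas versions, so it catches Exception broadly.

```python
import pandas as pd

index = pd.DatetimeIndex(["2012-12-22", "2012-12-23", "2012-12-23"])
s1 = pd.Series([1.0, 2.0], index=index[:2])   # unique labels
s2 = pd.Series([3.0, 4.0], index=index[-2:])  # the same label twice

print(s1.index.is_unique)
print(s2.index.is_unique)
print(index.duplicated())  # flags the repeated timestamp

# Reindexing a Series whose own index contains duplicates is what fails.
raised = False
try:
    s2.reindex(index)
except Exception as exc:  # exact type and message vary by pandas version
    raised = True
    print(type(exc).__name__, exc)
```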

Solution

Thanks to @crewbum for the solution.

grouped = trades.groupby('Instrument', sort=True)
dflist = list()
for name, group in grouped:
    dflist.append(DataFrame({name : group.QuantitySigned.cumsum()}))
results = pd.concat(dflist)
results = results.sort_index().ffill().fillna(0)
results.plot()

Note: forward-fill first, then set any remaining NaNs to zero. As @crewbum pointed out, ffill() and bfill() are new in 0.10.0.

I'm using:

pandas 0.10.0
numpy 1.6.1
Python 2.7.3
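
A self-contained sketch of the same approach on made-up trades (the instrument names and quantities are assumptions). It sorts with sort_index(), since later pandas releases dropped DataFrame.sort().

```python
import pandas as pd

# Made-up trades; two of them share the first timestamp.
trades = pd.DataFrame(
    {
        "Instrument": ["AAA", "BBB", "AAA", "BBB"],
        "QuantitySigned": [2, -1, -1, 3],
    },
    index=pd.to_datetime(
        [
            "2012-12-07 17:16:46",
            "2012-12-07 17:16:46",
            "2012-12-07 17:20:03",
            "2012-12-07 17:25:10",
        ]
    ),
)

# One single-column frame per instrument holding its cumulative quantity.
frames = []
for name, group in trades.groupby("Instrument", sort=True):
    frames.append(pd.DataFrame({name: group["QuantitySigned"].cumsum()}))

# Stack the frames, order by timestamp, carry each position forward,
# and treat "no trade yet" as zero.
results = pd.concat(frames).sort_index().ffill().fillna(0)
print(results)
```

Rows that share a timestamp stay as separate rows in the result, which is fine for a step-like cumulative plot.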

score 3 · Accepted Answer

pd.concat() performs an "outer" join on the indexes by default, and the holes can be filled by padding forwards and/or backwards in time.

In [17]: pd.concat([DataFrame({'s1': s1}), DataFrame({'s2': s2})]).ffill().bfill()
Out[17]: 
                 s1   s2
2012-12-21  9.0e-01 -0.3
2012-12-22  5.0e-03 -0.3
2012-12-23 -2.9e-01 -0.3
2012-12-23 -2.9e-01 -0.3
2012-12-24 -2.9e-01 -1.8
2012-12-25 -2.9e-01 -1.4

I should add that ffill() and bfill() are new in pandas 0.10.0. Before that, you can use fillna(method='ffill') and fillna(method='bfill').
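
A reproducible sketch of the same call, with fixed values standing in for the random ones (the numbers are assumptions). With the default axis=0 the frames are stacked row-wise and their columns are outer-joined, which is why 2012-12-23 appears twice in the output above.

```python
import pandas as pd

index = pd.date_range("2012-12-21", periods=5)
s1 = pd.Series([1.0, 2.0, 3.0], index=index[:3])
s2 = pd.Series([10.0, 20.0, 30.0], index=index[2:])

# Stack the two single-column frames; cells with no data become NaN.
stacked = pd.concat([pd.DataFrame({"s1": s1}), pd.DataFrame({"s2": s2})])

# Pad forwards in time first, then backwards for the leading gaps.
filled = stacked.ffill().bfill()
print(filled)
```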
