OK、たくさん「遊んで」から、自分に合ったものを手に入れました。
もう少し「Pandastic」な方法で行うこともできますが、これは合理的な方法です。
「10 minutes to pandas」を指摘してくれたAndy Hayden、Jeff、Phillip Cloudに感謝します。直接的な回答は含まれていませんでしたが、非常に役に立ちました。また、Andy Haydenがローリング・ミーンを作成するために私を送ってくれました。
それでは、列ごとにやってみましょう
列 1 の追加: 今までの平均価格
# Rolling avg, windows size is the size of the entire DataFrame, with minimum of 1
df['rolling_avg'] = pd.rolling_mean(df.price, n, 1)
列 4 の追加: 最後の 4 行の平均価格
df['RA_wnd_4'] = pd.rolling_mean(df.price, 4, 1)
列 2 の追加: 現在の「blcok」(方向) にいる間に「変更」列の CumSum()
# Adding Helper column that shows when direction have been changed
df['dir_change'] = (df.direction.shift(1) != df.direction).astype(int)
# Identify the DF "blocks" for every direction change
df['block'] = df.dir_change.cumsum()
# Split the DF based on those bolcks
grouped = df.groupby('block')
# Add Function that will cumsum() for a block, and call it
def f1(group):
return DataFrame({'rolling_count' : group.cumsum()})
df['rolling_count'] = grouped.change.apply(f1)
列 3 の追加: 現在の「ブロック」の行番号 (方向)
df['one'] = 1
df['rolling_count'] = grouped.one.apply(f1)
df = df.drop('one', axis=1)
完全なコード:
import numpy.random as nprnd
from pandas import DataFrame
import pandas as pd
n = 10 # Number of samples
# Starting at 8:00 AM, set some (n) random prices between 4-5
df = DataFrame({'minute_id': range(480,480+n), 'price':(5-4) * nprnd.random(n) + 4 })
df['change'] = df.price - df.price.shift(1)
df['direction'] = df.change.map(lambda x: 0 if x == 0 else x/abs(x))
df = df.dropna()
#------------------------------------------
# Col 1, rolling Avg over the entire DF
df['rolling_avg'] = pd.rolling_mean(df.price, n, 1)
#------------------------------------------
# Col 4, rolling Avg windows size of 4
df['RA_wnd_4'] = pd.rolling_mean(df.price, 4, 1)
#------------------------------------------
# Helper code for cols 2, 3
# Adding Helper column that shows when direction have been changed
df['dir_change'] = (df.direction.shift(1) != df.direction).astype(int)
# Identify the DF "blocks" for every direction change
df['block'] = df.dir_change.cumsum()
# Split the DF based on those bolcks
grouped = df.groupby('block')
# Add Function that will cumsum() for a block, and call it
def f1(group):
return DataFrame({'rolling_count' : group.cumsum()})
df['one'] = 1
#------------------------------------------
# Col 2, CumSum() of the 'change' column while in the current "blcok" (direction)
df['rolling_count'] = grouped.change.apply(f1)
#------------------------------------------
# Col 3, Count in the current "block" (Direction)
df['rolling_count'] = grouped.one.apply(f1)
df = df.drop('one', axis=1)
print df
出力:
minute_id price change direction rolling_avg RA_wnd_4 dir_change block rolling_count
1 481 4.771701 0.474349 1 4.771701 4.771701 1 1 1
2 482 4.300078 -0.471623 -1 4.535889 4.535889 1 2 1
3 483 4.946744 0.646666 1 4.672841 4.672841 1 3 1
4 484 4.529403 -0.417340 -1 4.636981 4.636981 1 4 1
5 485 4.434598 -0.094805 -1 4.596505 4.552706 0 4 2
6 486 4.171169 -0.263429 -1 4.525616 4.520479 0 4 3
7 487 4.416980 0.245810 1 4.510096 4.388038 1 5 1
8 488 4.727078 0.310098 1 4.537219 4.437456 0 5 2
9 489 4.049097 -0.677981 -1 4.482983 4.341081 1 6 1