python - Python and Pandas - Columns that "count the direction" and show the "average so far"

Question

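I have a DataFrame that contains the (stock) price at the end of a given minute.

The DF columns are:

minute_id: 0 to 1440, where midnight is 0 and 8:00 AM is 480 (60*8)
price: stock price at the end of the minute
change: price change from the previous minute
direction: direction of the change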

import numpy.random as nprnd
from pandas import DataFrame

n = 10    # Number of samples
# Starting at 8:00 AM, set some (n) random prices between 4-5
df = DataFrame({'minute_id': range(480,480+n), 'price':(5-4) * nprnd.random(n) + 4 })
df['change'] = df.price - df.price.shift(1)
df['direction'] = df.change.map(lambda x: 0 if x == 0 else x/abs(x))
df = df.dropna()
df
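
The change and direction columns can also be derived without a lambda. The following is only a minimal sketch, using fixed illustrative prices instead of random ones: `Series.diff()` gives the change from the previous row, and `numpy.sign` maps it to -1, 0 or 1.

```python
import numpy as np
import pandas as pd

# Fixed example prices (illustrative values, not taken from the question)
df = pd.DataFrame({
    'minute_id': range(480, 485),
    'price': [4.0, 4.5, 4.5, 4.2, 4.8],
})

# Change from the previous minute; the first row has no predecessor (NaN)
df['change'] = df['price'].diff()
# -1.0 for a drop, 0.0 for no change, 1.0 for a rise
df['direction'] = np.sign(df['change'])
# Drop the first row, whose change is undefined
df = df.dropna()
print(df)
```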

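I want to add a few columns to this DF:

1. Average price so far: in row 1 this is just the price, in row 2 it is the average price of the first 2 rows, and in row n it is the average price of the first n rows.
2. Sum of the 'change' column while in the current direction (reset to zero every time 'direction' switches).
3. Count in the current direction so far: for every row, which number is this row within the current direction run?
4. Average price of the last 4 rows.

I can create all of these columns by iterating over the DF one row at a time, but I am sure there is a more (pythonic | pandastic) way to do it.

I am also not sure how to handle missing data (if there are gaps in minute_id).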

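EDIT:

Of the 4 columns I wanted to add, 1 and 4 are easy...

C4: this is simply a rolling mean with a period of 4.

C1: the rolling mean can take another parameter for the minimum number of periods.

Setting this to 1 and the window size to the length of the df gives, for every row, the mean of all rows up to and including that row.

# (pd.rolling_mean is gone from current pandas; .rolling(...).mean() is the equivalent)
df['rolling_avg'] = df.price.rolling(window=n, min_periods=1).mean()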
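
As a small illustration of the "average so far" idea, the sketch below uses made-up prices and compares a rolling mean whose window spans the whole series with pandas' `expanding()` mean. Both should give the same running average; the variable names are only illustrative.

```python
import pandas as pd

prices = pd.Series([4.0, 5.0, 6.0, 5.0])  # illustrative values

# Rolling mean whose window covers the whole series, needing only 1 observation
running_a = prices.rolling(window=len(prices), min_periods=1).mean()
# Expanding mean: the window grows from the first row up to the current row
running_b = prices.expanding(min_periods=1).mean()

print(running_a.tolist())
print(running_b.tolist())  # [4.0, 4.5, 5.0, 5.0]
```

The expanding form avoids having to pass the length of the data as the window size.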

For the remaining two columns, I am still trying to find the best approach.

score 4 · Accepted Answer

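OK, after a lot of "playing around", I ended up with something that works for me.

It could probably be done in a more "Pandastic" way, but this is a reasonable approach.

Thanks to Andy Hayden, Jeff and Phillip Cloud for pointing me to "10 minutes to pandas". It did not contain a direct answer, but it was very helpful. Andy Hayden also pointed me to the rolling mean.

So let's go through it column by column.

Adding column 1: average price so far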

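# Rolling avg; the window size is the size of the entire DataFrame, with a minimum of 1 period
# (pd.rolling_mean is gone from current pandas; .rolling(...).mean() is the equivalent)
df['rolling_avg'] = df.price.rolling(window=n, min_periods=1).mean()

Adding column 4: average price of the last 4 rows

df['RA_wnd_4'] = df.price.rolling(window=4, min_periods=1).mean()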
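
To show what the minimum-periods argument does for the 4-row window, here is a short sketch on made-up prices. With `min_periods=1` the first rows average whatever is available, while the default leaves them as NaN until 4 rows exist.

```python
import pandas as pd

prices = pd.Series([4.0, 5.0, 6.0, 5.0, 8.0])  # illustrative values

# min_periods=1: early rows average the rows seen so far
partial = prices.rolling(window=4, min_periods=1).mean()
# Default min_periods (equal to the window size): first three rows are NaN
strict = prices.rolling(window=4).mean()

print(partial.tolist())        # [4.0, 4.5, 5.0, 5.0, 6.0]
print(strict.isna().tolist())  # [True, True, True, False, False]
```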

Adding column 2: cumsum() of the 'change' column while in the current "block" (direction)

# Helper column that marks the rows where the direction has changed
df['dir_change'] = (df.direction.shift(1) != df.direction).astype(int)
# Identify the DF "blocks" for every direction change
df['block'] = df.dir_change.cumsum()
# Split the DF based on those blocks
grouped = df.groupby('block')
# Function that computes cumsum() within one block
def f1(group):
    return group.cumsum()

# (named 'rolling_sum' here so that column 3 below does not overwrite it)
df['rolling_sum'] = grouped.change.transform(f1)

Adding column 3: row number within the current "block" (direction)

df['one'] = 1
# Regroup so that the helper column 'one' is part of the grouped frame
df['rolling_count'] = df.groupby('block').one.transform(f1)
df = df.drop('one', axis=1)
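
The answer notes that a more "Pandastic" version may exist. One possible variant, sketched here on made-up data, uses the built-in `cumsum()` and `cumcount()` of a groupby so that no helper function or column of ones is needed. The column names `block_sum` and `block_count` are chosen for this example only.

```python
import pandas as pd

# Illustrative values, not taken from the answer's output
df = pd.DataFrame({
    'change':    [0.5, -0.25, -0.5, -0.25, 0.5, 0.25],
    'direction': [1, -1, -1, -1, 1, 1],
})

# A new block id starts every time direction differs from the previous row
block = (df['direction'] != df['direction'].shift()).cumsum()

# Column 2: running sum of 'change' inside each block
df['block_sum'] = df.groupby(block)['change'].cumsum()
# Column 3: 1-based position of the row inside its block
df['block_count'] = df.groupby(block).cumcount() + 1

print(df)
```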

Complete code:

import numpy.random as nprnd
import pandas as pd
from pandas import DataFrame

n = 10 # Number of samples
# Starting at 8:00 AM, set some (n) random prices between 4-5
df = DataFrame({'minute_id': range(480,480+n), 'price':(5-4) * nprnd.random(n) + 4 })
df['change'] = df.price - df.price.shift(1)
df['direction'] = df.change.map(lambda x: 0 if x == 0 else x/abs(x))
df = df.dropna()
#------------------------------------------
# Col 1, rolling avg over the entire DF
# (pd.rolling_mean is gone from current pandas; .rolling(...).mean() is the equivalent)
df['rolling_avg'] = df.price.rolling(window=n, min_periods=1).mean()
#------------------------------------------
# Col 4, rolling avg with a window size of 4
df['RA_wnd_4'] = df.price.rolling(window=4, min_periods=1).mean()
#------------------------------------------
# Helper code for cols 2, 3
# Helper column that marks the rows where the direction has changed
df['dir_change'] = (df.direction.shift(1) != df.direction).astype(int)
# Identify the DF "blocks" for every direction change
df['block'] = df.dir_change.cumsum()
# Helper column of ones, used for counting rows in col 3
df['one'] = 1
# Split the DF based on those blocks
grouped = df.groupby('block')
# Function that computes cumsum() within one block
def f1(group):
    return group.cumsum()
#------------------------------------------
# Col 2, cumsum() of the 'change' column while in the current "block" (direction)
# (named 'rolling_sum' here so that col 3 does not overwrite it)
df['rolling_sum'] = grouped.change.transform(f1)
#------------------------------------------
# Col 3, count in the current "block" (direction)
df['rolling_count'] = grouped.one.transform(f1)
df = df.drop('one', axis=1)

print(df)

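Output (sample run; the column 2 running sum is not shown, and rolling_count holds the column 3 count):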

   minute_id     price     change  direction  rolling_avg  RA_wnd_4  dir_change  block  rolling_count
1        481  4.771701   0.474349          1     4.771701  4.771701           1      1              1
2        482  4.300078  -0.471623         -1     4.535889  4.535889           1      2              1
3        483  4.946744   0.646666          1     4.672841  4.672841           1      3              1
4        484  4.529403  -0.417340         -1     4.636981  4.636981           1      4              1
5        485  4.434598  -0.094805         -1     4.596505  4.552706           0      4              2
6        486  4.171169  -0.263429         -1     4.525616  4.520479           0      4              3
7        487  4.416980   0.245810          1     4.510096  4.388038           1      5              1
8        488  4.727078   0.310098          1     4.537219  4.437456           0      5              2
9        489  4.049097  -0.677981         -1     4.482983  4.341081           1      6              1
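
The question also asks about gaps in minute_id, which the code above does not cover. One possible approach is sketched below on made-up data. It assumes that a missing minute means the price did not change, so the last known price is carried forward; other treatments (for example leaving the gaps as NaN) may suit the data better.

```python
import pandas as pd

# Illustrative data with minutes 482 and 483 missing
df = pd.DataFrame({'minute_id': [480, 481, 484, 485],
                   'price': [4.0, 4.5, 4.2, 4.8]})

# Build the full minute range and reindex; missing minutes become NaN rows
full_range = pd.RangeIndex(start=int(df['minute_id'].min()),
                           stop=int(df['minute_id'].max()) + 1,
                           name='minute_id')
filled = df.set_index('minute_id').reindex(full_range)

# Assumption: carry the last known price forward through each gap
filled['price'] = filled['price'].ffill()
# The change is then 0 for the filled-in minutes
filled['change'] = filled['price'].diff()

print(filled)
```

After this step every minute in the range has a row, so the rolling and per-block calculations operate on evenly spaced data.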
