python - Pandas: df.groupby(x, y).apply() across multiple columns parameter error

Question

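Basic problem:

I have a number of "past" and "now" variables on which I'd like to perform a simple row-wise percent change, e.g. ((exports_now - exports_past) / exports_past).

These two questions accomplish this, but when I try a similar method I get an error saying that my function deltas got an unknown parameter, axis.

Example data: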

exports_past      exports_now     imports_past    imports_now    etc. (6 other pairs)
   .23               .45             .43             .22              1.23
   .13               .21             .47             .32               .23
    0                 0              .41             .42               .93
   .23               .66             .43             .22               .21
    0                .12             .47             .21              1.23

Following the answer to the first question, my solution is to use a function like this:

def deltas(row):
    '''
    simple pct change
    '''
    if int(row[0]) == 0 and int(row[1]) == 0:
        return 0
    elif int(row[0]) == 0:
        return np.nan
    else:
        return ((row[1] - row[0])/row[0])

And then apply the function like this:

df['exports_delta'] = df.groupby(['exports_past', 'exports_now']).apply(deltas, axis=1)

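This generates the following error: TypeError: deltas() got an unexpected keyword argument 'axis'

Any ideas on how to get around the axis parameter error? Or is there a more elegant way to calculate the percent change? The kicker with my problem is that I need to be able to apply this function across several different column pairs, so hard-coding the column names as in the answer to the second question is undesirable. Thanks!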

score 5 · Accepted Answer

Consider using the pct_change Series/DataFrame method to do this.

df.pct_change()
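
With default arguments, pct_change compares each row with the previous row. For a past/now pair of columns, one option is to select the pair and pass axis=1. The following is a minimal sketch, assuming the column names from the question's sample data; the small frame is made up from its first rows for illustration.

```python
import pandas as pd

# Illustrative frame built from the first rows of the question's sample data.
df = pd.DataFrame({
    "exports_past": [0.23, 0.13, 0.0],
    "exports_now": [0.45, 0.21, 0.0],
})

# pct_change(axis=1) compares each column with the column to its left,
# so the last column holds (now - past) / past for every row.
pair = df[["exports_past", "exports_now"]]
df["exports_delta"] = pair.pct_change(axis=1).iloc[:, -1]

print(df)
```

Plain float division applies here, so a 0 to 0 row gives NaN and a zero past value with a non-zero now value gives inf. That differs from the question's deltas function, which returns 0 and NaN for those two cases.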

The confusion stems from the two different (but equally named) apply functions: one on Series/DataFrame and one on groupby.

In [11]: df
Out[11]:
   0  1  2
0  1  1  1
1  2  2  2

The DataFrame apply method takes an axis argument:

In [12]: df.apply(lambda x: x[0] + x[1], axis=0)
Out[12]:
0    3
1    3
2    3
dtype: int64

In [13]: df.apply(lambda x: x[0] + x[1], axis=1)
Out[13]:
0    2
1    4
dtype: int64

The groupby apply doesn't; instead, the kwarg is passed through to the function:

In [14]: g.apply(lambda x: x[0] + x[1])
Out[14]:
0    2
1    4
dtype: int64

In [15]: g.apply(lambda x: x[0] + x[1], axis=1)
TypeError: <lambda>() got an unexpected keyword argument 'axis'
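
The forwarding behaviour can be seen with a function that actually accepts the extra keyword. This is a small sketch with made-up data; the names `scaled_sum` and `factor` are hypothetical and not part of pandas.

```python
import pandas as pd

# Made-up data: two groups, "a" and "b".
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

def scaled_sum(group, factor=1):
    """Sum the 'value' column of one group and scale the result."""
    return group["value"].sum() * factor

grouped = df.groupby("key")[["value"]]

# Extra keyword arguments given to groupby apply are handed on to the function.
result = grouped.apply(scaled_sum, factor=10)
print(result)

# A function that does not accept the keyword fails, as in the question.
try:
    grouped.apply(lambda g: g["value"].sum(), axis=1)
except TypeError as exc:
    print(exc)
```

So axis=1 in the question is not interpreted by groupby apply at all; it is simply passed to deltas, which has no such parameter.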

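Note: groupby does have an axis argument, so you could use it there if you really wanted to (this argument is deprecated in recent pandas releases):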

In [16]: g1 = df.groupby(0, axis=1)

In [17]: g1.apply(lambda x: x.iloc[0, 0] + x.iloc[1, 0])
Out[17]:
0
1    3
2    3
dtype: int64
