python - PANDAS で異なるインデックスを持つデータフレームまたはシリーズで計算を行うにはどうすればよいですか?

Question

長さとデータ型が同じ 2 つのシリーズがあります。どちらも float64 です。唯一の違いは、インデックスはどちらも日付ですが、一方の日付が月の初めで、もう一方の日付が月末であることです。異なるインデックスを持つシリーズやデータフレームで相関や共分散などの計算を行うにはどうすればよいですか?

import numpy as np
from pandas import Series, DataFrame
import pandas as pd
import Quandl

IPO=Quandl.get("RITTER/US_IPO_STATS", authtoken="api key")
ir=Quandl.get("FRBC/REALRT", authtoken="api key")

ipo_splice=IPO[264:662]
new_ipo=ipo_splice['Gross Number of IPOs'];
new_ipo=new_ipo.T


ir_splice=ir[0:398]
new_ir=ir_splice['RR 1 Month']
new_ir=new_ir.T

new_ipo.corr(new_ir)

score 0 · Accepted Answer

reset_index(drop=True)相関させたいものについては、連結します。

s1 = pd.DataFrame(np.random.rand(10), list('abcdefghij'), columns=['s1'])
s2 = pd.DataFrame(np.random.rand(10), list('ABCDEFGHIJ'), columns=['s2'])

print pd.concat([s.reset_index(drop=True) for s in [s1, s2]], axis=1).corr()


          s1        s2
s1  1.000000 -0.437945
s2 -0.437945  1.000000

score 0 · Accepted Answer

インデックスの 1 つをリサンプリングするためにresample()関数を使用できます(私たちの目標は両方のインデックス BoM または EoM を持つことです)。

データ：

In [63]: df_bom
Out[63]:
            val
2015-01-01   76
2015-02-01   27
2015-03-01   65
2015-04-01   71
2015-05-01    9
2015-06-01   23
2015-07-01   52
2015-08-01   10
2015-09-01   62
2015-10-01   25

In [64]: df_eom
Out[64]:
            val
2015-01-31   87
2015-02-28   16
2015-03-31   85
2015-04-30    4
2015-05-31   37
2015-06-30   63
2015-07-31    3
2015-08-31   73
2015-09-30   81
2015-10-31   69

解決：

In [61]: df_eom.resample('MS') + df_bom
C:\envs\py35\Scripts\ipython:1: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
Out[61]:
            val
2015-01-01  163
2015-02-01   43
2015-03-01  150
2015-04-01   75
2015-05-01   46
2015-06-01   86
2015-07-01   55
2015-08-01   83
2015-09-01  143
2015-10-01   94

In [62]: df_eom.resample('MS').join(df_bom, lsuffix='_lft')
C:\envs\py35\Scripts\ipython:1: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
Out[62]:
            val_lft  val
2015-01-01       87   76
2015-02-01       16   27
2015-03-01       85   65
2015-04-01        4   71
2015-05-01       37    9
2015-06-01       63   23
2015-07-01        3   52
2015-08-01       73   10
2015-09-01       81   62
2015-10-01       69   25

別のアプローチ- DF の byyearとmonthpart をマージする:

In [69]: %paste
(pd.merge(df_bom, df_eom,
          left_on=[df_bom.index.year, df_bom.index.month],
          right_on=[df_eom.index.year, df_eom.index.month],
          suffixes=('_bom','_eom')))
## -- End pasted text --
Out[69]:
   key_0  key_1  val_bom  val_eom
0   2015      1       76       87
1   2015      2       27       16
2   2015      3       65       85
3   2015      4       71        4
4   2015      5        9       37
5   2015      6       23       63
6   2015      7       52        3
7   2015      8       10       73
8   2015      9       62       81
9   2015     10       25       69

設定：

In [59]: df_bom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='MS'))

In [60]: df_eom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='M'))

python - PANDAS で異なるインデックスを持つデータフレームまたはシリーズで計算を行うにはどうすればよいですか?

2 に答える 2

Related

Reference