python - Plotting a parametric mean in Python

Question

I have a large real-valued one-dimensional dataset r. I would like to plot:

mean(log(1+a*r)) vs a, with a > -1 .

This is my code:

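   import math
   import numpy as np
   import pandas as pd
   import matplotlib.pyplot as plt

   rr=pd.read_csv('goog.csv')
   dd=rr['Close']
   series=pd.Series(dd)
   seriespct=series.pct_change()
   seriespct[0]=seriespct.mean()   # replace the leading NaN with the mean

   dum1 =[0]*len(dd)

   a_max = 1.
   a_step = 0.01

   a = np.arange(-3.+a_step, a_max, a_step)   # np.arange instead of the deprecated scipy.arange alias
   n = len(a)
   dum2 =[0]*n
   m=len(dd)

   for j in range(n):
      for i in range(m):
         dum1[i]=math.log(1+a[j]*seriespct[i])
      dum2[j]=np.mean(dum1)   # one mean per value of a, so this belongs inside the j loop


   plt.plot(a,dum2)
   plt.show()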

How can I do this in a more elegant way?

score 3 · Accepted Answer

I would suggest this:

plt.plot(a, np.log(1 + r*a[:,None]).mean(1))

This avoids the Python for loops, which speeds things up considerably. For a large dataset, letting numpy do the looping internally is much faster:

In [49]: a = np.arange(a_step-.3, a_max, a_step)

In [50]: r = np.random.random(100)

In [51]: timeit [scipy.mean(log(1+a[i]*r)) for i in range(len(a))]
100 loops, best of 3: 5.47 ms per loop

In [52]: timeit np.log(1 + r*a[:,None]).mean(1)
1000 loops, best of 3: 384 µs per loop

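This works by broadcasting: a varies along one axis and r along the other. You can then take the mean along the axis where r varies, which leaves an array that varies with a (and has the same shape as a).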

import numpy as np
import matplotlib.pyplot as plt

r = np.random.random(100)

a = 1.
a_max = 1.
a_step = 0.01
a = np.arange(a_step-.3, a_max, a_step)
a.shape
#(129,)
a = a[:,None] #adds a new axis, making this a column vector, same as: a = a.reshape(-1,1)
a.shape
#(129, 1)
(a*r).shape
#(129, 100)
loga = np.log(1 + a*r)
loga.shape
#(129,100)
mloga = loga.mean(axis=1) #take the mean along the 2nd axis, where `r` varies
mloga.shape
#(129,)

plt.plot(a, mloga)
plt.show()
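
To connect this back to the question's data, here is a minimal sketch that wraps the same computation in a function. The helper name mean_log, the positivity check, and the synthetic price series (standing in for goog.csv) are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def mean_log(r, a):
    """Return mean(log(1 + a_k * r)) for every a_k in `a` (hypothetical helper)."""
    r = np.asarray(r, dtype=float)
    a = np.asarray(a, dtype=float)
    x = 1.0 + np.outer(a, r)          # shape (len(a), len(r))
    if np.any(x <= 0):
        # log is undefined here, so fail loudly instead of returning nan
        raise ValueError("1 + a*r must be positive for every pair (a, r)")
    return np.log(x).mean(axis=1)     # average over r, one value per a

# Synthetic prices, assumed only so the sketch runs without goog.csv.
rng = np.random.default_rng(0)
prices = 100.0 * np.cumprod(1.0 + rng.normal(0.0, 0.01, size=500))

# Fractional change between consecutive prices (what pct_change() computes,
# minus its leading NaN).
r = np.diff(prices) / prices[:-1]

a = np.arange(-0.99, 1.0, 0.01)       # respects the a > -1 condition
curve = mean_log(r, a)
```

The result can then be drawn with plt.plot(a, curve) as above. Dropping the first NaN of the returns, rather than overwriting it with the mean as the question does, is a design choice of this sketch.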

Addendum:

To avoid relying on broadcasting, you can use np.outer:

plt.plot(a, np.log(1 + np.outer(a,r)).mean(1))

With this version there is no need to reshape a (skip the a = a[:,None] step).

Here is a simple example so you can see what is happening:
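
As a quick sanity check, the following sketch compares the broadcasting form and the np.outer form on random data. The seed and array sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
r = rng.random(100)                        # values in [0, 1)
a = np.arange(0.01 - 0.3, 1.0, 0.01)       # 1-D, no reshaping needed for np.outer

# Broadcasting: a becomes a column, r stays a row.
via_broadcast = np.log(1 + r * a[:, None]).mean(1)

# Outer product: same (len(a), len(r)) matrix without the explicit new axis.
via_outer = np.log(1 + np.outer(a, r)).mean(1)

same = np.allclose(via_broadcast, via_outer)
print(same)
```

Both expressions build the same matrix of products, so the choice between them is mostly a matter of readability.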

r = np.exp(np.arange(1,5))
a = np.arange(5)

In [33]: r
Out[33]: array([  2.71828183,   7.3890561 ,  20.08553692,  54.59815003])

In [34]: a
Out[34]: array([0, 1, 2, 3, 4])

In [39]: r*a[:,None]
Out[39]: 
# this is  2.7...         7.3...        20.08...       54.5...         # times:
array([[   0.        ,    0.        ,    0.        ,    0.        ],   # 0
       [   2.71828183,    7.3890561 ,   20.08553692,   54.59815003],   # 1
       [   5.43656366,   14.7781122 ,   40.17107385,  109.19630007],   # 2
       [   8.15484549,   22.1671683 ,   60.25661077,  163.7944501 ],   # 3
       [  10.87312731,   29.5562244 ,   80.34214769,  218.39260013]])  # 4

In [40]: np.outer(a,r)
Out[40]: 
array([[   0.        ,    0.        ,    0.        ,    0.        ],
       [   2.71828183,    7.3890561 ,   20.08553692,   54.59815003],
       [   5.43656366,   14.7781122 ,   40.17107385,  109.19630007],
       [   8.15484549,   22.1671683 ,   60.25661077,  163.7944501 ],
       [  10.87312731,   29.5562244 ,   80.34214769,  218.39260013]])

# this is the mean of each row (one value per element of a):
In [41]: (np.outer(a,r)).mean(1)
Out[41]: array([  0.        ,  21.19775622,  42.39551244,  63.59326866,  84.79102488])

# and the log of 1 + the above is:
In [42]: np.log(1+(np.outer(a,r)).mean(1))
Out[42]: array([ 0.        ,  3.09999121,  3.77035604,  4.16811021,  4.4519144 ])
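
Note that In [41] and In [42] take the mean first and the log second, which is handy for seeing the shapes but is not the same quantity as mean(log(1+a*r)) from the question. A small sketch on the same toy arrays, with variable names chosen here for illustration, shows the two orders side by side:

```python
import numpy as np

r = np.exp(np.arange(1, 5))
a = np.arange(5)
x = np.outer(a, r)                      # shape (5, 4)

log_of_mean = np.log(1 + x.mean(1))     # what In [42] computes
mean_of_log = np.log(1 + x).mean(1)     # the quantity asked for in the question

print(log_of_mean)
print(mean_of_log)
```

Because log is concave, mean_of_log is never larger than log_of_mean, so the order of the two operations matters for the final plot.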
