python - Slow array operation in Python

Question

My question is probably very simple, but I can't figure out how to speed up this operation:

  print a[(b==c[i]) for i in arange(0,len(c))]

Here a, b, and c are three numpy arrays. I am working with arrays that have millions of entries, and the code above is the bottleneck of my program.

score 4 · Accepted Answer

Are you trying to get the values of a where b==c?

If so, you can simply do a[b==c]:

import numpy as np

a = np.arange(11)
b = 11*a
c = b[::-1]

print(a)        # [ 0  1  2  3  4  5  6  7  8  9 10]
print(b)        # [  0  11  22  33  44  55  66  77  88  99 110]
print(c)        # [110  99  88  77  66  55  44  33  22  11   0]
print(a[b==c])  # [5]
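
One thing the mask approach relies on is that a, b, and c all have the same shape, since b==c is compared position by position and then used to index a. A minimal sketch of that check follows; the helper name select_where_equal is hypothetical and not part of NumPy, and the sample values are made up for illustration.

```python
import numpy as np

def select_where_equal(a, b, c):
    """Return the entries of a at positions where b and c are equal elementwise."""
    a, b, c = np.asarray(a), np.asarray(b), np.asarray(c)
    # The elementwise mask only makes sense when all three shapes agree.
    if not (a.shape == b.shape == c.shape):
        raise ValueError("a, b and c must have the same shape")
    return a[b == c]

a = np.array([10, 20, 30, 40])
b = np.array([1, 2, 3, 4])
c = np.array([1, 5, 3, 0])

result = select_where_equal(a, b, c)
print(result)  # [10 30]
```

If the arrays have different lengths, or the goal is to match each element of b against every element of c, this elementwise form does not apply and a membership test is needed instead.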

score 2 · Accepted Answer

You may want to look into broadcasting. I assume you are looking for something like this?

>>> b=np.arange(5)
>>> c=np.arange(6).reshape(-1,1)
>>> b
array([0, 1, 2, 3, 4])
>>> c
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])
>>> b==c
array([[ True, False, False, False, False],
       [False,  True, False, False, False],
       [False, False,  True, False, False],
       [False, False, False,  True, False],
       [False, False, False, False,  True],
       [False, False, False, False, False]], dtype=bool)
>>> np.any(b==c,axis=1)
array([ True,  True,  True,  True,  True, False], dtype=bool)
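
To connect the broadcast comparison back to the arrays in the question, here is a small sketch that reduces the comparison matrix to a mask on a. It assumes the goal is "entries of a whose matching entry in b appears anywhere in c", which is one reading of the question and not something the asker confirmed. The sample values are made up.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
b = np.array([0, 1, 2, 3, 4])
c = np.array([4, 0, 7])

# Shape (len(c), len(b)): row i is the mask b == c[i].
matches = b == c.reshape(-1, 1)

# Reduce over the rows (axis=0) to get one flag per element of b.
mask = np.any(matches, axis=0)

selected = a[mask]
print(matches.shape)  # (3, 5)
print(selected)       # [10 50]
```

The intermediate matrix holds len(c) * len(b) booleans, so its memory and time cost grows with the product of the two lengths.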

For large arrays, you can try the following:

import timeit

s="""
import numpy as np
array_size=500
a=np.random.randint(500, size=(array_size))
b=np.random.randint(500, size=(array_size))
c=np.random.randint(500, size=(array_size))
"""

ex1="""
a[np.any(b==c.reshape(-1,1),axis=0)]
"""

ex2="""
a[np.in1d(b,c)]
"""

print('Example 1 took', timeit.timeit(ex1, setup=s, number=100), 'seconds.')
print('Example 2 took', timeit.timeit(ex2, setup=s, number=100), 'seconds.')

With array_size of 50:

Example 1 took 0.00323104858398 seconds.
Example 2 took 0.0125901699066 seconds.

With array_size of 500:

Example 1 took 0.142632007599 seconds.
Example 2 took 0.0283041000366 seconds.

With array_size of 5,000:

Example 1 took 16.2110910416 seconds.
Example 2 took 0.170011043549 seconds.

With array_size of 50,000 (number=5):

Example 1 took 33.0327301025 seconds.
Example 2 took 0.0996031761169 seconds.

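Note that I had to change the axis of np.any() so that the results are the same. Either reverse the order of the arguments to np.in1d or switch the axis of np.any to get the effect you want. You could take the reshape out of example 1, but reshape is very fast. Really interesting - I will have to use this in the future.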
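
As a quick sanity check that the two examples select the same entries, the sketch below compares them on random data. It uses np.isin instead of np.in1d, on the assumption that a reasonably recent NumPy is installed; recent NumPy releases recommend np.isin as the replacement for np.in1d. The seed and array size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 200
a = rng.integers(0, 500, size=n)
b = rng.integers(0, 500, size=n)
c = rng.integers(0, 500, size=n)

# Example 1: broadcast comparison, reduced over the axis that runs along c.
via_broadcast = a[np.any(b == c.reshape(-1, 1), axis=0)]

# Example 2 written with np.isin: flags each element of b that occurs in c.
via_isin = a[np.isin(b, c)]

same = np.array_equal(via_broadcast, via_isin)
print(same)  # True
```

Both forms answer the same question for every element of b; they differ in that the broadcast version materializes the full comparison matrix first.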

score 0 · Accepted Answer

How about np.where():

>>> a  = np.array([2,4,8,16])
>>> b  = np.array([0,0,0,0])
>>> c  = np.array([1,0,0,1])
>>> bc = np.where(b==c)[0] #indices where b == c
>>> a[bc]
array([4, 8])

That should work. I am not sure whether the timing is optimal for your purposes:

>>> a = np.random.randint(0,10000,1000000)
>>> b = np.random.randint(0,10000,1000000)
>>> c = np.random.randint(0,10000,1000000)
>>> %timeit( a[ np.where( b == c )[0] ]   )
100 loops, best of 3: 11.3 ms per loop
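
For comparison, here is a small sketch showing that indexing with the integer positions from np.where gives the same result as indexing with the boolean mask directly. It reuses the small arrays from this answer; no timing claim is made here.

```python
import numpy as np

a = np.array([2, 4, 8, 16])
b = np.array([0, 0, 0, 0])
c = np.array([1, 0, 0, 1])

idx = np.where(b == c)[0]  # integer positions where b == c
by_index = a[idx]          # select via the index array
by_mask = a[b == c]        # select via the boolean mask itself

print(idx)       # [1 2]
print(by_index)  # [4 8]
print(by_mask)   # [4 8]
```

The np.where form is handy when the positions themselves are needed afterwards; otherwise the mask alone is enough.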
