python - 構造化配列で numpy.searchsorted のパフォーマンスが低下する

Question

_{用語の誤用がある場合は事前に申し訳ありませんが、お気軽に修正してください。}

でソートされた配列がありdtype '<f16, |S30'ます。最初のフィールドで使用するsearchsortedと、動作が非常に遅くなります (300 万アイテムで約 0.4 秒)。bisectこれは、単純な Python のタプルリストで同じことを行うよりもはるかに時間がかかります。

%timeit a['f0'].searchsorted(400.)
1 loops, best of 3: 398 ms per loop

ただし、フロート部分を別の別の配列にコピーすると、検索は次の場合よりも高速になりbisectます。

b = a['f0'].copy()

%timeit b.searchsorted(400.)
1000000 loops, best of 3: 945 ns per loop

私の質問は次のとおりです。

私は何か間違ったことをしていますか、それとも NumPy の回帰ですか?
データを複製せずにこれを回避する方法はありますか?

score 13 · Accepted Answer

これは少し前に見た覚えがあります。私の記憶が正しければ、データが連続していない場合、searchsorted はデータの一時コピーを作成すると思います。後で時間があれば、コードを見て、それが起こっていることを確認します (または、コードに詳しい人がこれを確認できるかもしれません)。

それまでの間、構造化配列の使用を避けるためにコードを再構築したくない場合は、おそらくを使用することをお勧めしますbisect_left(a['f0'], 400.)。私のマシンでは、連続した配列で searchsorted した場合よりも 8 倍遅くなりますが、連続していない配列で searchsorted した場合よりも 1000 倍高速です。

In [5]: a = np.arange((6e6)).view([('f0', float), ('f1', float)])

In [6]: timeit a['f0'].searchsorted(400.)
10 loops, best of 3: 51.1 ms per loop

In [7]: timeit a['f0'].copy()
10 loops, best of 3: 51 ms per loop

In [8]: timeit bisect_left(a['f0'], 400.)
10000 loops, best of 3: 52.8 us per loop

In [9]: f0 = a['f0'].copy()

In [10]: timeit f0.searchsorted(400.)
100000 loops, best of 3: 7.85 us per loop

score 3 · Accepted Answer

問題のサイズ (2015 年 5 月 11 日現在) とそれを「修正」する方法を示すコードを次に示します。

import numpy as np
import bisect
import timeit
from random import randint

dtype = np.dtype([ ('pos','<I'),('sig','<H') ])             # my data is unsigned 32bit, and unsigned 16bit
data1 = np.fromfile('./all2/840d.0a9b45e8c5344abf6ac761017e93b5bb.2.1bp.binary', dtype)

dtype2 = np.dtype([('pos',np.uint32),('sig',np.uint32)])    # convert data to both unsigned 32bit
data2 = data1.astype(dtype2)

data3 = data2.view(('uint32', len(data2.dtype.names)))    # convert above to a normal array (not structured array)

print data1.dtype.descr # [('pos', '<u4'), ('sig', '<u2')]
print data2.dtype.descr # [('pos', '<u4'), ('sig', '<u4')]
print data3.dtype.descr # [('', '<u4')]

print data1.nbytes  # 50344494
print data2.nbytes  # 67125992
print data3.nbytes  # 67125992

print data1['pos'].max() # 2099257376
print data2['pos'].max() # 2099257376
print data3[:,0].max()   # 2099257376

def b1():   return bisect.bisect_left(data1['pos'],           randint(100000000,200000000))
def b2():   return bisect.bisect_left(data2['pos'],           randint(100000000,200000000))
def b3():   return bisect.bisect_left(data3[:,0],             randint(100000000,200000000))
def ss1():  return np.searchsorted(data1['pos'],              randint(100000000,200000000))
def ss2():  return np.searchsorted(data2['pos'],              randint(100000000,200000000))
def ss3():  return np.searchsorted(data3[:,0],                randint(100000000,200000000))

def ricob1():   return bisect.bisect_left(data1['pos'], np.uint32(randint(100000000,200000000)))
def ricob2():   return bisect.bisect_left(data2['pos'], np.uint32(randint(100000000,200000000)))
def ricob3():   return bisect.bisect_left(data3[:,0],   np.uint32(randint(100000000,200000000)))
def ricoss1():  return np.searchsorted(data1['pos'],    np.uint32(randint(100000000,200000000)))
def ricoss2():  return np.searchsorted(data2['pos'],    np.uint32(randint(100000000,200000000)))
def ricoss3():  return np.searchsorted(data3[:,0],      np.uint32(randint(100000000,200000000)))

print timeit.timeit(b1,number=300)  # 0.0085117816925
print timeit.timeit(b2,number=300)  # 0.00826191902161
print timeit.timeit(b3,number=300)  # 0.00828003883362
print timeit.timeit(ss1,number=300) # 6.57477498055
print timeit.timeit(ss2,number=300) # 5.95308589935
print timeit.timeit(ss3,number=300) # 5.92483091354

print timeit.timeit(ricob1,number=300)  # 0.00120902061462
print timeit.timeit(ricob2,number=300)  # 0.00120401382446
print timeit.timeit(ricob3,number=300)  # 0.00120711326599
print timeit.timeit(ricoss1,number=300) # 4.39265394211
print timeit.timeit(ricoss2,number=300) # 0.00116586685181
print timeit.timeit(ricoss3,number=300) # 0.00108909606934

アップデート！ Rico さんのコメントのおかげで、検索したい番号の type を sorted/bisect に設定するのは本当にインポートのようです! ただし、32 ビットおよび 16 ビットの int を持つ構造化配列では、まだ遅いです (ただし、以前ほど遅くはありません)。

python - 構造化配列で numpy.searchsorted のパフォーマンスが低下する

2 に答える 2

Related

Reference