python - fast python numpy where functionality?

Question

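I am using numpy's where function many times inside several for loops, but it becomes far too slow. Is there any way to perform this functionality faster? I read that you should try inline for loops and create local variables for functions before the for loop, but nothing improves the speed by much (< 1%). len(UNIQ_IDS) ~ 800. emiss_data and obj_data are numpy ndarrays with shape = (2600,5200). I used import profile to find where the bottlenecks are, and where inside the for loops is a big one.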

import numpy as np
max = np.max
where = np.where
# emiss_data, obj_data: ndarrays of shape (2600, 5200); UNIQ_IDS: ~800 ids
MAX_EMISS = [max(emiss_data[where(obj_data == i)]) for i in UNIQ_IDS]

score 10 · Accepted Answer

It turns out that in this case a pure Python loop can be much faster than NumPy indexing (or calls to np.where).

Consider the following alternatives:

import numpy as np
import collections

shape = (2600,5200)
# shape = (26,52)   # small shape for quick testing
emiss_data = np.random.random(shape)
# ids 1..800 inclusive; replaces the deprecated np.random.random_integers(1, 800, ...)
obj_data = np.random.randint(1, 801, size=shape)
UNIQ_IDS = np.unique(obj_data)

def using_where():
    max = np.max
    where = np.where
    MAX_EMISS = [max(emiss_data[where(obj_data == i)]) for i in UNIQ_IDS]
    return MAX_EMISS

def using_index():
    max = np.max
    # a boolean mask can index directly; np.where is not needed
    MAX_EMISS = [max(emiss_data[obj_data == i]) for i in UNIQ_IDS]
    return MAX_EMISS

def using_max():
    MAX_EMISS = [(emiss_data[obj_data == i]).max() for i in UNIQ_IDS]
    return MAX_EMISS

def using_loop():
    # a single pass over both flattened arrays, collecting the values per id
    result = collections.defaultdict(list)
    for val, idx in zip(emiss_data.ravel(), obj_data.ravel()):  # IT.izip in Python 2
        result[idx].append(val)
    return [max(result[idx]) for idx in UNIQ_IDS]

def using_sort():
    # position of each element's id within UNIQ_IDS
    uind = np.digitize(obj_data.ravel(), UNIQ_IDS) - 1
    vals = uind.argsort()       # flat indices grouped by id
    count = np.bincount(uind)   # number of elements per id
    start = 0
    end = 0
    out = np.empty(count.shape[0])
    for ind, x in np.ndenumerate(count):
        end += x
        out[ind] = np.max(np.take(emiss_data, vals[start:end]))
        start += x
    return out

def using_split():
    uind = np.digitize(obj_data.ravel(), UNIQ_IDS) - 1
    vals = uind.argsort()
    count = np.bincount(uind)
    return [np.take(emiss_data, item).max()
            for item in np.split(vals, count.cumsum())[:-1]]

# using_sort returns an ndarray, so compare numerically instead of with ==
for func in (using_index, using_max, using_loop, using_sort, using_split):
    assert np.allclose(using_where(), func())
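To make the grouping trick in using_sort and using_split easier to follow, here is a toy-sized sketch of the same steps on made-up arrays: np.digitize maps each id to its position in the sorted unique ids, argsort lines the flat indices up group by group, and bincount gives each group's length, so np.split can cut the sorted indices into one chunk per id.

```python
import numpy as np

# Toy data (made up for illustration): ids and values, already flattened
obj = np.array([3, 1, 3, 2, 1])
vals = np.array([0.1, 0.9, 0.4, 0.3, 0.2])
ids = np.unique(obj)                      # [1 2 3]

# Position of each element's id within ids: [2 0 2 1 0]
uind = np.digitize(obj, ids) - 1
# Flat indices ordered by group; a stable sort keeps the original order inside a group
order = uind.argsort(kind="stable")       # [1 4 3 0 2]
# Number of elements in each group: [2 1 2]
count = np.bincount(uind)

# Cut the ordered indices at the group boundaries; the last chunk is always empty
groups = np.split(order, count.cumsum())[:-1]
group_max = [float(vals[g].max()) for g in groups]
print(group_max)  # [0.9, 0.3, 0.4]
```

The answer's versions use the default argsort; stability does not change the maxima, it only makes the intermediate order predictable here.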

Here are the benchmarks with shape = (2600,5200):

In [57]: %timeit using_loop()
1 loops, best of 3: 9.15 s per loop

In [90]: %timeit using_sort()
1 loops, best of 3: 9.33 s per loop

In [91]: %timeit using_split()
1 loops, best of 3: 9.33 s per loop

In [61]: %timeit using_index()
1 loops, best of 3: 63.2 s per loop

In [62]: %timeit using_max()
1 loops, best of 3: 64.4 s per loop

In [58]: %timeit using_where()
1 loops, best of 3: 112 s per loop

So using_loop (pure Python) turns out to be more than 11 times faster than using_where.

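I'm not entirely sure why pure Python is faster than NumPy here. My guess is that the pure Python version zips (yes, pun intended) through both arrays just once. It takes advantage of the fact that, despite all the fancy indexing, we really only need to visit each value once. It thus sidesteps the problem of having to determine exactly which group each value in emiss_data belongs to. But this is just vague speculation; I didn't know it would be faster until I benchmarked it.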
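One concrete factor behind that guess: using_where, using_index and using_max evaluate obj_data == i once per id, so with ~800 ids they compare all 2600 × 5200 elements about 800 times, whereas using_loop makes a single pass. A minimal sketch of the same single-pass idea that keeps only a running maximum per id instead of storing every value in a list; the function name using_running_max is hypothetical, and this variant has not been benchmarked here.

```python
import numpy as np

def using_running_max(emiss_data, obj_data):
    """Hypothetical helper: one pass over both arrays, one running max per id."""
    best = {}
    # .tolist() yields plain Python floats/ints, which are cheaper to loop over
    for val, idx in zip(emiss_data.ravel().tolist(), obj_data.ravel().tolist()):
        current = best.get(idx)
        if current is None or val > current:
            best[idx] = val
    return best

# Usage on small random data
rng = np.random.default_rng(0)
emiss = rng.random((26, 52))
obj = rng.integers(1, 11, size=(26, 52))   # ids 1..10
ids = np.unique(obj)
best = using_running_max(emiss, obj)
maxima = [best[i] for i in ids.tolist()]
```

Keeping a scalar per id avoids building ~800 Python lists that together hold every element of emiss_data.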

score 5 · Accepted Answer

I believe the fastest way to accomplish this is to use the groupby() operation in the pandas package. Compared with @Ophion's using_sort() function, Pandas is about 10 times faster:

import numpy as np
import pandas as pd

shape = (2600,5200)
emiss_data = np.random.random(shape)
obj_data = np.random.randint(1, 801, size=shape)  # ids 1..800; random_integers is deprecated
UNIQ_IDS = np.unique(obj_data)

def using_sort():
    #UNIQ_IDS,uind=np.unique(obj_data, return_inverse=True)
    uind= np.digitize(obj_data.ravel(), UNIQ_IDS) - 1
    vals=uind.argsort()
    count=np.bincount(uind)

    start=0
    end=0

    out=np.empty(count.shape[0])
    for ind,x in np.ndenumerate(count):
        end+=x
        out[ind]=np.max(np.take(emiss_data,vals[start:end]))
        start+=x
    return out

def using_pandas():
    return pd.Series(emiss_data.ravel()).groupby(obj_data.ravel()).max()

print('same results:', np.allclose(using_pandas(), using_sort()))
# same results: True

%timeit using_sort()
# 1 loops, best of 3: 3.39 s per loop

%timeit using_pandas()
# 1 loops, best of 3: 397 ms per loop
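If pandas is not available, a NumPy-only analogue of the groupby approach is to sort the values by id once and then take the maximum of each contiguous run with np.maximum.reduceat. This is a sketch assuming obj_data holds integer ids; it uses np.unique(..., return_inverse=True), as in the commented-out line in using_sort, instead of np.digitize, the function name using_reduceat is hypothetical, and it has not been timed against the versions above.

```python
import numpy as np

def using_reduceat(emiss_data, obj_data):
    # ids: sorted unique ids; inv: position of each element's id within ids
    ids, inv = np.unique(obj_data.ravel(), return_inverse=True)
    order = np.argsort(inv, kind="stable")   # group the elements by id
    sorted_vals = emiss_data.ravel()[order]
    # Start offset of each group inside sorted_vals (every group is non-empty)
    starts = np.concatenate(([0], np.cumsum(np.bincount(inv))[:-1]))
    # Max over sorted_vals[starts[k]:starts[k + 1]], last group runs to the end
    return ids, np.maximum.reduceat(sorted_vals, starts)

# Usage on small random data
rng = np.random.default_rng(1)
emiss = rng.random((26, 52))
obj = rng.integers(1, 801, size=(26, 52))
ids, maxima = using_reduceat(emiss, obj)
```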

score 3 · Accepted Answer

Can't you just do

emiss_data[obj_data == i]

? I really don't see why you are using where.
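A quick check of this point on small made-up arrays: indexing with the boolean mask directly returns the same elements, in the same row-major order, as indexing with the tuple of index arrays that np.where produces, without building those intermediate index arrays.

```python
import numpy as np

emiss = np.arange(12, dtype=float).reshape(3, 4)
obj = np.array([[1, 2, 1, 3],
                [2, 2, 1, 3],
                [3, 1, 2, 2]])
mask = obj == 2
via_where = emiss[np.where(mask)]   # index with a tuple of row and column arrays
via_mask = emiss[mask]              # index with the boolean mask directly
print(via_mask.tolist())  # [1.0, 4.0, 5.0, 10.0, 11.0]
```

Either form still scans the whole of obj_data once per id, so this removes some overhead but not the 800 full passes.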

score 0 · Accepted Answer

According to Are tuples more effective than lists in Python?, tuple allocation is much faster than list allocation, so simply building a tuple instead of a list might improve efficiency.
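Whether this matters here can be measured directly. The result container holds only ~800 items, while each item costs a comparison over the full array, so any gain from a tuple is likely to be small. A minimal timeit sketch on made-up, reduced-size data (absolute timings will vary by machine, and no numbers are claimed here):

```python
import timeit
import numpy as np

rng = np.random.default_rng(2)
emiss_data = rng.random((260, 520))
obj_data = rng.integers(1, 81, size=(260, 520))   # ids 1..80
UNIQ_IDS = np.unique(obj_data)

def as_list():
    return [emiss_data[obj_data == i].max() for i in UNIQ_IDS]

def as_tuple():
    # tuple() over a generator expression instead of a list comprehension
    return tuple(emiss_data[obj_data == i].max() for i in UNIQ_IDS)

for f in (as_list, as_tuple):
    print(f.__name__, min(timeit.repeat(f, number=1, repeat=3)))
```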

score 0 · Accepted Answer

If obj_data consists of relatively small integers, you can use numpy.maximum.at (available since v1.8.0):

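def using_maximumat():
    n = np.max(UNIQ_IDS) + 1
    # one slot per possible id, initialised below any real value
    temp = np.full(n, -np.inf)
    # unbuffered in-place max: temp[id] = max(temp[id], value) for every element
    np.maximum.at(temp, obj_data, emiss_data)
    return temp[UNIQ_IDS]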
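A small usage example of the same idea on made-up data. Because ufunc.at is unbuffered, every occurrence of a repeated id is folded into its slot rather than only the last one winning. The temporary array has length max(id) + 1, which is why this fits best when the ids are small non-negative integers.

```python
import numpy as np

obj_data = np.array([[1, 2],
                     [2, 3]])
emiss_data = np.array([[0.5, 0.1],
                       [0.7, 0.2]])
UNIQ_IDS = np.unique(obj_data)

temp = np.full(np.max(UNIQ_IDS) + 1, -np.inf)   # one slot per possible id
np.maximum.at(temp, obj_data, emiss_data)       # id 2 appears twice: max(0.1, 0.7)
print(temp[UNIQ_IDS].tolist())  # [0.5, 0.7, 0.2]
```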
