python - Numpyで遷移行列の作成を高速化するにはどうすればよいですか？

Question

以下は、マルコフ連鎖の遷移をカウントし、それを使用して遷移行列を作成するために私が知っている最も基本的な方法です。

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    for i in xrange(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

私は3つの異なる方法でそれをスピードアップしようとしました：

1）このMatlabコードに基づくスパース行列ワンライナーの使用：

transition_matrix = full(sparse(markov_chain(1:end-1), markov_chain(2:end), 1))

Numpy / SciPyでは、次のようになります。

def get_sparse_counts_matrix(markov_chain, number_of_states):
    return coo_matrix(([1]*(len(markov_chain) - 1), (markov_chain[0:-1], markov_chain[1:])), shape=(number_of_states, number_of_states))

そして、zip（）を使用するなど、Pythonの微調整をさらにいくつか試しました。

for old_state, new_state in zip(markov_chain[0:-1], markov_chain[1:]):
    transition_counts_matrix[old_state, new_state] += 1

そしてキュー：

old_and_new_states_holder = Queue(maxsize=2)
old_and_new_states_holder.put(markov_chain[0])
for new_state in markov_chain[1:]:
    old_and_new_states_holder.put(new_state)
    old_state = old_and_new_states_holder.get()
    transition_counts_matrix[old_state, new_state] += 1

しかし、これら3つの方法のどれも物事をスピードアップしませんでした。実際、zip（）ソリューション以外はすべて、元のソリューションよりも少なくとも10倍遅くなりました。

検討する価値のある他の解決策はありますか？

多くのチェーンから遷移行列を構築するための修正されたソリューション
上記の質問に対する最良の答えは、具体的にはDSMでした。ただし、数百万のマルコフ連鎖のリストに基づいて遷移行列を作成したい場合、最も簡単な方法は次のとおりです。

def fast_increment_transition_counts_from_chain(markov_chain, transition_counts_matrix):
    flat_coords = numpy.ravel_multi_index((markov_chain[:-1], markov_chain[1:]), transition_counts_matrix.shape)
    transition_counts_matrix.flat += numpy.bincount(flat_coords, minlength=transition_counts_matrix.size)

def get_fake_transitions(markov_chains):
    fake_transitions = []
    for i in xrange(1,len(markov_chains)):
        old_chain = markov_chains[i - 1]
        new_chain = markov_chains[i]
        end_of_old = old_chain[-1]
        beginning_of_new = new_chain[0]
        fake_transitions.append((end_of_old, beginning_of_new))
    return fake_transitions

def decrement_fake_transitions(fake_transitions, counts_matrix):
    for old_state, new_state in fake_transitions:
        counts_matrix[old_state, new_state] -= 1

def fast_get_transition_counts_matrix(markov_chains, number_of_states):
    """50% faster than original, but must store 2 additional slice copies of all markov chains in memory at once.
    You might need to break up the chains into manageable chunks that don't exceed your memory.
    """
    transition_counts_matrix = numpy.zeros([number_of_states, number_of_states])
    fake_transitions = get_fake_transitions(markov_chains)
    markov_chains = list(itertools.chain(*markov_chains))
    fast_increment_transition_counts_from_chain(markov_chains, transition_counts_matrix)
    decrement_fake_transitions(fake_transitions, transition_counts_matrix)
    return transition_counts_matrix

score 8 · Accepted Answer

キックのためだけに、そして私はそれを試してみたいと思っていたので、私はあなたの問題にNumbaを適用しました。コードでは、デコレータを追加するだけです（ただし、numbaがここで提供するjitバリアントをテストできるように、直接呼び出しを行いました）。

import numpy as np
import numba

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    for i in xrange(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

autojit_func = numba.autojit()(increment_counts_in_matrix_from_chain)
jit_func = numba.jit(argtypes=[numba.int64[:,::1],numba.double[:,::1]])(increment_counts_in_matrix_from_chain)

t = np.random.randint(0,50, 500)
m1 = np.zeros((50,50))
m2 = np.zeros((50,50))
m3 = np.zeros((50,50))

そしてタイミング：

In [10]: %timeit increment_counts_in_matrix_from_chain(t,m1)
100 loops, best of 3: 2.38 ms per loop

In [11]: %timeit autojit_func(t,m2)                         

10000 loops, best of 3: 67.5 us per loop

In [12]: %timeit jit_func(t,m3)
100000 loops, best of 3: 4.93 us per loop

このautojitメソッドは実行時の入力に基づいて推測を行い、jit関数には指定されたタイプがあります。jitこれらの初期段階のnumbaは、入力に間違ったタイプを渡した場合にエラーが発生したことを通知しないため、少し注意する必要があります。間違った答えを吐き出すだけです。

とはいえ、コードを変更せずに35倍と485倍の速度を上げ、numba（デコレーターとも呼ばれる）への呼び出しを追加するだけで、私の本ではかなり印象的です。おそらくcythonを使用して同様の結果を得ることができますが、もう少し定型文を作成し、setup.pyファイルを作成する必要があります。

また、コードが読み取り可能であり、アルゴリズムの実装について当初考えていた方法でコードを記述できるため、このソリューションも気に入っています。

score 6 · Accepted Answer

このようなものを利用して、np.bincountどうですか？超堅牢ではありませんが、機能的です。[セットアップしてくれた@WarrenWeckesserに感謝します。]

import numpy as np
from collections import Counter

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    for i in xrange(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

def using_counter(chain, counts_matrix):
    counts = Counter(zip(chain[:-1], chain[1:]))
    from_, to = zip(*counts.keys())
    counts_matrix[from_, to] = counts.values()

def using_bincount(chain, counts_matrix):
    flat_coords = np.ravel_multi_index((chain[:-1], chain[1:]), counts_matrix.shape)
    counts_matrix.flat = np.bincount(flat_coords, minlength=counts_matrix.size)

def using_bincount_reshape(chain, counts_matrix):
    flat_coords = np.ravel_multi_index((chain[:-1], chain[1:]), counts_matrix.shape)
    return np.bincount(flat_coords, minlength=counts_matrix.size).reshape(counts_matrix.shape)

これは次のようになります。

In [373]: t = np.random.randint(0,50, 500)
In [374]: m1 = np.zeros((50,50))
In [375]: m2 = m1.copy()
In [376]: m3 = m1.copy()

In [377]: timeit increment_counts_in_matrix_from_chain(t, m1)
100 loops, best of 3: 2.79 ms per loop

In [378]: timeit using_counter(t, m2)
1000 loops, best of 3: 924 us per loop

In [379]: timeit using_bincount(t, m3)
10000 loops, best of 3: 57.1 us per loop

[編集]

flat（インプレースで機能しないという犠牲を払って）回避すると、小さな行列の時間を節約できます。

In [80]: timeit using_bincount_reshape(t, m3)
10000 loops, best of 3: 22.3 us per loop

score 0 · Accepted Answer

Ok, few ideas to tamper with, with some slight improvement (at cost of human undestanding)

Let's start with a random vector of integers between 0 and 9 of length 3000:

L = 3000
N = 10
states = array(randint(N),size=L)
transitions = np.zeros((N,N))

Your method, on my machine, has a timeit performance of 11.4 ms.

The first thing for a little improvement is to avoid to read the data twice, storing it in a temporary variable:

old = states[0]
for i in range(1,len(states)):
    new = states[i]
    transitions[new,old]+=1
    old=new

This gives you a ~10% improvement and drops the time to 10.9 ms.

A more involuted approach uses the strides:

def rolling(a, window):
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

state_2 = rolling(states, 2)
for i in range(len(state_2)):
    l,m = state_2[i,0],state_2[i,1]
    transitions[m,l]+=1

The strides allow you to read the consecutive numbers of the array tricking the array to think that the rows start in a different way (ok, it's not well described, but if you take some time to read about strides you will get it) This approach loses performance, going to 12.2 ms, but it is the hallway to trick the system even more. flattening both the transition matrix and the strided array to one dimensional arrays, you can speed up the performance a little more:

transitions = np.zeros(N*N)
state_2 = rolling(states, 2)
state_flat = np.sum(state_2 * array([1,10]),axis=1)
for i in state_flat:
    transitions[i]+=1
transitions.reshape((N,N))

This goes down to 7.75 ms. It's not an order of magnitude, but it's a 30% better anyway :)

score 0 · Accepted Answer

これがより速い方法です。アイデアは、各遷移の発生数をカウントし、そのカウントをマトリックスのベクトル化された更新で使用することです。（同じ遷移がで複数回発生する可能性があると想定していますmarkov_chain。）ライブラリのCounterクラスはcollections、各遷移の発生数をカウントするために使用されます。

from collections import Counter

def update_matrix(chain, counts_matrix):
    counts = Counter(zip(chain[:-1], chain[1:]))
    from_, to = zip(*counts.keys())
    counts_matrix[from_, to] += counts.values()

タイミングの例、ipythonの場合：

In [64]: t = np.random.randint(0,50, 500)

In [65]: m1 = zeros((50,50))

In [66]: m2 = zeros((50,50))

In [67]: %timeit increment_counts_in_matrix_from_chain(t, m1)
1000 loops, best of 3: 895 us per loop

In [68]: %timeit update_matrix(t, m2)
1000 loops, best of 3: 504 us per loop

高速ですが、桁違いに高速ではありません。実際のスピードアップのために、これをCythonに実装することを検討してください。

python - Numpyで遷移行列の作成を高速化するにはどうすればよいですか？

4 に答える 4

Related

Reference