python - 乱数行列の計算を高速化

Question

ランダムなテプリッツ行列を作成して、それらが可逆である確率を推定しています。私の現在のコードは

import random
from scipy.linalg import toeplitz
import numpy as np
for n in xrange(1,25):
    rankzero = 0
    for repeats in xrange(50000):
        column = [random.choice([0,1]) for x in xrange(n)]
        row = [column[0]]+[random.choice([0,1]) for x in xrange(n-1)]
        matrix = toeplitz(column, row)
        if  (np.linalg.matrix_rank(matrix) < n):
            rankzero += 1
    print n, (rankzero*1.0)/50000

これは高速化できますか？

精度を高めるために値を 50000 に増やしたいのですが、現時点では遅すぎます。

for n in xrange(10,14)ショーのみを使用したプロファイリング

  400000    9.482    0.000    9.482    0.000 {numpy.linalg.lapack_lite.dgesdd}
  4400000    7.591    0.000   11.089    0.000 random.py:272(choice)
   200000    6.836    0.000   10.903    0.000 index_tricks.py:144(__getitem__)
        1    5.473    5.473   62.668   62.668 toeplitz.py:3(<module>)
   800065    4.333    0.000    4.333    0.000 {numpy.core.multiarray.array}
   200000    3.513    0.000   19.949    0.000 special_matrices.py:128(toeplitz)
   200000    3.484    0.000   20.250    0.000 linalg.py:1194(svd)
6401273/6401237    2.421    0.000    2.421    0.000 {len}
   200000    2.252    0.000   26.047    0.000 linalg.py:1417(matrix_rank)
  4400000    1.863    0.000    1.863    0.000 {method 'random' of '_random.Random' objects}
  2201015    1.240    0.000    1.240    0.000 {isinstance}
[...]

score 3 · Accepted Answer

1 つの方法は、値が置かれているインデックスをキャッシュすることによって、toeplitz() 関数を繰り返し呼び出すことからいくらかの作業を省くことです。次のコードは、元のコードよりも ~ 30% 高速です。残りのパフォーマンスはランク計算にあります...そして、0と1のテプリッツ行列のより高速なランク計算が存在するかどうかはわかりません。

(更新) matrix_rank を scipy.linalg.det() == 0 に置き換えると、コードは実際には ~ 4 倍高速になります (行列式は小さな行列のランク計算よりも高速です)

import random
from scipy.linalg import toeplitz, det
import numpy as np,numpy.random

class si:
    #cache of info for toeplitz matrix construction
    indx = None
    l = None

def xtoeplitz(c,r):
    vals = np.concatenate((r[-1:0:-1], c))
    if si.indx is None or si.l != len(c):
        a, b = np.ogrid[0:len(c), len(r) - 1:-1:-1]
        si.indx = a + b
        si.l = len(c)
    # `indx` is a 2D array of indices into the 1D array `vals`, arranged so
    # that `vals[indx]` is the Toeplitz matrix.
    return vals[si.indx]

def doit():
    for n in xrange(1,25):
        rankzero = 0
        si.indx=None

        for repeats in xrange(5000):

            column = np.random.randint(0,2,n)
            #column=[random.choice([0,1]) for x in xrange(n)] # original code

            row = np.r_[column[0], np.random.randint(0,2,n-1)]
            #row=[column[0]]+[random.choice([0,1]) for x in xrange(n-1)] #origi

            matrix = xtoeplitz(column, row)
            #matrix=toeplitz(column,row) # original code

            #if  (np.linalg.matrix_rank(matrix) < n): # original code
            if  np.abs(det(matrix))<1e-4: # should be faster for small matrices
                rankzero += 1
        print n, (rankzero*1.0)/50000

score 2 · Accepted Answer

0 と 1 のリストを作成する次の 2 行:

column = [random.choice([0,1]) for x in xrange(n)]
row = [column[0]]+[random.choice([0,1]) for x in xrange(n-1)]

いくつかの非効率性があります。不必要に多くのリストを作成、展開、破棄し、リストに対して random.choice() を呼び出して、実際には 1 つのランダムビットだけを取得します。次のように約500％高速化しました。

column = [0 for i in xrange(n)]
row = [0 for i in xrange(n)]

# NOTE: n must be less than 32 here, or remove int() and lose some speed
cbits = int(random.getrandbits(n))
rbits = int(random.getrandbits(n))

for i in xrange(n):
    column[i] = cbits & 1
    cbits >>= 1
    row[i] = rbits & 1
    rbits >>= 1

row[0] = column[0]

score 1 · Accepted Answer

dgesdd元のコードは最初に入力行列の LU 分解を計算して線形システムを解くために lapack ルーチンを呼び出しているようです。

で置き換えるmatrix_rankと、入力行列の LU 分解のみを計算するdetlapack のを使用して行列式を計算します ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.det.html )。dgetrf

matrix_rankしたがって、との両方の呼び出しの漸近的な複雑さはdet、LU 分解の複雑さである O(n^3) になります。

ただし、Toepelitz システムは O(n^2) で解くことができます (Wikipedia によると)。したがって、コードを大きな行列で実行したい場合は、特殊なライブラリを呼び出す Python 拡張機能を作成するのが理にかなっています。

python - 乱数行列の計算を高速化

3 に答える 3

Related

Reference