python - Vectorizing a sliding window in Python

Question

I am trying to vectorize a sliding-window operation. For the 1-D case, a useful example would look like this:

import numpy as np
x = np.vstack((np.array([range(10)]), np.array([range(10)])))

x[1,:]=np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]+1],x[1,:])

That is, for each current value with index < 5, take the n+1 value. However, I get this error:

x[1,:]=np.where((x[0,:]<2)&(x[0,:]>0),x[1,x[0,:]+1],x[1,:])
IndexError: index (10) out of range (0<=index<9) in dimension 1

Strangely, I do not get this error for the n-1 value, which would mean an index smaller than 0. It does not seem to mind:

x[1,:]=np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]-1],x[1,:])

print(x)

[[0 1 2 3 4 5 6 7 8 9]
 [0 0 1 2 3 5 6 7 8 9]]

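Is there any way around this? Is my approach completely wrong? Any comments would be appreciated.

EDIT:

This is what I would like to achieve. I flatten a matrix into a numpy array, and I want to compute the mean of the 6x6 neighbourhood of each cell: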

matriz = np.array([[1,2,3,4,5],
   [6,5,4,3,2],
   [1,1,2,2,3],
   [3,3,2,2,1],
   [3,2,1,3,2],
   [1,2,3,1,2]])

# matrix to vector
vector2 = np.ndarray.flatten(matriz)

ncols = int(np.shape(matriz)[1])
nrows = int(np.shape(matriz)[0])

vector = np.zeros(nrows*ncols,dtype='float64')


# Interior pixels
if ( (i % ncols) != 0 and (i+1) % ncols != 0 and i>ncols and i<ncols*(nrows-1)):

    vector[i] = np.mean(np.array([vector2[i-ncols-1],vector2[i-ncols],vector2[i-ncols+1],vector2[i-1],vector2[i+1],vector2[i+ncols-1],vector2[i+ncols],vector2[i+ncols+1]]))

score 8 · Accepted Answer

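If I understand the problem correctly, you want the mean of all the numbers one step away from each index, ignoring the value at the index itself.

I patched your function to get it working. I believe you were going for something like this: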

def original(matriz):

    vector2 = np.ndarray.flatten(matriz)

    nrows, ncols = matriz.shape
    vector = np.zeros(nrows*ncols, dtype='float64')

    # Interior pixels
    for i in range(vector.shape[0]):
        if ((i % ncols) != 0 and (i+1) % ncols != 0 and i > ncols and i < ncols*(nrows-1)):

            vector[i] = np.mean(np.array([vector2[i-ncols-1], vector2[i-ncols],
                        vector2[i-ncols+1], vector2[i-1], vector2[i+1],
                        vector2[i+ncols-1], vector2[i+ncols], vector2[i+ncols+1]]))

    # Reshape so the result can be compared with the 2D input
    return vector.reshape(nrows, ncols)

I rewrote this using slicing and views:

def mean_around(arr):
    arr=arr.astype(np.float64)

    out= np.copy(arr[:-2,:-2])  #Top left corner
    out+= arr[:-2,2:]           #Top right corner
    out+= arr[:-2,1:-1]         #Top center
    out+= arr[2:,:-2]           #etc
    out+= arr[2:,2:]
    out+= arr[2:,1:-1]
    out+= arr[1:-1,2:]
    out+= arr[1:-1,:-2]

    out/=8.0    #Divide by # of elements to obtain mean

    cout=np.empty_like(arr)  #Create output array
    cout[1:-1,1:-1]=out      #Fill with out values
    cout[0,:]=0;cout[-1,:]=0;cout[:,0]=0;cout[:,-1]=0 #Set edges equal to zero

    return  cout

Using np.empty_like and then filling in the edges seemed slightly faster than np.zeros_like. First, let's double-check that both functions do the same thing using your matriz array.

print(np.allclose(mean_around(matriz), original(matriz)))
True

print(mean_around(matriz))
[[ 0.     0.     0.     0.     0.   ]
 [ 0.     2.5    2.75   3.125  0.   ]
 [ 0.     3.25   2.75   2.375  0.   ]
 [ 0.     1.875  2.     2.     0.   ]
 [ 0.     2.25   2.25   1.75   0.   ]
 [ 0.     0.     0.     0.     0.   ]]
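The same neighbourhood mean can also be written with windowed views instead of eight explicit slices. This is only a sketch of an alternative, not the method timed in this answer: it assumes NumPy 1.20 or newer for sliding_window_view, and the function name mean_around_windows is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20

def mean_around_windows(arr):
    """Mean of the 8 neighbours of each interior cell; edges are set to 0."""
    arr = np.asarray(arr, dtype=np.float64)
    # windows[i, j] is a read-only 3x3 view centred on arr[i + 1, j + 1]
    windows = sliding_window_view(arr, (3, 3))
    out = np.zeros_like(arr)
    # Sum each window, drop the centre cell, divide by the 8 neighbours
    out[1:-1, 1:-1] = (windows.sum(axis=(2, 3)) - arr[1:-1, 1:-1]) / 8.0
    return out

matriz = np.array([[1, 2, 3, 4, 5],
                   [6, 5, 4, 3, 2],
                   [1, 1, 2, 2, 3],
                   [3, 3, 2, 2, 1],
                   [3, 2, 1, 3, 2],
                   [1, 2, 3, 1, 2]])
result = mean_around_windows(matriz)
print(result[1, 1])   # 2.5
```

The window sum touches 9 values per cell where the slice version touches 8, so I would not expect it to be faster; its main appeal is that the window size lives in one place.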

Some timings:

a=np.random.rand(500,500)

print(np.allclose(original(a),mean_around(a)))
True

%timeit mean_around(a)
100 loops, best of 3: 4.4 ms per loop

%timeit original(a)
1 loops, best of 3: 6.6 s per loop

That is a speedup of roughly 1500x.

This looks like a good place to use numba:

from numba import njit

def mean_numba(arr):
    out = np.zeros_like(arr)
    rows, cols = arr.shape

    for x in range(1, rows-1):
        for y in range(1, cols-1):
            out[x,y] = (arr[x-1,y+1]+arr[x-1,y]+arr[x-1,y-1]+arr[x,y+1]+
                        arr[x,y-1]+arr[x+1,y+1]+arr[x+1,y]+arr[x+1,y-1])/8.
    return out

# JIT-compile the function (older numba releases exposed this as autojit)
nmean = njit(mean_numba)

Now let's compare it against all of the methods presented:

a=np.random.rand(5000,5000)

%timeit mean_around(a)
1 loops, best of 3: 729 ms per loop

%timeit nmean(a)
10 loops, best of 3: 169 ms per loop

#CT Zhu's answer
%timeit it_mean(a)
1 loops, best of 3: 36.7 s per loop

#Ali_m's answer
%timeit fast_local_mean(a,(3,3))
1 loops, best of 3: 4.7 s per loop

#lmjohns3's answer
%timeit scipy_conv(a)
1 loops, best of 3: 3.72 s per loop

The 4x speedup from numba is fairly nominal, which suggests the numpy code is about as good as it is going to get. I pulled in the other code as presented, although I had to change @CTZhu's answer to handle different array sizes.

score 3 · Accepted Answer

SciPy has a function that computes the mean over sliding windows very quickly. It is called uniform_filter. You can use it to implement a neighbourhood-mean function like this:

from scipy.ndimage import uniform_filter
def neighbourhood_average(arr, win=3):
    sums = uniform_filter(arr, win, mode='constant') * (win*win)
    return ((sums - arr) / (win*win - 1))

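This returns an array X in which X[i,j] is the mean of all the neighbours of i,j in arr, excluding i,j itself. Note that the first and last columns and the first and last rows are subject to the boundary conditions, so they may be invalid for your application (you can use mode= to control the boundary rule if necessary).

Because uniform_filter uses a very efficient linear-time algorithm implemented in straight C (linear only in the size of arr), it should easily outperform the other solutions, especially when win is large.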
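To see what the boundary rule does and does not affect, here is a small sketch under a couple of assumptions of mine: the mode keyword is passed through as an extra parameter, and the input is cast to float first, because as far as I know uniform_filter returns the input dtype by default and would truncate integer input.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def neighbourhood_average(arr, win=3, mode='constant'):
    # Cast to float so integer input is not truncated by the filter
    arr = np.asarray(arr, dtype=np.float64)
    # uniform_filter returns window means; scale back up to window sums
    sums = uniform_filter(arr, win, mode=mode) * (win * win)
    # Remove the centre cell and average over the remaining neighbours
    return (sums - arr) / (win * win - 1)

matriz = np.array([[1, 2, 3, 4, 5],
                   [6, 5, 4, 3, 2],
                   [1, 1, 2, 2, 3],
                   [3, 3, 2, 2, 1],
                   [3, 2, 1, 3, 2],
                   [1, 2, 3, 1, 2]])

zero_padded = neighbourhood_average(matriz, mode='constant')
edge_repeated = neighbourhood_average(matriz, mode='nearest')

# Interior cells never see the padding, so both modes agree there
print(np.allclose(zero_padded[1:-1, 1:-1], edge_repeated[1:-1, 1:-1]))
```

Only the outermost ring of cells differs between the two modes when win=3. With a larger win the affected border is win // 2 cells wide.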

score 2 · Accepted Answer

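The problem lies in x[1,x[0,:]+1]. The index for the second axis, x[0,:]+1, is [1 2 3 4 5 6 7 8 9 10], and the index 10 is larger than the dimension of x.

In the case of x[1,x[0,:]-1], the index for the second axis is [-1 0 1 2 3 4 5 6 7 8], so you end up getting [9 0 1 2 3 4 5 6 7 8], because 9 is the last element and has an index of -1. The second-to-last element has an index of -2, and so on.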
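The two behaviours can be reproduced on a single row. This is just a minimal illustration; the names row and raised are mine, not from the question.

```python
import numpy as np

row = np.arange(10)          # same values as x[1, :] in the question

# Index -1 silently wraps around to the last element
wrapped = row[row - 1]
print(wrapped)               # [9 0 1 2 3 4 5 6 7 8]

# Index 10 is past the end, so integer-array indexing raises IndexError
try:
    row[row + 1]
    raised = False
except IndexError:
    raised = True
print(raised)                # True
```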

With np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]-1],x[1,:]) and x[0,:]=[0 1 2 3 4 5 6 7 8 9], what essentially happens is that the first cell is taken from x[1,:], because x[0,0] is 0 and (x[0,:]<5)&(x[0,:]>0) is False there. The next four elements are taken from x[1,x[0,:]-1]. The rest come from x[1,:]. The final result is [0 0 1 2 3 5 6 7 8 9].

It may look fine for a sliding window of just 1 cell, but you will be surprised by this:

>>> np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]-2],x[1,:])
array([0, 9, 0, 1, 2, 5, 6, 7, 8, 9])

when you try to move it by a window of two cells.
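One way to avoid both the wraparound and the IndexError is to clamp the shifted indices with np.clip before indexing. A minimal sketch, where shifted is a hypothetical helper name and clamping to the nearest edge is my choice of boundary rule:

```python
import numpy as np

def shifted(values, offset):
    """Return values[i + offset] for every i, clamping indices at both ends.

    Hypothetical helper for illustration: out-of-range indices are clamped
    to the nearest edge instead of wrapping around or raising IndexError.
    """
    idx = np.clip(np.arange(values.size) + offset, 0, values.size - 1)
    return values[idx]

row = np.arange(10)
print(shifted(row, -2))   # [0 0 0 1 2 3 4 5 6 7]
print(shifted(row, 1))    # [1 2 3 4 5 6 7 8 9 9]
```

Whether repeating the edge value is the right thing to do depends on your application; the point is only that the boundary rule becomes explicit.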

For this specific problem, if you want to keep everything on one line, this will do:

>>> for i in [1, 2, 3, 4, 5, 6]:
...     print(np.hstack((np.where(x[1,x[0,:]-i]<x[0, -i], x[1,x[0,:]-i], 0)[:5], x[0,5:])))

[0 0 1 2 3 5 6 7 8 9]
[0 0 0 1 2 5 6 7 8 9]
[0 0 0 0 1 5 6 7 8 9]
[0 0 0 0 0 5 6 7 8 9]
[0 0 0 0 0 5 6 7 8 9]
[0 0 0 0 0 5 6 7 8 9]

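EDIT: Now I understand your original question better. Basically, you want to take a 2D array and compute the N*N cell average around each cell. That is quite common. First, you probably want to restrict N to odd numbers, otherwise something like a 2*2 average around a cell is hard to define. Suppose we want a 3*3 average:

>>> import itertools
#In this example, the shape is (10,10)
>>> a1=\
np.array([[3, 7, 0, 9, 0, 8, 1, 4, 3, 3],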
   [5, 6, 5, 2, 9, 2, 3, 5, 2, 9],
   [0, 9, 8, 5, 3, 1, 8, 1, 9, 4],
   [7, 4, 0, 0, 9, 3, 3, 3, 5, 4],
   [3, 1, 2, 4, 8, 8, 2, 1, 9, 6],
   [0, 0, 3, 9, 3, 0, 9, 1, 3, 3],
   [1, 2, 7, 4, 6, 6, 2, 6, 2, 1],
   [3, 9, 8, 5, 0, 3, 1, 4, 0, 5],
   [0, 3, 1, 4, 9, 9, 7, 5, 4, 5],
   [4, 3, 8, 7, 8, 6, 8, 1, 1, 8]])
#move your original array 'a1' around, use range(-2,3) for 5*5 average and so on
>>> movea1=[a1[np.clip(np.arange(10)+i, 0, 9)][:,np.clip(np.arange(10)+j, 0, 9)] for i, j in itertools.product(*[range(-1,2),]*2)]
#then just take the average
>>> averagea1=np.mean(np.array(movea1), axis=0)
#trim the result array, because the cells along the edges do not have 3*3 average
>>> averagea1[1:10-1, 1:10-1]
array([[ 4.77777778,  5.66666667,  4.55555556,  4.33333333,  3.88888889,
     3.66666667,  4.        ,  4.44444444],
   [ 4.88888889,  4.33333333,  4.55555556,  3.77777778,  4.55555556,
     3.22222222,  4.33333333,  4.66666667],
   [ 3.77777778,  3.66666667,  4.33333333,  4.55555556,  5.        ,
     3.33333333,  4.55555556,  4.66666667],
   [ 2.22222222,  2.55555556,  4.22222222,  4.88888889,  5.        ,
     3.33333333,  4.        ,  3.88888889],
   [ 2.11111111,  3.55555556,  5.11111111,  5.33333333,  4.88888889,
     3.88888889,  3.88888889,  3.55555556],
   [ 3.66666667,  5.22222222,  5.        ,  4.        ,  3.33333333,
     3.55555556,  3.11111111,  2.77777778],
   [ 3.77777778,  4.77777778,  4.88888889,  5.11111111,  4.77777778,
     4.77777778,  3.44444444,  3.55555556],
   [ 4.33333333,  5.33333333,  5.55555556,  5.66666667,  5.66666667,
     4.88888889,  3.44444444,  3.66666667]])

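I don't think you need to flatten your 2D array; that only causes confusion. Also, if you want to handle the edge elements differently rather than just trimming them off, consider making masked arrays with np.ma in the 'move your original array around' step.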
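A rough sketch of what the masked-array variant could look like. This is one possible reading of the suggestion, not tested against large inputs: masked_local_mean is a hypothetical name, n is assumed to be odd, and edge cells are averaged only over the neighbours that actually lie inside the array.

```python
import itertools
import numpy as np

def masked_local_mean(arr, n=3):
    """n*n mean around each cell, averaging only over cells inside the array.

    Hypothetical helper for illustration; n is assumed to be odd.
    """
    r = n // 2
    rows, cols = arr.shape
    # Pad with r cells on every side and mask the padding
    data = np.pad(np.asarray(arr, dtype=np.float64), r, mode='constant')
    mask = np.ones(data.shape, dtype=bool)
    mask[r:r + rows, r:r + cols] = False
    padded = np.ma.array(data, mask=mask)
    # One shifted copy of the array per offset in the n*n window
    moved = [padded[r + i:r + i + rows, r + j:r + j + cols]
             for i, j in itertools.product(range(-r, r + 1), repeat=2)]
    # Masked (out-of-array) entries are ignored by the mean
    return np.ma.stack(moved).mean(axis=0)

small = np.arange(9).reshape(3, 3)
means = masked_local_mean(small)
print(means[1, 1])   # 4.0, mean of all nine cells
print(means[0, 0])   # 2.0, mean of the four cells 0, 1, 3, 4
```

The result is itself a masked array; every cell has at least its own value unmasked, so nothing in the output ends up masked here.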
