python - NumPy: Apply a function to each ndarray element

Question

I have a three-dimensional ndarray of 2D coordinates, for example:

[[[1704 1240]
  [1745 1244]
  [1972 1290]
  [2129 1395]
  [1989 1332]]

 [[1712 1246]
  [1750 1246]
  [1964 1286]
  [2138 1399]
  [1989 1333]]

 [[1721 1249]
  [1756 1249]
  [1955 1283]
  [2145 1399]
  [1990 1333]]]

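The end goal is to remove, from each "group" of 5 coordinates, the point closest to a given point ([1989 1332]). My idea was to generate a similarly shaped array of distances and then use argmin to determine the index of the value to remove. However, I don't know how to apply a function, such as one that calculates the distance to a given point, to every element of an ndarray, at least not in a NumPythonic way.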

score 4 · Accepted Answer

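List comprehensions are a very inefficient way to deal with numpy arrays. They are an especially poor choice for distance calculations.

To find the difference between your data and a point, you just do data - point. You can then calculate the distance using np.hypot or, if you prefer, square the differences, sum them, and take the square root.

It is a bit easier, though, if you make it an Nx2 array for the purposes of the calculation.

Basically, you want something like this: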

import numpy as np

data = np.array([[[1704, 1240],
                  [1745, 1244],
                  [1972, 1290],
                  [2129, 1395],
                  [1989, 1332]],

                 [[1712, 1246],
                  [1750, 1246],
                  [1964, 1286],
                  [2138, 1399],
                  [1989, 1333]],

                 [[1721, 1249],
                  [1756, 1249],
                  [1955, 1283],
                  [2145, 1399],
                  [1990, 1333]]])

point = [1989, 1332]

#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)

# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
print(dist)

This yields:

array([[[ 299.48121811],
        [ 259.38388539],
        [  45.31004304],
        [ 153.5219854 ],
        [   0.        ]],

       [[ 290.04310025],
        [ 254.0019685 ],
        [  52.35456045],
        [ 163.37074401],
        [   1.        ]],

       [[ 280.55837182],
        [ 247.34186868],
        [  59.6405902 ],
        [ 169.77926846],
        [   1.41421356]]])
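The "square, sum, and take the square root" alternative mentioned earlier can be sketched without any reshaping, because broadcasting subtracts the point from every coordinate pair. This is only a minimal sketch: the smaller `data` array and the names `dist_manual` and `dist_hypot` are illustrative and not part of the original answer.

```python
import numpy as np

# A reduced version of the question's data: 2 groups of 2 points each.
data = np.array([[[1704, 1240], [1989, 1332]],
                 [[1712, 1246], [1989, 1333]]])
point = np.array([1989, 1332])

# Broadcasting subtracts `point` from every (x, y) pair; shape stays (2, 2, 2).
diff = data - point

# Square, sum over the coordinate axis, take the square root -> shape (2, 2).
dist_manual = np.sqrt((diff ** 2).sum(axis=-1))

# The same distances via np.hypot on the x and y components.
dist_hypot = np.hypot(diff[..., 0], diff[..., 1])

print(dist_manual)
```

Both forms return one distance per point with the coordinate axis removed, so the result can be used directly with `min` or `argmin` along axis 1.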

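Now, removing the closest element is a bit harder than just getting the closest element.

With numpy, you can use boolean indexing to do this fairly easily.

However, you'll need to worry a bit about the alignment of your axes.

The key is to understand that numpy aligns shapes starting from the last axis when it "broadcasts" an operation. In this case, we want the per-group minimum to be broadcast along the middle axis.

Also, -1 can be used as a placeholder for the size of an axis. Numpy will calculate the permissible size when -1 is put in as the size of an axis.

What we need to do looks a bit like this: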

#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]

# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])

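You could make that a single line; I'm just breaking it down for readability. The key is that dist != something yields a boolean array, which you can then use to index the original array.

So, putting it all together: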
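To make the shapes involved in that comparison concrete, here is a small sketch with made-up distances (the values and the names `flat`, `group_min`, and `inferred` are illustrative, not taken from the answer). It shows how a (2, 3) array compared against a (2, 1) array broadcasts to a (2, 3) boolean mask, and how -1 lets numpy infer an axis length.

```python
import numpy as np

# Made-up distances for 2 groups of 3 points, shaped (2, 3, 1) like `dist`.
dist = np.array([[[3.0], [0.0], [5.0]],
                 [[2.0], [7.0], [1.0]]])

flat = np.squeeze(dist)          # shape (2, 3): the length-1 axis is dropped
group_min = dist.min(axis=1)     # shape (2, 1): one minimum per group

# (2, 3) compared with (2, 1): the minimum is stretched along the middle axis.
mask = flat != group_min
print(mask)

# -1 asks numpy to work out the axis length: 12 elements -> shape (2, 3, 2).
inferred = np.arange(12).reshape(2, -1, 2)
print(inferred.shape)
```

Note that np.squeeze removes every length-1 axis, so with a single group the result would collapse to one dimension; `dist[..., 0]` is a more explicit spelling if that could happen.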

import numpy as np

data = np.array([[[1704, 1240],
                  [1745, 1244],
                  [1972, 1290],
                  [2129, 1395],
                  [1989, 1332]],

                 [[1712, 1246],
                  [1750, 1246],
                  [1964, 1286],
                  [2138, 1399],
                  [1989, 1333]],

                 [[1721, 1249],
                  [1756, 1249],
                  [1955, 1283],
                  [2145, 1399],
                  [1990, 1333]]])

point = [1989, 1332]

#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)

# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)

#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]

# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])

print(filtered)

This yields:

array([[[1704, 1240],
        [1745, 1244],
        [1972, 1290],
        [2129, 1395]],

       [[1712, 1246],
        [1750, 1246],
        [1964, 1286],
        [2138, 1399]],

       [[1721, 1249],
        [1756, 1249],
        [1955, 1283],
        [2145, 1399]]])

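As a side note, this won't work if more than one point in a group is equally close. Numpy arrays must have the same number of elements along each dimension, so in that case you'd need to redo your grouping.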
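One possible way around the tie problem, following the argmin idea from the question, is to remove exactly one point per group: the first one that attains the minimum. This is a hedged sketch rather than part of the original answer; the function name `remove_closest` and the tiny test array are hypothetical.

```python
import numpy as np

def remove_closest(data, point):
    """Drop exactly one point per group: the first one closest to `point`."""
    diff = data - np.asarray(point)
    dist = np.hypot(diff[..., 0], diff[..., 1])   # shape (groups, points)
    closest = dist.argmin(axis=1)                 # first minimum in each group

    # Start with "keep everything", then switch off one entry per group.
    keep = np.ones(dist.shape, dtype=bool)
    keep[np.arange(dist.shape[0]), closest] = False
    return data[keep].reshape(data.shape[0], -1, data.shape[2])

# The first group contains two points that are equally close to the origin.
tied = np.array([[[0, 1], [1, 0], [5, 5]],
                 [[2, 2], [0, 0], [3, 3]]])
result = remove_closest(tied, [0, 0])
print(result)
```

Because argmin returns a single index per group, every group loses exactly one row and the final reshape always succeeds, at the cost of keeping the other tied points.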

score 1 · Accepted Answer

If I understand your question correctly, I think you're looking for apply_along_axis. Using numpy's built-in broadcasting, you can simply subtract the point from the array:

>>> a - numpy.array([1989, 1332])
array([[[-285,  -92],
        [-244,  -88],
        [ -17,  -42],
        [ 140,   63],
        [   0,    0]],

       [[-277,  -86],
        [-239,  -86],
        [ -25,  -46],
        [ 149,   67],
        [   0,    1]],

       [[-268,  -83],
        [-233,  -83],
        [ -34,  -49],
        [ 156,   67],
        [   1,    1]]])

Then you can apply numpy.linalg.norm to it:

>>> dist = a - numpy.array([1989, 1332])
>>> normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
>>> normed
array([[ 299.48121811,  259.38388539,   45.31004304,  
         153.5219854 ,    0.        ],
       [ 290.04310025,  254.0019685 ,   52.35456045,  
         163.37074401,    1.        ],
       [ 280.55837182,  247.34186868,   59.6405902 ,  
         169.77926846,    1.41421356]])

Finally, some boolean mask magic and a few reshape calls:

>>> a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
array([[[1704, 1240],
        [1745, 1244],
        [1972, 1290],
        [2129, 1395]],

       [[1712, 1246],
        [1750, 1246],
        [1964, 1286],
        [2138, 1399]],

       [[1721, 1249],
        [1756, 1249],
        [1955, 1283],
        [2145, 1399]]])

However, Joe Kington's answer is faster. Oh well. I'll leave this for posterity.

import numpy
import numpy as np  # both spellings are used in the functions below

def joes(data, point):
    dist = data.reshape((-1,2)) - point
    dist = np.hypot(*dist.T)
    dist = dist.reshape(data.shape[0], data.shape[1], 1)
    mask = np.squeeze(dist) != dist.min(axis=1)
    return data[mask].reshape((3, 4, 2))

def mine(a, point):
    dist = a - point
    normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
    return a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))

>>> %timeit mine(data, point)
1000 loops, best of 3: 586 us per loop
>>> %timeit joes(data, point)
10000 loops, best of 3: 48.9 us per loop
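Much of that gap most likely comes from apply_along_axis calling norm once per point from Python. Assuming a NumPy version in which numpy.linalg.norm accepts an axis argument (1.8 or later), the same approach can be written without the per-point call. The function name `mine_vectorized` is hypothetical, and no timings are claimed for it.

```python
import numpy as np

def mine_vectorized(a, point):
    """Same idea as `mine`, but the norm is taken over axis 2 in one call."""
    dist = a - np.asarray(point)
    normed = np.linalg.norm(dist, axis=2)               # shape (groups, points)
    mask = normed != normed.min(axis=1, keepdims=True)  # (groups, 1) broadcasts
    return a[mask].reshape(a.shape[0], -1, a.shape[2])

data = np.array([[[1704, 1240], [1745, 1244], [1972, 1290],
                  [2129, 1395], [1989, 1332]],
                 [[1712, 1246], [1750, 1246], [1964, 1286],
                  [2138, 1399], [1989, 1333]],
                 [[1721, 1249], [1756, 1249], [1955, 1283],
                  [2145, 1399], [1990, 1333]]])
point = [1989, 1332]

filtered = mine_vectorized(data, point)
print(filtered.shape)
```

As with the other mask-based versions, the final reshape assumes a unique closest point in every group.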

score 0 · Accepted Answer

There are multiple ways to do this, but here is one using list comprehensions.

Distance function:

In [35]: from numpy.linalg import norm

In [36]: dist = lambda x,y:norm(x-y)

Input data:

In [39]: GivenMatrix = scipy.rand(3, 5, 2)

In [40]: GivenMatrix
Out[40]: 
array([[[ 0.83798666,  0.90294439],
        [ 0.8706959 ,  0.88397176],
        [ 0.91879085,  0.93512921],
        [ 0.15989245,  0.57311869],
        [ 0.82896003,  0.53589968]],

       [[ 0.0207089 ,  0.9521768 ],
        [ 0.94523963,  0.31079109],
        [ 0.41929482,  0.88559614],
        [ 0.87885236,  0.45227422],
        [ 0.58365369,  0.62095507]],

       [[ 0.14757177,  0.86101539],
        [ 0.58081214,  0.12632764],
        [ 0.89958321,  0.73660852],
        [ 0.3408943 ,  0.45420989],
        [ 0.42656333,  0.42770216]]])

In [41]: q = scipy.rand(2)

In [42]: q
Out[42]: array([ 0.03280889,  0.71057403])

Compute the distances:

In [44]: distances = [[dist(x, q) for x in SubMatrix] 
                      for SubMatrix in GivenMatrix]

In [45]: distances
Out[45]: 
[[0.82783910695733931,
  0.85564093542511577,
  0.91399620574915652,
  0.18720096539588818,
  0.81508758596405939],
 [0.24190557184498068,
  0.99617079746515047,
  0.42426891258164884,
  0.88459501973012633,
  0.55808740166908177],
 [0.18921712490174292,
  0.80103146210692744,
  0.86716521557255788,
  0.40079819635686459,
  0.48482888965287363]]

To rank the results for each submatrix:

In [46]: scipy.argsort(distances)
Out[46]: 
array([[3, 4, 0, 1, 2],
       [0, 2, 4, 3, 1],
       [0, 3, 4, 1, 2]])

As for deletion, I personally find it easiest to convert GivenMatrix to a list and then use del:

>>> GivenList = GivenMatrix.tolist()

>>> del GivenList[1][2] # delete third row from the second 5-by-2 submatrix
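Put together, the list-based route might look like the sketch below. It uses numpy's own random generator instead of the scipy aliases from the session above, and the names `given`, `given_list`, and `result` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
given = rng.random((3, 5, 2))     # 3 groups of 5 random 2D points
q = rng.random(2)                 # the query point

# Distance from every point to q, computed with a list comprehension.
distances = [[np.linalg.norm(x - q) for x in sub] for sub in given]
closest = np.argmin(distances, axis=1)   # index of the nearest point per group

# Convert to nested lists, delete one row per group, convert back.
given_list = given.tolist()
for sub, idx in zip(given_list, closest):
    del sub[idx]

result = np.array(given_list)
print(result.shape)
```

Deleting exactly one row from every group keeps the nested lists rectangular, so they convert back to a regular (3, 4, 2) array.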
