python - 配列内のポイント間の高速加重ユークリッド距離

Question

特定の配列内のすべてのポイントから別の配列内の他のすべてのポイントまでのユークリッド加重距離を効率的に計算する必要があります。これは私が持っているコードで、期待どおりに動作します:x,yx,y

import numpy as np
import random

def rand_data(integ):
    '''
    Function that generates 'integ' random values between [0.,1.)
    '''
    rand_dat = [random.random() for _ in range(integ)]

    return rand_dat

def weighted_dist(indx, x_coo, y_coo):
    '''
    Function that calculates *weighted* euclidean distances.
    '''
    dist_point_list = []
    # Iterate through every point in array_2.
    for indx2, x_coo2 in enumerate(array_2[0]):
        y_coo2 = array_2[1][indx2]
        # Weighted distance in x.
        x_dist_weight = (x_coo-x_coo2)/w_data[0][indx] 
        # Weighted distance in y.
        y_dist_weight = (y_coo-y_coo2)/w_data[1][indx] 
        # Weighted distance between point from array_1 passed and this point
        # from array_2.
        dist = np.sqrt(x_dist_weight**2 + y_dist_weight**2)
        # Append weighted distance value to list.
        dist_point_list.append(round(dist, 8))

    return dist_point_list


# Generate random x,y data points.
array_1 = np.array([rand_data(10), rand_data(10)], dtype=float)

# Generate weights for each x,y coord for points in array_1.
w_data = np.array([rand_data(10), rand_data(10)], dtype=float)

# Generate second larger array.
array_2 = np.array([rand_data(100), rand_data(100)], dtype=float)


# Obtain *weighted* distances for every point in array_1 to every point in array_2.
dist = []
# Iterate through every point in array_1.
for indx, x_coo in enumerate(array_1[0]):
    y_coo = array_1[1][indx]
    # Call function to get weighted distances for this point to every point in
    # array_2.
    dist.append(weighted_dist(indx, x_coo, y_coo))

最後のリストdistには、最初の配列にあるポイントと同じ数のサブリストが含まれ、それぞれに 2 番目の配列にあるポイントと同じ数の要素 (加重距離) があります。

おそらくcdist関数を使用して、このコードをより効率的にする方法があるかどうかを知りたいのですが、配列に多くの要素がある場合 (私の場合は要素があります)、チェックする必要がある場合、このプロセスは非常に高価になるためです。多くの配列の距離（私も持っています）

score 5 · Accepted Answer

@Evan と @Martinis Group は正しい軌道に乗っています。Evan の回答を拡張するために、ブロードキャストを使用して Python ループなしで n 次元の加重ユークリッド距離をすばやく計算する関数を次に示します。

import numpy as np

def fast_wdist(A, B, W):
    """
    Compute the weighted euclidean distance between two arrays of points:

    D{i,j} = 
    sqrt( ((A{0,i}-B{0,j})/W{0,i})^2 + ... + ((A{k,i}-B{k,j})/W{k,i})^2 )

    inputs:
        A is an (k, m) array of coordinates
        B is an (k, n) array of coordinates
        W is an (k, m) array of weights

    returns:
        D is an (m, n) array of weighted euclidean distances
    """

    # compute the differences and apply the weights in one go using
    # broadcasting jujitsu. the result is (n, k, m)
    wdiff = (A[np.newaxis,...] - B[np.newaxis,...].T) / W[np.newaxis,...]

    # square and sum over the second axis, take the sqrt and transpose. the
    # result is an (m, n) array of weighted euclidean distances
    D = np.sqrt((wdiff*wdiff).sum(1)).T

    return D

これが正常に動作することを確認するために、ネストされた Python ループを使用する遅いバージョンと比較します。

def slow_wdist(A, B, W):

    k,m = A.shape
    _,n = B.shape
    D = np.zeros((m, n))

    for ii in xrange(m):
        for jj in xrange(n):
            wdiff = (A[:,ii] - B[:,jj]) / W[:,ii]
            D[ii,jj] = np.sqrt((wdiff**2).sum())
    return D

まず、2 つの関数が同じ答えを返すことを確認しましょう。

# make some random points and weights
def setup(k=2, m=100, n=300):
    return np.random.randn(k,m), np.random.randn(k,n),np.random.randn(k,m)

a, b, w = setup()
d0 = slow_wdist(a, b, w)
d1 = fast_wdist(a, b, w)

print np.allclose(d0, d1)
# True

言うまでもなく、Python ループではなくブロードキャストを使用するバージョンは、桁違いに高速です。

%%timeit a, b, w = setup()
slow_wdist(a, b, w)
# 1 loops, best of 3: 647 ms per loop

%%timeit a, b, w = setup()
fast_wdist(a, b, w)
# 1000 loops, best of 3: 620 us per loop

score 2 · Accepted Answer

次のようなコードを使用すると、ループを回避できます。

def compute_distances(A, B, W):
    Ax = A[:,0].reshape(1, A.shape[0])
    Bx = B[:,0].reshape(A.shape[0], 1)
    dx = Bx-Ax

    # Same for dy
    dist = np.sqrt(dx**2 + dy**2) * W
    return dist

これは、配列に十分なメモリがある限り、ループするものよりも Python ではるかに高速に実行されます。

python - 配列内のポイント間の高速加重ユークリッド距離

4 に答える 4

Related

Reference