python - Fast and elegant way to calculate the empirical/sample covariogram

Question

Does anyone know a good method to calculate the empirical/sample covariogram, if possible in Python?

[image of the covariogram definition, not reproduced here]

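If I understand it correctly, for a given lag/width h, I am supposed to take all pairs of points that are separated by h (or less than h), multiply their values, and for each of these points calculate its mean, which in this case is defined as m(x_i). However, according to the definition of m(x_{i}), to compute m(x1) I need the average of the values located within distance h of x1. This looks like a very intensive computation.

First of all, am I understanding this correctly? If so, what is a good way to compute this, assuming a two-dimensional space? I tried to code this in Python (using numpy and pandas), but it takes a couple of seconds and I am not even sure it is correct, so I will refrain from posting that code here. Here is another attempt, a very naive implementation: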

import numpy as np
from scipy.spatial.distance import pdist, squareform
distances = squareform(pdist(np.array(coordinates))) # coordinates is a nx2 array
z = np.array(z) # z are the values
cutoff = np.max(distances)/3.0 # somewhat arbitrary cutoff
width = cutoff/15.0
widths = np.arange(0, cutoff + width, width)
Z = []
Cov = []

for w in np.arange(len(widths)-1): # for each width
    # for each pairwise distance
    for i in np.arange(distances.shape[0]): 
        for j in np.arange(distances.shape[1]): 
            if distances[i, j] <= widths[w+1] and distances[i, j] > widths[w]:
                m1 = []
                m2 = []
                # when a distance is within a given width, calculate the means of
                # the points involved
                for x in np.arange(distances.shape[1]):
                    if distances[i,x] <= widths[w+1] and distances[i, x] > widths[w]:
                        m1.append(z[x])

                for y in np.arange(distances.shape[1]):
                    if distances[j,y] <= widths[w+1] and distances[j, y] > widths[w]:
                        m2.append(z[y])

                mean_m1 = np.array(m1).mean() 
                mean_m2 = np.array(m2).mean()
                Z.append(z[i]*z[j] - mean_m1*mean_m2)
    Z_mean = np.array(Z).mean() # calculate covariogram for width w
    Cov.append(Z_mean) # collect covariances for all widths

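However, I am now certain that there is an error in my code. I know this because I also used the variogram to calculate the covariogram (covariogram(h) = covariogram(0) - variogram(h)), and I get a different plot:

[image]

And it is supposed to look like this:

[image]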

Finally, if you know of a Python/R/MATLAB library that calculates empirical covariograms, please let me know. At the very least, that way I can verify what I did.
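The cross-check mentioned above, covariogram(h) = covariogram(0) - variogram(h), can be sketched as follows. This is only an illustration, not the asker's code: the function name `binned_variogram`, the synthetic data, and the bin edges are made up, and C(0) is assumed to be approximated by the sample variance, which is only reasonable if the field is roughly second-order stationary.

```python
import numpy as np
from scipy.spatial.distance import pdist

def binned_variogram(coords, z, edges):
    """Empirical semivariogram: half the mean squared difference per lag bin."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    d = pdist(coords)                                    # pairwise distances, pairs i < j
    dz2 = pdist(z.reshape(-1, 1), metric="sqeuclidean")  # (z_i - z_j)**2, same pair order
    which = np.digitize(d, edges) - 1                    # lag-bin index of every pair
    gamma = np.full(len(edges) - 1, np.nan)              # NaN marks empty bins
    for b in range(len(edges) - 1):
        sel = which == b
        if sel.any():
            gamma[b] = 0.5 * dz2[sel].mean()
    return gamma

# Hypothetical data, only to show the call pattern.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))
z = np.sin(coords[:, 0] / 10.0) + 0.1 * rng.standard_normal(200)
edges = np.linspace(0, 30, 16)

gamma = binned_variogram(coords, z, edges)
cov_from_variogram = z.var() - gamma  # assumes C(0) ~ sample variance
```

Comparing `cov_from_variogram` against a directly computed covariogram on the same bins is one way to spot a bug in either implementation.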

score 5 · Accepted Answer

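One could use scipy.cov (historically a re-export of numpy.cov), but if one does the calculation directly (which is very easy), there are more ways to speed this up.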
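To make "the calculation directly" concrete: for two equally long samples, the covariance with 1/N normalization is mean(a*b) - mean(a)*mean(b). The short check below uses made-up numbers and `numpy.cov` with `bias=True` (so that it also divides by N) to show that both routes agree.

```python
import numpy as np

# Made-up sample values, purely for illustration.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 1.0, 4.0, 3.0])

# Direct formula: mean of products minus product of means.
direct = np.mean(a * b) - a.mean() * b.mean()

# numpy.cov returns a 2x2 matrix; the off-diagonal entry is cov(a, b).
via_cov = np.cov(a, b, bias=True)[0, 1]

print(direct, via_cov)
```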

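First, make some fake data that has some spatial correlations. I do this by first building a spatial map and then generating random data points from it, where the points are positioned according to the underlying map and also take on its values.

Edit 1:
I changed the data point generator so that the positions are purely random, but the z-values are proportional to the spatial map. I also changed the map so that the left and right sides are shifted relative to each other, to create a negative correlation at large h.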

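from numpy import *  # note: shadows the built-ins sum and abs with numpy versions
import math
import random  # stdlib random, imported after the star import on purpose
import matplotlib.pyplot as plt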

S = 1000
N = 900
# first, make some fake data, with correlations on two spatial scales
#     density map
x = linspace(0, 2*pi, S)
sx = sin(3*x)*sin(10*x)
density = .8* abs(outer(sx, sx))
density[:,:S//2] += .2
#     make a point cloud motivated by this density
random.seed(10)  # so this can be repeated
points = []
while len(points)<N:
    v, ix, iy = random.random(), random.randint(0,S-1), random.randint(0,S-1)
    if True: #v<density[ix,iy]:
        points.append([ix, iy, density[ix,iy]])
locations = array(points).transpose()
print(locations.shape)
plt.imshow(density, alpha=.3, origin='lower')
plt.plot(locations[1,:], locations[0,:], '.k')
plt.xlim((0,S))
plt.ylim((0,S))
plt.show()
#     build these into the main data: all pairs into distances and z0 z1 values
L = locations
m = array([[math.sqrt((L[0,i]-L[0,j])**2+(L[1,i]-L[1,j])**2), L[2,i], L[2,j]] 
                         for i in range(N) for j in range(N) if i>j])

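This gives:

[image: density map with the sampled point locations]

The above is just the simulated data, and I made no attempt to optimize its production. I assume this is where the OP starts, with the task below, since in a real situation the data already exists.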
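If building the pair table ever becomes the bottleneck, the double list comprehension can be replaced by index arrays. The sketch below is an alternative, not the answer's code: `pair_table` is a hypothetical helper name, and it assumes coordinates come as an (n, 2) array with the values in a separate length-n array.

```python
import numpy as np

def pair_table(xy, z):
    """Return rows [distance, z_i, z_j] for every pair with i > j."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    # Row and column indices of the strict lower triangle, i.e. all i > j.
    i, j = np.tril_indices(len(z), k=-1)
    d = np.hypot(xy[i, 0] - xy[j, 0], xy[i, 1] - xy[j, 1])
    return np.column_stack([d, z[i], z[j]])

# Tiny made-up example.
xy = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
z = np.array([1.0, 2.0, 3.0])
m_demo = pair_table(xy, z)
print(m_demo)
```

With the 3 x N layout used for `L` above, the equivalent call would be `pair_table(L[:2].T, L[2])`.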

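Now calculate the "covariogram" (which is much easier than generating the fake data, by the way). The idea here is to sort all the pairs and their associated values by h, and then index into these using ihvals. That is, summing up to index ihval is the sum over N(h) in the equation, since this includes all pairs whose h lies below the desired value.

Edit 2:
As suggested in the comments below, N(h) is now only the pairs that lie between h-dh and h, rather than all pairs between 0 and h (where dh is the spacing of the h-values in ihvals, i.e., S/1000 was used below).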
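Written out, the binned quantity described above would look roughly like this. The notation is an assumption chosen to match the variable names in the code (`mnh`, `mph`, `szz`), not necessarily the exact formula from the question's image; $d_{ij}$ is the distance between points $i$ and $j$.

```latex
N(h) = \{(i,j) : i > j,\; h - dh \le d_{ij} < h\}

m_{-h} = \frac{1}{|N(h)|} \sum_{(i,j) \in N(h)} z_i ,
\qquad
m_{+h} = \frac{1}{|N(h)|} \sum_{(i,j) \in N(h)} z_j

\hat{C}(h) = \frac{1}{|N(h)|} \sum_{(i,j) \in N(h)} z_i z_j \;-\; m_{-h}\, m_{+h}
```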

# now do the real calculations for the covariogram
#    sort by h and give clear names
i = argsort(m[:,0])  # h sorting
h = m[i,0]
zh = m[i,1]
zsh = m[i,2]
zz = zh*zsh

hvals = linspace(0,S,1000)  # the values of h to use (S should be in the units of distance, here I just used ints)
ihvals = searchsorted(h, hvals)
result = []
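for k, ihval in enumerate(ihvals[1:]):
    # bin k covers hvals[k] <= h < hvals[k+1]; ihvals[k] is its first index
    # (ihvals[k-1] would wrap around at k=0 and make every bin two steps wide)
    start, stop = ihvals[k], ihval
    n = stop-start  # number of pairs in this bin
    if n>0:
        mnh = sum(zh[start:stop])/n
        mph = sum(zsh[start:stop])/n
        szz = sum(zz[start:stop])/n
        C = szz-mnh*mph
        # use the largest lag inside the bin; h[ihval] can run past the end of h
        result.append([h[stop-1], C])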
result = array(result)
plt.plot(result[:,0], result[:,1])
plt.grid()
plt.show()

[image: plot of the resulting covariogram against h]

This looks reasonable to me, as one can see bumps and troughs at the expected h values, but I haven't done a careful check.

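The main speedup here over scipy.cov is that one can precalculate all of the products, zz. Otherwise, one would feed zh and zsh into cov for every new h, and all the products would be recalculated. This calculation could be sped up even more by doing partial sums, i.e., from ihvals[n-1] to ihvals[n] at each step n, but I doubt that will be necessary.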
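The partial-sum idea can be sketched with prefix sums, which removes the Python loop over bins altogether. This is one possible reading of that suggestion rather than the answer's own code: `binned_covariogram` and its arguments are hypothetical names, bins are half-open intervals [edges[k], edges[k+1]), and the bin centers are used as the reported lag.

```python
import numpy as np

def binned_covariogram(h, z0, z1, edges):
    """Per-bin C = mean(z0*z1) - mean(z0)*mean(z1), computed from prefix sums."""
    h, z0, z1, edges = (np.asarray(a, dtype=float) for a in (h, z0, z1, edges))
    order = np.argsort(h)
    h, z0, z1 = h[order], z0[order], z1[order]
    cuts = np.searchsorted(h, edges)  # first sorted index at or beyond each edge

    def prefix(a):
        # prefix(a)[k] is the sum of the first k elements of a
        return np.concatenate([[0.0], np.cumsum(a)])

    c0, c1, c01 = prefix(z0), prefix(z1), prefix(z0 * z1)
    n = np.diff(cuts)                 # number of pairs in each bin
    ok = n > 0                        # skip empty bins
    start, stop, n = cuts[:-1][ok], cuts[1:][ok], n[ok]
    m0 = (c0[stop] - c0[start]) / n
    m1 = (c1[stop] - c1[start]) / n
    mzz = (c01[stop] - c01[start]) / n
    centers = 0.5 * (edges[:-1] + edges[1:])[ok]
    return centers, mzz - m0 * m1

# Tiny made-up example: three pairs fall in [0, 3), one in [3, 6).
centers, cov = binned_covariogram(
    h=[5.0, 1.0, 2.0, 1.0], z0=[4.0, 1.0, 3.0, 2.0], z1=[1.0, 2.0, 1.0, 4.0],
    edges=[0.0, 3.0, 6.0])
print(centers, cov)
```

One caveat of this design: differences of long cumulative sums can lose some floating-point precision, so for very large pair counts the explicit per-bin sums may be the safer choice.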
