python - Select size-filtered elements in a large array (raster)

Question

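I could use some help with this:

In a large boolean numpy array (an imported raster, 2000x2000), I am trying to select only the elements (connected regions) larger than 800 units. (Total number of elements > 1000)

I tried a loop: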

import numpy as np
import scipy.ndimage

labeled_array, num_features = scipy.ndimage.label(my_np_array, structure=None, output=int)

print(num_features)

RasterYSize, RasterXSize = my_np_array.shape
big_zones = np.zeros((RasterYSize, RasterXSize), dtype=bool)

print("Iterating in progress")
# Improvement could be needed to reduce the number of loops
# Note: labels run from 1 to num_features, so this range skips the last label
for i in range(1, num_features):
    zone_array = (labeled_array == i)
    zone = np.sum(zone_array)
    if zone > 800:
        big_zones += zone_array

But I am sure there is a better way to do this.

score 1 · Accepted Answer

Here is a vectorized approach that bins the labels with np.bincount and replaces the ORing operation big_zones += zone_array with np.in1d -

import numpy as np
from scipy.ndimage import label

# Label with scipy
labeled_array, num_features = label(my_np_array, structure=None, output=int)

# Set the threshold
thresh = 800

# Get the binned counts with "np.bincount" and check against threshold
matches = np.bincount(labeled_array.ravel()) > thresh

# Get the IDs corresponding to matches and get rid of the starting "0" and
# "num_features", as you won't have those in "range(1, num_features)" either
match_feat_ID = np.nonzero(matches)[0]
valid_match_feat_ID = np.setdiff1d(match_feat_ID, [0, num_features])

# Finally, use "np.in1d" to do ORing operation corresponding to the iterative
# "big_zones += zone_array" operation on the boolean array "big_zones".
# Since "np.in1d" returns a flat 1D result, reshape back to 2D shape
out = np.in1d(labeled_array, valid_match_feat_ID).reshape(labeled_array.shape)
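
As an alternative sketch (not part of the original answer), the membership test can be replaced by a boolean lookup table indexed by label: np.bincount gives one count per label, comparing against the threshold gives one boolean per label, and indexing that table with the labelled array maps the decision back onto the raster. The small hand-built labeled_array and the name keep are assumptions for illustration. Unlike range(1, num_features), this version also considers the last label.

```python
import numpy as np

# Hand-built stand-in for the output of scipy.ndimage.label (assumption):
# 0 is background, labels 1..3 are connected regions.
labeled_array = np.array([
    [1, 1, 0, 2],
    [1, 0, 0, 2],
    [0, 3, 0, 2],
    [0, 0, 0, 2],
])
thresh = 2  # keep regions with more than 2 pixels

# One count per label value; index 0 holds the background count.
counts = np.bincount(labeled_array.ravel())

# Boolean lookup table: keep[i] is True when label i is large enough.
keep = counts > thresh
keep[0] = False  # never select the background

# Indexing the table with the labels applies it per pixel and keeps the 2D shape.
big_zones = keep[labeled_array]
print(big_zones.astype(int))
```

Here labels 1 (3 pixels) and 2 (4 pixels) are kept and label 3 (1 pixel) is dropped. The lookup table has one entry per label, so it stays small even for a large raster.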

Runtime test and output verification

Function definitions -

def original_app(labeled_array,num_features,thresh):
    big_zones = np.zeros(labeled_array.shape, dtype=bool)
    for i in range(1, num_features):
        zone_array = (labeled_array == i)
        zone = np.sum(zone_array)
        if zone > thresh:
            big_zones += zone_array
    return big_zones

def vectorized_app(labeled_array,num_features,thresh):
    matches = np.bincount(labeled_array.ravel())>thresh
    match_feat_ID = np.nonzero(matches)[0]
    valid_match_feat_ID = np.setdiff1d(match_feat_ID,[0,num_features])
    return np.in1d(labeled_array,valid_match_feat_ID).reshape(labeled_array.shape)
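
On newer NumPy releases, np.isin can stand in for np.in1d; it returns a result with the same shape as its first argument, so the reshape is not needed. The sketch below is a variant under that assumption, not the answer's original code. The name vectorized_app_isin and the small hand-built input are hypothetical, and the function keeps the answer's choice of excluding label num_features so that it still matches original_app.

```python
import numpy as np

def vectorized_app_isin(labeled_array, num_features, thresh):
    # Count pixels per label; index 0 is the background.
    matches = np.bincount(labeled_array.ravel()) > thresh
    match_feat_ID = np.nonzero(matches)[0]
    # Drop background (0) and the last label, mirroring range(1, num_features).
    valid_match_feat_ID = np.setdiff1d(match_feat_ID, [0, num_features])
    # np.isin preserves the input shape, so no reshape is required.
    return np.isin(labeled_array, valid_match_feat_ID)

# Small hand-built labelled array (assumption) with labels 1..3.
labeled_array = np.array([
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [2, 2, 0, 3],
    [2, 2, 0, 3],
])
out = vectorized_app_isin(labeled_array, num_features=3, thresh=2)
print(out.astype(int))
```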

Timings and output verification -

In [2]: # Inputs
   ...: my_np_array = np.random.rand(200,200)>0.5
   ...: labeled_array, num_features = label(my_np_array, structure = None, output = int)
   ...: thresh = 80
   ...: 

In [3]: out1 = original_app(labeled_array,num_features,thresh)

In [4]: out2 = vectorized_app(labeled_array,num_features,thresh)

In [5]: np.allclose(out1,out2)
Out[5]: True

In [6]: %timeit original_app(labeled_array,num_features,thresh)
1 loops, best of 3: 407 ms per loop

In [7]: %timeit vectorized_app(labeled_array,num_features,thresh)
100 loops, best of 3: 2.5 ms per loop
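
The gap is expected: the loop scans the whole array once per label, while np.bincount counts every label in a single pass. Outside IPython, a similar comparison can be scripted with the standard timeit module. The sketch below uses synthetic random labels instead of scipy.ndimage.label output (an assumption, so the regions are not spatially connected), and loop_app and bincount_app are hypothetical names; absolute timings will differ from the figures above.

```python
import timeit
import numpy as np

def loop_app(labeled_array, num_features, thresh):
    big_zones = np.zeros(labeled_array.shape, dtype=bool)
    for i in range(1, num_features):
        zone_array = (labeled_array == i)  # full scan of the array per label
        if zone_array.sum() > thresh:
            big_zones |= zone_array
    return big_zones

def bincount_app(labeled_array, num_features, thresh):
    matches = np.bincount(labeled_array.ravel()) > thresh  # single pass
    ids = np.setdiff1d(np.nonzero(matches)[0], [0, num_features])
    return np.isin(labeled_array, ids)

# Synthetic labels in 0..num_features (assumption: stands in for label() output).
rng = np.random.default_rng(0)
num_features = 500
labeled_array = rng.integers(0, num_features + 1, size=(200, 200))
thresh = 80

# Check that both implementations select exactly the same pixels.
same = np.array_equal(loop_app(labeled_array, num_features, thresh),
                      bincount_app(labeled_array, num_features, thresh))
print("outputs equal:", same)

# Report the best of three single runs for each implementation.
for func in (loop_app, bincount_app):
    best = min(timeit.repeat(lambda: func(labeled_array, num_features, thresh),
                             number=1, repeat=3))
    print(f"{func.__name__}: {best * 1e3:.2f} ms")
```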
