python - numpy.memmap from numpy operations

Question

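I'm working with fairly large arrays created from large image files. I was running into problems with excessive memory use, so I decided to try numpy.memmap arrays instead of the standard numpy.array. I was able to create a memmap and load the data into it from my image file in chunks, but I'm not sure how to load the result of an operation into a memmap.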

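For example, my image files are read into numpy as binary integer arrays. I have written a function that buffers (expands) any region of True cells by a specified number of cells. This function converts the input array to Boolean using array.astype(bool). How would I make the new Boolean array created by array.astype(bool) a numpy.memmap array?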

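Also, if a True cell is closer to the edge of the input array than the specified buffer distance, the function adds rows and/or columns to the edge of the array so that a complete buffer fits around the existing True cell. This changes the shape of the array. Is it possible to change the shape of a numpy.memmap?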

Here is my code:

def getArray(dataset):
    '''Dataset is an instance of the GDALDataset class from the
    GDAL library for working with geospatial datasets

    '''
    chunks = readRaster.GetArrayParams(dataset, chunkSize=5000)
    datPath = re.sub(r'\.\w+$', '_temp.dat', dataset.GetDescription())
    pathExists = path.exists(datPath)
    arr = np.memmap(datPath, dtype=int, mode='r+',
                    shape=(dataset.RasterYSize, dataset.RasterXSize))
    if not pathExists:
        for chunk in chunks:
            xOff, yOff, xWidth, yWidth = chunk
            chunkArr = readRaster.GetArray(dataset, *chunk)
            arr[yOff:yOff + yWidth, xOff:xOff + xWidth] = chunkArr
    return arr

def Buffer(arr, dist, ring=False, full=True):
    '''Applies a buffer to any non-zero raster cells'''
    arr = arr.astype(bool)
    nzY, nzX = np.nonzero(arr)
    minY = np.amin(nzY)
    maxY = np.amax(nzY)
    minX = np.amin(nzX)
    maxX = np.amax(nzX)
    if minY - dist < 0:
        arr = np.vstack((np.zeros((abs(minY - dist), arr.shape[1]), bool),
                         arr))
    if maxY + dist >= arr.shape[0]:
        arr = np.vstack((arr,
                         np.zeros(((maxY + dist - arr.shape[0] + 1), arr.shape[1]), bool)))
    if minX - dist < 0:
        arr = np.hstack((np.zeros((arr.shape[0], abs(minX - dist)), bool),
                         arr))
    if maxX + dist >= arr.shape[1]:
        arr = np.hstack((arr,
                         np.zeros((arr.shape[0], (maxX + dist - arr.shape[1] + 1)), bool)))
    if dist >= 0: buffOp = binary_dilation
    else: buffOp = binary_erosion
    bufDist = abs(dist) * 2 + 1
    k = np.ones((bufDist, bufDist))
    bufArr = buffOp(arr, k)
    return bufArr.astype(int)

score 1 · Accepted Answer

Let me try to answer the first part of your question: loading a result into a memmap datastore.

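Note: I am going to assume that a memmap file already exists on disk. It will be the input file, called MemmapInput, created as follows (the output file MemmapOutput is created the same way):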

fpInput = np.memmap('MemmapInput', dtype='bool', mode='w+', shape=(3,4))
del fpInput
fpOutput = np.memmap('MemmapOutput', dtype='bool', mode='w+', shape=(3,4))
del fpOutput

In your case the output file might not exist yet, but per the docs:

'r+' Open existing file for reading and writing.

'w+' Create or overwrite existing file for reading and writing.

So the first time you create a memmap file it must be with 'w+'; after that, use 'r+' to modify or overwrite the file. A read-only view can be obtained with 'r'. See http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html for more information.
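As a minimal sketch of that rule, the mode can be chosen from whether the file already exists. The helper name open_memmap and the temporary demo path are assumptions made for illustration; they are not part of NumPy or of the question's code.

```python
import os
import tempfile

import numpy as np


def open_memmap(path, dtype, shape):
    """Open `path` as a memmap, creating it first if it does not exist.

    `open_memmap` is a hypothetical helper, not part of NumPy.
    """
    # 'w+' creates (or overwrites) the file; 'r+' requires an existing file.
    mode = 'r+' if os.path.exists(path) else 'w+'
    return np.memmap(path, dtype=dtype, mode=mode, shape=shape)


# Usage: the first call creates the file, the second reopens it.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.dat')
first = open_memmap(demo_path, bool, (3, 4))
first[0, 0] = True
first.flush()  # push the change to disk before dropping the reference
del first
second = open_memmap(demo_path, bool, (3, 4))
```

In the question's getArray, the same check could replace the hard-coded mode='r+', which fails when the _temp.dat file has not been created yet.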

Now we will read in this file and perform some operations on it. The main point is to load a result into a memmap file, so the output memmap must first be created and attached to a file on disk.

fpInput = np.memmap('MemmapInput', dtype='bool', mode='r', shape=(3,4))
fpOutput = np.memmap('MemmapOutput', dtype='bool', mode='r+', shape=(3,4))

Do whatever you like with the fpOutput memmap, for example:

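i, j = np.nonzero(fpInput)
rows, cols = fpInput.shape
# Pair row and column indices; nested loops would visit every (i, j) combination.
for indexI, indexJ in zip(i, j):
    # Mark the four neighbours, skipping any that fall outside the array.
    if indexI > 0:
        fpOutput[indexI - 1, indexJ] = True
    if indexJ > 0:
        fpOutput[indexI, indexJ - 1] = True
    if indexI < rows - 1:
        fpOutput[indexI + 1, indexJ] = True
    if indexJ < cols - 1:
        fpOutput[indexI, indexJ + 1] = True
fpOutput.flush()  # write pending changes to disk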
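
Applied to the question's astype(bool) step, the same idea might look like the sketch below: create the Boolean output memmap first, then fill it block by block so that only one block is converted in memory at a time. Because the shape of a memmap is fixed when it is created, one workaround for the padding case is to allocate the output with the extra rows and columns up front and write the data at an offset. The helper name to_bool_memmap and its pad and chunk_rows parameters are assumptions for illustration, not part of NumPy or of the original code.

```python
import os
import tempfile

import numpy as np


def to_bool_memmap(src, out_path, pad=0, chunk_rows=1000):
    """Write src.astype(bool) into a new memmap, padded by `pad` cells.

    Hypothetical helper: `pad` and `chunk_rows` are illustrative parameters.
    """
    rows, cols = src.shape
    out = np.memmap(out_path, dtype=bool, mode='w+',
                    shape=(rows + 2 * pad, cols + 2 * pad))
    # A file newly created with 'w+' is zero-filled, so the border is False.
    for start in range(0, rows, chunk_rows):
        stop = min(start + chunk_rows, rows)
        # Convert one block of rows and store it at the padded offset.
        out[pad + start:pad + stop, pad:pad + cols] = src[start:stop].astype(bool)
    out.flush()
    return out


# Usage with a small in-memory stand-in for the integer raster.
src = np.array([[0, 2, 0], [0, 0, 5]])
out_path = os.path.join(tempfile.mkdtemp(), 'bool.dat')
result = to_bool_memmap(src, out_path, pad=1, chunk_rows=1)
```

Here src can itself be a memmap such as the one returned by getArray, since slicing a memmap reads only the requested rows.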
