python - Converting a subset of a numpy structured array to a numpy array without copying

Question

Suppose I have the following numpy structured array:

In [250]: x
Out[250]: 
array([(22, 2, -1000000000, 2000), (22, 2, 400, 2000),
       (22, 2, 804846, 2000), (44, 2, 800, 4000), (55, 5, 900, 5000),
       (55, 5, 1000, 5000), (55, 5, 8900, 5000), (55, 5, 11400, 5000),
       (33, 3, 14500, 3000), (33, 3, 40550, 3000), (33, 3, 40990, 3000),
       (33, 3, 44400, 3000)], 
       dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])

I am trying to convert a subset of the above array to a regular numpy array. It is essential for my application that no copy is made (views only).

Fields are retrieved from the above structured array with the following function:

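import numpy

def fields_view(array, fields):
    # Build a dtype that keeps only the requested fields at their original
    # byte offsets, then reinterpret the same memory with it (no copy).
    return array.getfield(numpy.dtype(
        {name: array.dtype.fields[name] for name in fields}
    ))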

If I am interested in fields 'f2' and 'f3', I do the following:

In [251]: y=fields_view(x,['f2','f3'])
In [252]: y
Out[252]:
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
       (5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
       (3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)], 
       dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
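
To check that `fields_view` really hands back a view and not a copy, something like the following sketch could be used. It works on a shortened version of `x`, and the `np.shares_memory` check and the write-through test are additions for illustration, not part of the original session.

```python
import numpy as np

# Shortened version of the array from the question (assumption: 3 rows are enough).
x = np.array([(22, 2, -1000000000, 2000), (22, 2, 400, 2000), (44, 2, 800, 4000)],
             dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])

def fields_view(array, fields):
    # Same function as in the question, written with the np alias.
    return array.getfield(np.dtype(
        {name: array.dtype.fields[name] for name in fields}
    ))

y = fields_view(x, ['f2', 'f3'])

# A view shares its bytes with the parent array.
print(np.shares_memory(x, y))

# Writing through the view should therefore be visible in the original array.
y['f3'][1] = 123.0
print(x['f3'][1])
```

If both arrays share memory and the write shows up in `x`, no copy was made when selecting the fields.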

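There are ways to get an ndarray directly from the 'f2' and 'f3' fields of the original structured array. However, in my application I have to build this intermediate structured array, because the data subset is an attribute of a class.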

I cannot convert the intermediate structured array to a regular numpy array without making a copy:

In [253]: y.view(('<f4', len(y.dtype.names)))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-f8fc3a40fd1b> in <module>()
----> 1 y.view(('<f4', len(y.dtype.names)))

ValueError: new type not compatible with array.

The following function could also be used to convert a record array to an ndarray:

def recarr_to_ndarr(x,typ):

    fields = x.dtype.names
    shape = x.shape + (len(fields),)
    offsets = [x.dtype.fields[name][1] for name in fields]
    assert not any(np.diff(offsets, n=2))
    strides = x.strides + (offsets[1] - offsets[0],)
    y = np.ndarray(shape=shape, dtype=typ, buffer=x,
               offset=offsets[0], strides=strides)
    return y

However, I get the following error:

In [254]: recarr_to_ndarr(y,'<f4')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-65-2ebda2a39e9f> in <module>()
----> 1 recarr_to_ndarr(y,'<f4')

<ipython-input-62-8a9eea8e7512> in recarr_to_ndarr(x, typ)
      8     strides = x.strides + (offsets[1] - offsets[0],)
      9     y = np.ndarray(shape=shape, dtype=typ, buffer=x,
---> 10                offset=offsets[0], strides=strides)
     11     return y
     12 

TypeError: expected a single-segment buffer object

If I make a copy first, the function works fine:

In [255]: recarr_to_ndarr(np.array(y),'<f4')
Out[255]: 
array([[  2.00000000e+00,  -1.00000000e+09],
       [  2.00000000e+00,   4.00000000e+02],
       [  2.00000000e+00,   8.04846000e+05],
       [  2.00000000e+00,   8.00000000e+02],
       [  5.00000000e+00,   9.00000000e+02],
       [  5.00000000e+00,   1.00000000e+03],
       [  5.00000000e+00,   8.90000000e+03],
       [  5.00000000e+00,   1.14000000e+04],
       [  3.00000000e+00,   1.45000000e+04],
       [  3.00000000e+00,   4.05500000e+04],
       [  3.00000000e+00,   4.09900000e+04],
       [  3.00000000e+00,   4.44000000e+04]], dtype=float32)
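
For comparison, here is a minimal sketch of one possible copy-free route. It relies on `numpy.lib.stride_tricks.as_strided` instead of the `buffer=` argument of `np.ndarray`, so it does not need a contiguous buffer. The helper name `fields_as_ndarray` is hypothetical, and the sketch assumes that all selected fields have the same dtype and are equally spaced in memory. `as_strided` does no bounds checking, which is why those conditions are validated first.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def fields_view(array, fields):
    # Same function as in the question, written with the np alias.
    return array.getfield(np.dtype(
        {name: array.dtype.fields[name] for name in fields}
    ))

def fields_as_ndarray(arr, names):
    """Hypothetical helper: 2-D view over same-dtype, equally spaced fields."""
    dtypes = [arr.dtype.fields[n][0] for n in names]
    offsets = [arr.dtype.fields[n][1] for n in names]
    if any(dt != dtypes[0] for dt in dtypes):
        raise TypeError("all selected fields must share one dtype")
    if len(names) > 1:
        steps = np.diff(offsets)
        if steps[0] <= 0 or np.any(steps != steps[0]):
            raise ValueError("fields must be equally spaced, in increasing order")
        step = int(steps[0])
    else:
        step = dtypes[0].itemsize
    # Indexing a single field always returns a view that starts at that field.
    first = arr[names[0]]
    # Add a trailing axis that steps from one field to the next inside a record.
    return as_strided(first,
                      shape=first.shape + (len(names),),
                      strides=first.strides + (step,))

# Shortened version of the array from the question.
x = np.array([(22, 2, -1000000000, 2000), (22, 2, 400, 2000), (44, 2, 800, 4000)],
             dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])
y = fields_view(x, ['f2', 'f3'])   # intermediate structured view
z = fields_as_ndarray(y, ['f2', 'f3'])
print(z.shape, z.dtype, np.shares_memory(x, z))
```

The design choice here is to start from the first field's own view, so its data pointer and record stride are reused as they are, and only the within-record step has to be computed.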

There seems to be no difference between y and np.array(y):

In [66]: y
Out[66]: 
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
       (5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
       (3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)], 
      dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})

In [67]: np.array(y)
Out[67]: 
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
       (5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
       (3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)], 
      dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
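
The printed representations match, but the memory layout may not. The sketch below compares attributes that `repr` does not show; it uses a shortened version of `x`, and the choice of attributes to inspect is an assumption about where the difference lies.

```python
import numpy as np

def fields_view(array, fields):
    # Same function as in the question, written with the np alias.
    return array.getfield(np.dtype(
        {name: array.dtype.fields[name] for name in fields}
    ))

# Shortened version of the array from the question.
x = np.array([(22, 2, -1000000000, 2000), (22, 2, 400, 2000), (44, 2, 800, 4000)],
             dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])

y = fields_view(x, ['f2', 'f3'])   # view into x
c = np.array(y)                    # independent copy

for label, arr in (("view", y), ("copy", c)):
    # Report record size, distance between records, and contiguity.
    print(label, arr.dtype.itemsize, arr.strides, arr.flags['C_CONTIGUOUS'])
```

The view still steps through memory with the 16-byte stride of `x` while its records are only 12 bytes wide, so it is not one contiguous block. The copy is laid out contiguously, which would explain why `np.ndarray(buffer=...)` accepts it and rejects the view.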
