python - How do I change the dtype of certain columns of a numpy recarray?

Question

Say I have a recarray such as the following:

import numpy as np

# example data from @unutbu's answer
recs = [('Bill', '31', 260.0), ('Fred', 15, '145.0')]
r = np.rec.fromrecords(recs, formats = 'S30,i2,f4', names = 'name, age, weight')

print(r)
# [('Bill', 31, 260.0) ('Fred', 15, 145.0)]

Say I want to convert certain columns to floats. How would I do this? Do I need to convert to an ndarray and then back to a recarray?

score 16 · Accepted Answer

Here is an example that uses astype to perform the conversion:

import numpy as np
recs = [('Bill', '31', 260.0), ('Fred', 15, '145.0')]
r = np.rec.fromrecords(recs, formats = 'S30,i2,f4', names = 'name, age, weight')
print(r)
# [('Bill', 31, 260.0) ('Fred', 15, 145.0)]

Here, age has dtype <i2:

print(r.dtype)
# [('name', '|S30'), ('age', '<i2'), ('weight', '<f4')]

We can change it to <f4 using astype:

r = r.astype([('name', '|S30'), ('age', '<f4'), ('weight', '<f4')])
print(r)
# [('Bill', 31.0, 260.0) ('Fred', 15.0, 145.0)]

score 16 · Accepted Answer

There are basically two steps. My stumbling block was finding out how to modify an existing dtype. This is how I did it:

import numpy

# `data` is an existing structured array or recarray
# change dtype by making a whole new array
dt = data.dtype
dt = dt.descr  # this is now a modifiable list; a numpy.dtype itself can't be modified
# change the type of the first col:
dt[0] = (dt[0][0], 'float64')
dt = numpy.dtype(dt)
# data = numpy.array(data, dtype=dt)  # option 1
data = data.astype(dt)  # option 2
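
For reference, here is a self-contained sketch of the same two steps applied to the example data from the question. The names `descr` and `converted` are illustrative only, and the string inputs from the question are replaced with plain numbers to keep the example simple.

```python
import numpy as np

# Sample recarray modelled on the question's data (plain numeric inputs).
recs = [('Bill', 31, 260.0), ('Fred', 15, 145.0)]
data = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name,age,weight')

# Step 1: get a modifiable list of (name, type string) tuples and edit one entry.
descr = data.dtype.descr
descr[1] = (descr[1][0], 'float64')  # index 1 is the 'age' column

# Step 2: build a new dtype and cast; astype returns a new array.
converted = data.astype(np.dtype(descr))

print(converted.dtype)
print(data.dtype)  # the original array keeps its dtype
```

Because astype returns a copy, the result has to be assigned back (as the answer does with `data = data.astype(dt)`) if the original name should refer to the converted array.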

score 0 · Accepted Answer

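This is a minor refinement of the existing answers, plus an extension to situations where you want to make the change based on the dtype rather than the column name (e.g. changing all floats to integers).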

First, you can improve conciseness and readability by using a list comprehension:

col       = 'age'
new_dtype = 'float64'

r.astype( [ (col, new_dtype) if d[0] == col else d for d in r.dtype.descr ] )

# rec.array([(b'Bill', 31.0, 260.0), (b'Fred', 15.0, 145.0)], 
#           dtype=[('name', 'S30'), ('age', '<f8'), ('weight', '<f4')])

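Second, you can extend this syntax to handle cases where you want to change all floats to integers (or vice versa). For example, to change any 32-bit or 64-bit float into a 64-bit integer, you could do something like: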

old_dtype = ['<f4', '<f8']
new_dtype = 'int64'

r.astype( [ (d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr ] )

# rec.array([(b'Bill', 31, 260), (b'Fred', 15, 145)], 
#           dtype=[('name', 'S30'), ('age', '<i2'), ('weight', '<i8')])

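Note that astype has an optional casting argument that defaults to 'unsafe', so you may want to specify casting='safe' to avoid accidentally losing precision when casting floats to integers. With that setting, a conversion that could lose information raises an error instead of being performed silently: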

r.astype( [ (d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr ],
          casting='safe' )

For more on the other casting options, refer to the numpy documentation on astype.
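
As a rough illustration of the difference between the two casting modes, the sketch below uses a weight with a fractional part (260.5, a value chosen here for illustration and not part of the original data). The names `target`, `truncated`, and `refused` are hypothetical.

```python
import numpy as np

recs = [('Bill', 31, 260.5), ('Fred', 15, 145.0)]
r = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name,age,weight')

old_dtype = ['<f4', '<f8']
target = [(d[0], 'int64') if d[1] in old_dtype else d for d in r.dtype.descr]

# Default casting='unsafe': the float column is converted and the
# fractional part is dropped without any warning.
truncated = r.astype(target)
print(truncated['weight'])

# A per-field check shows why 'safe' objects to this conversion.
print(np.can_cast(r.dtype['weight'], np.int64, casting='safe'))

# casting='safe': the float -> int conversion is refused.
refused = False
try:
    r.astype(target, casting='safe')
except TypeError as exc:
    refused = True
    print('refused:', exc)
```

In other words, 'safe' does not make the float-to-integer conversion lossless; it makes the call fail so the truncation cannot happen unnoticed.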

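Also note that for the general case of changing floats to integers or vice versa, you would probably prefer to check the general number type with np.issubdtype rather than checking against multiple specific dtypes.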
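
A minimal sketch of that np.issubdtype variant, assuming the goal is to convert every floating-point column to int64 regardless of its width. The names `target` and `converted` are illustrative.

```python
import numpy as np

recs = [('Bill', 31, 260.0), ('Fred', 15, 145.0)]
r = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name,age,weight')

new_dtype = 'int64'

# Look up each field's dtype by name and test it against the abstract
# np.floating type instead of a list of specific type strings.
target = [
    (name, new_dtype) if np.issubdtype(r.dtype[name], np.floating)
    else (name, r.dtype[name])
    for name in r.dtype.names
]

converted = r.astype(target)
print(converted.dtype)
```

Swapping np.floating for np.integer (and choosing a float target type) would give the reverse conversion under the same assumptions.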
