python - Pandas: Unable to change column data type

Question

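I was following the advice here to change the data type of a column in a pandas DataFrame. However, it does not seem to work when I refer to the columns by index number instead of by column name. Is there a correct way to do this?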

In [49]: df.iloc[:, 4:].astype(int)
Out[49]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5074 entries, 0 to 5073
Data columns (total 3 columns):
5    5074  non-null values
6    5074  non-null values
7    5074  non-null values
dtypes: int64(3) 

In [50]: df.iloc[:, 4:] = df.iloc[:, 4:].astype(int)

In [51]: df
Out[51]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5074 entries, 0 to 5073
Data columns (total 7 columns):
1    5074  non-null values
2    5074  non-null values
3    5074  non-null values
4    5074  non-null values
5    5074  non-null values
6    5074  non-null values
7    5074  non-null values
dtypes: object(7) 

In [52]:

score 2 · Accepted Answer

Do it like this:

In [49]: df = DataFrame([['1','2','3','.4',5,6.,'foo']],columns=list('ABCDEFG'))

In [50]: df
Out[50]: 
   A  B  C   D  E  F    G
0  1  2  3  .4  5  6  foo

In [51]: df.dtypes
Out[51]: 
A     object
B     object
C     object
D     object
E      int64
F    float64
G     object
dtype: object

You have to assign the columns one at a time:

In [52]: for k, v in df.iloc[:,0:4].convert_objects(convert_numeric=True).iteritems():
   ....:     df[k] = v
   ....:     
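The convert_objects method and DataFrame.iteritems used above were deprecated and later removed from pandas, so this exact loop will not run on a current install. A minimal sketch of the same column-by-column assignment, assuming pd.to_numeric as the stand-in for convert_numeric=True:

```python
import pandas as pd

# Same sample frame as in the transcript above
df = pd.DataFrame([['1', '2', '3', '.4', 5, 6., 'foo']], columns=list('ABCDEFG'))

# Pick the first four columns by position, then assign back by label,
# one column at a time, so each column is replaced together with its dtype.
for col in df.columns[0:4]:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```

After the loop, A, B and C should report an integer dtype and D a float dtype, while E, F and G are left as they were.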

In [53]: df.dtypes
Out[53]: 
A      int64
B      int64
C      int64
D    float64
E      int64
F    float64
G     object
dtype: object

Usually the easiest thing is to just do this, since convert_objects generally does the right thing:

In [54]: df = DataFrame([['1','2','3','.4',5,6.,'foo']],columns=list('ABCDEFG'))

In [55]: df.convert_objects(convert_numeric=True).dtypes
Out[55]: 
A      int64
B      int64
C      int64
D    float64
E      int64
F    float64
G     object
dtype: object
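In current pandas there is no convert_objects. A rough whole-frame equivalent is to try pd.to_numeric on every column and keep whichever conversions succeed. The helper name below is hypothetical, and it leaves a column unchanged if any value in it fails to parse, so its handling of partially numeric columns may not match the old method exactly:

```python
import pandas as pd


def to_numeric_where_possible(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper approximating convert_objects(convert_numeric=True):
    convert every column pd.to_numeric can parse, leave the rest untouched."""
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # e.g. column G ('foo') cannot be parsed, so it keeps its dtype
            pass
    return out


df = pd.DataFrame([['1', '2', '3', '.4', 5, 6., 'foo']], columns=list('ABCDEFG'))
print(to_numeric_where_possible(df).dtypes)
```

Because the helper works on a copy, the original df is not modified; rebind it (df = to_numeric_where_possible(df)) if that is what you want.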

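Assigning via df.iloc[:,4:] with a Series on the right-hand side copies the data and changes its type as needed, so I think this should work in theory. However, I suspect you are hitting a very obscure bug that prevents an object dtype from being changed to a real dtype (meaning int/float). It should probably raise an error for now.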

The issue tracking this is: https://github.com/pydata/pandas/issues/4312
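One way to sidestep writing back through iloc is to turn the positional selection into column labels and cast by label. A minimal sketch, assuming a pandas version where DataFrame.astype accepts a {column: dtype} mapping; the column names and data are hypothetical:

```python
import pandas as pd

# Hypothetical frame: positions 0-1 are text, positions 2-3 hold digit strings
df = pd.DataFrame({
    "name": ["a", "b"],
    "tag": ["x", "y"],
    "n1": ["1", "2"],
    "n2": ["3", "4"],
})

# Map the positional slice (what df.iloc[:, 2:] would select) to column labels
cols = df.columns[2:]

# astype with a {label: dtype} mapping returns a new frame in which only those
# columns are cast; rebinding df avoids assigning back through iloc entirely.
df = df.astype({col: int for col in cols})

print(df.dtypes)
```

The same pattern works for the question's frame with df.columns[4:] in place of df.columns[2:].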
