python - NumPy or Pandas: Keeping array type as integer while having a NaN value

Question

Is there a preferred way to keep the data type of a numpy array fixed as int (or int64 or whatever), while still having an element inside listed as numpy.NaN?

In particular, I am converting an in-house data structure to a Pandas DataFrame. In our structure, we have integer-type columns that still have NaN's (but the dtype of the column is int). It seems to recast everything as a float if we make this a DataFrame, but we'd really like to be int.

Thoughts?

Things tried:

I tried using the from_records() function under pandas.DataFrame, with coerce_float=False and this did not help. I also tried using NumPy masked arrays, with NaN fill_value, which also did not work. All of these caused the column data type to become a float.

score 111 · Accepted Answer

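NaN can't be stored in an integer array. This is a known limitation of pandas at the moment. I have been waiting for progress on NA values in NumPy (similar to NA in R), but it seems it will be at least 6 months to a year before NumPy gets these features: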

http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na
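The limitation can be reproduced in a few lines. This is a minimal sketch, assuming a reasonably recent NumPy and pandas; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Mixing integers with NaN makes NumPy choose a float dtype.
mixed = np.array([1, 2, np.nan])
print(mixed.dtype)

# An existing integer array refuses NaN outright.
ints = np.array([1, 2, 3], dtype=np.int64)
try:
    ints[0] = np.nan
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)

# pandas follows the same rule: a missing value upcasts the column to float64.
col = pd.Series([1, 2, None])
print(col.dtype)
```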

(This feature has been added beginning with version 0.24 of pandas, but note that it requires the extension dtype Int64 (capitalized) rather than the default dtype int64 (lower case): https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support )
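A short sketch of the nullable integer dtype, assuming pandas 0.24 or later. The column name "col" is a placeholder, not something from the question.

```python
import pandas as pd

# "Int64" (capital I) selects the nullable extension dtype.
s = pd.Series([1, 2, None], dtype="Int64")
print(s.dtype)
print(s.isna().sum())

# A float column holding whole numbers and NaN can be converted as well.
df = pd.DataFrame({"col": [10.0, None, 30.0]})  # hypothetical column name
df["col"] = df["col"].astype("Int64")
print(df["col"].dtype)
```

The missing entries are shown as a pandas missing-value marker instead of forcing the whole column to float.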

score 8 · Answer

If performance is not the main issue, you can store strings instead.

df.col = df.col.dropna().apply(lambda x: str(int(x)))

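Then you can mix them with NaN as much as you want. If you really want to have integers, depending on your application, you can use -1, or 0, or 1234567890, or some other dedicated value to represent NaN.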
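The dedicated-value idea might look like the following sketch. SENTINEL and the column name are assumptions chosen for illustration, and the approach only works if the sentinel can never occur as real data.

```python
import numpy as np
import pandas as pd

SENTINEL = -1  # hypothetical marker; must never appear as a genuine value

df = pd.DataFrame({"col": [1.0, np.nan, 3.0]})

# Replace NaN with the sentinel, then cast to a plain integer dtype.
df["col"] = df["col"].fillna(SENTINEL).astype(np.int64)
print(df["col"].dtype)

# Recover the missing-value mask whenever it is needed.
missing = df["col"] == SENTINEL
print(missing.tolist())
```

Any arithmetic on the column has to skip the masked rows, since the sentinel would otherwise be treated as a real number.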

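You can also temporarily duplicate your columns: one as you have it, with floats; the other experimental, with ints or strings. Then insert asserts in every reasonable place, checking that the two are in sync. After enough testing, you can let go of the floats.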
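One possible shape for the duplicated-column check is sketched below. The shadow column name "col_int" and the helper check_in_sync are hypothetical, and the nullable Int64 dtype (pandas 0.24 or later) is used for the experimental copy as an assumption; strings would work the same way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1.0, np.nan, 3.0]})

# Experimental shadow column next to the original float column.
df["col_int"] = df["col"].astype("Int64")

def check_in_sync(frame):
    """Assert that the float column and its integer shadow agree."""
    # Missing values must sit in the same rows.
    assert (frame["col"].isna() == frame["col_int"].isna()).all()
    # Non-missing values must be equal.
    present = frame["col"].notna()
    shadow = frame.loc[present, "col_int"].astype("float64")
    assert (frame.loc[present, "col"] == shadow).all()

# Call this after every step that modifies either column.
check_in_sync(df)
```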
