python - How can I preserve column names when importing data with numpy?

Question

I'm using the numpy library in Python to import CSV file data into an ndarray, as follows:

import numpy as np

data = np.genfromtxt('mydata.csv',
                     delimiter=',', dtype=None, names=True)

The result has the following column names:

print(data.dtype.names)

('row_label',
 'MyDataColumn1_0',
 'MyDataColumn1_1')

The original column names are:

row_label, My-Data-Column-1.0, My-Data-Column-1.1

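NumPy appears to be forcing my column names into a C-style variable name format. However, many of my Python scripts access columns by name, so I need the column names to stay constant. To achieve this, either NumPy has to preserve my original column names, or I have to convert my column names to the format NumPy uses.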

Is there a way to preserve the original column names during import?
If not, is there an easy way to convert the column labels to the format NumPy uses?

score 5 · Accepted Answer

Once you set names=True, the first line of your data file is passed through this function:

validate_names = NameValidator(excludelist=excludelist,
                               deletechars=deletechars,
                               case_sensitive=case_sensitive,
                               replace_space=replace_space)

The options you can supply are:

excludelist : sequence, optional
    A list of names to exclude. This list is appended to the default list
    ['return','file','print']. Excluded names are appended an underscore:
    for example, `file` would become `file_`.
deletechars : str, optional
    A string combining invalid characters that must be deleted from the
    names.
defaultfmt : str, optional
    A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
    Whether to automatically strip white spaces from the variables.
replace_space : char, optional
    Character(s) used in replacement of white spaces in the variables
    names. By default, use a '_'.
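To see what this validation step does to a header, the validator can be called directly. The following is a minimal sketch, assuming that NameValidator can be imported from numpy.lib._iotools (the private module the source link below points to, so its location may change between NumPy versions). The header list and the variable names are illustrative only.

```python
from numpy.lib._iotools import NameValidator  # private module; import path is an assumption

# Header labels taken from the question.
header = ['row_label', 'My-Data-Column-1.0', 'My-Data-Column-1.1']

# Default validator: characters in the default deletechars set are removed.
default_names = NameValidator().validate(header)

# Custom deletechars without '-' and '.', so those characters survive.
keep_dash_and_dot = r"""~!@#$%^&*()=+~\|]}[{';: /?>,<"""
kept_names = NameValidator(deletechars=keep_dash_and_dot).validate(header)

print(default_names)
print(kept_names)
```

Running the same call on your own header shows the names your installed NumPy version will produce, which is one way to convert labels to NumPy's format ahead of time.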

You could perhaps try supplying your own deletechars string that is an empty string. But you would be better off modifying and passing this:

defaultdeletechars = set("""~!@#$%^&*()-=+~\|]}[{';: /?.>,<""")

Take the period and the minus sign out of that set, and pass it as:

np.genfromtxt(..., names=True, deletechars="""~!@#$%^&*()=+~\|]}[{';: /?>,<""")

Here is the source: https://github.com/numpy/numpy/blob/master/numpy/lib/_iotools.py#l245
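As a self-contained check of this approach, here is a minimal sketch that reads an in-memory CSV instead of mydata.csv. The sample rows, the io.StringIO stand-in, and the encoding='utf-8' argument are assumptions added for illustration; the deletechars string is written as a raw string so that the backslash is taken literally.

```python
import io

import numpy as np

# Hypothetical stand-in for 'mydata.csv', using the header from the question.
csv_text = (
    "row_label, My-Data-Column-1.0, My-Data-Column-1.1\n"
    "a,1.5,2.5\n"
    "b,3.5,4.5\n"
)

# Default delete set with '-' and '.' taken out, so both survive in the names.
keep_dash_and_dot = r"""~!@#$%^&*()=+~\|]}[{';: /?>,<"""

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=',',
    dtype=None,            # infer a type per column
    names=True,            # read field names from the first line
    deletechars=keep_dash_and_dot,
    encoding='utf-8',      # assumption: decode text columns as str
)

print(data.dtype.names)
print(data['My-Data-Column-1.0'])
```

Columns can then be indexed by their original labels, for example data['My-Data-Column-1.1'].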
