python-3.x - How to encode a dataset that has multiple data types?

Asked 2020-10-02T06:12:50.397

Viewed 176 times

I have a dataset like the following:

import pandas as pd

e = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'W', 'F', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    'col4': ['a', 'B', 'c', 'D', 'e', 'F']
})

I encoded the data using sklearn.preprocessing.LabelEncoder, with the following lines of code:

# Import the preprocessing module that provides LabelEncoder.
from sklearn import preprocessing

x = list(e.columns)

# A LabelEncoder maps each distinct value to an integer code.
label_encoder = preprocessing.LabelEncoder()
for i in x:
    # Encode the values in column i (the loop visits every column, including the int ones).
    e[i] = label_encoder.fit_transform(e[i])
print(e)

However, this also encodes the numeric data points of type int, which is not required.

Encoded dataset:

   col1  col2  col3  col4
0     0     1     0     3
1     0     0     1     0
2     1     5     5     4
3     4     4     4     1
4     3     3     2     5
5     2     2     3     2

How can I fix this?
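One possible direction, sketched here under assumptions rather than taken from an accepted answer: pick out only the non-numeric columns with `select_dtypes` and run the encoder on those, so that `col2` and `col3` are left untouched. The names `text_columns` and `encoders` are illustrative and do not appear in the original code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

e = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'W', 'F', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
    'col4': ['a', 'B', 'c', 'D', 'e', 'F']
})

# Keep only the columns that are not numeric (here: col1 and col4).
text_columns = e.select_dtypes(exclude='number').columns

# Assumption: one encoder per column, stored so the mapping can be reversed later.
encoders = {}
for column in text_columns:
    encoder = LabelEncoder()
    e[column] = encoder.fit_transform(e[column])
    encoders[column] = encoder

print(e)
```

Keeping a separate encoder for each column means that, for example, `encoders['col1'].inverse_transform(e['col1'])` should recover the original strings. The scikit-learn documentation describes `LabelEncoder` as intended for target labels, so `OrdinalEncoder` may be a better fit when the columns are input features.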

2 answers
