python - Python scikit-learn multiclass multilabel performance metrics?

Question

I ran a random forest classifier on a multiclass multilabel output variable and got the output below.

My y_test values


        Degree  Nature
762721       1       7
548912       0       6
727126       1      12
14880        1      12
189505       1      12
657486       1      12
461004       1       0
31548        0       6
296674       1       7
121330       0      17


Predicted output:

[[  1.   7.]
 [  0.   6.]
 [  1.  12.]
 [  1.  12.]
 [  1.  12.]
 [  1.  12.]
 [  1.   0.]
 [  0.   6.]
 [  1.   7.]
 [  0.  17.]]

Now I want to check the classifier's performance. I found that Hamming loss or jaccard_similarity_score are suitable metrics for the multiclass multilabel case. I tried to compute them, but got a ValueError.

Error:
ValueError: multiclass-multioutput is not supported

The lines I tried:

print(hamming_loss(y_test, RF_predicted))
print(jaccard_similarity_score(y_test, RF_predicted))
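The error message names the target type that scikit-learn inferred from the inputs. Assuming scikit-learn is installed, its helper `sklearn.utils.multiclass.type_of_target` can be used to see this directly: the two-column y_test is classified as `multiclass-multioutput` (several outputs, at least one with more than two classes), while each column on its own is an ordinary `binary` or `multiclass` target. This is a sketch for diagnosis only; the values are the y_test rows from the question with the index dropped.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# y_test from the question (columns: Degree, Nature), DataFrame index dropped
y_test = np.array([[1, 7], [0, 6], [1, 12], [1, 12], [1, 12],
                   [1, 12], [1, 0], [0, 6], [1, 7], [0, 17]])

print(type_of_target(y_test))        # multiclass-multioutput (rejected by the metric)
print(type_of_target(y_test[:, 0]))  # binary (Degree alone)
print(type_of_target(y_test[:, 1]))  # multiclass (Nature alone)
```

Since each column by itself is a supported target type, one workaround is to evaluate the outputs column by column.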

Thanks,

score 7 · Accepted Answer

Hamming loss is not supported for multiclass/multilabel (multiclass-multioutput) targets, but you can compute it yourself like this:

import numpy as np
y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])
# Hamming loss: fraction of individual labels that are predicted wrongly
np.sum(np.not_equal(y_true, y_pred)) / float(y_true.size)

0.75
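The one-liner above can be wrapped into small reusable functions. The sketch below is an illustration, not a scikit-learn API: `hamming_loss_multioutput`, `per_output_error_rate` and `exact_match_ratio` are hypothetical names. It also adds two related views that are often useful next to the overall Hamming loss: the error rate of each output column, and the stricter exact match ratio, where a sample counts as correct only if every output is right.

```python
import numpy as np


def hamming_loss_multioutput(y_true, y_pred):
    """Fraction of (sample, output) entries where the prediction is wrong."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true != y_pred))


def per_output_error_rate(y_true, y_pred):
    """Error rate of each output column, computed separately."""
    return np.mean(np.asarray(y_true) != np.asarray(y_pred), axis=0)


def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose outputs are all predicted correctly."""
    matches = np.all(np.asarray(y_true) == np.asarray(y_pred), axis=1)
    return float(np.mean(matches))


y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])
print(hamming_loss_multioutput(y_true, y_pred))  # 0.75
print(per_output_error_rate(y_true, y_pred))     # column 0: 1.0, column 1: 0.5
print(exact_match_ratio(y_true, y_pred))         # 0.0
```

Because the inputs go through `np.asarray` first, y_test can be the DataFrame from the question and RF_predicted the float array from `predict`; integer and float labels such as 7 and 7.0 compare equal.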

You can also get a confusion_matrix for each of the two labels, like this:

from sklearn.metrics import confusion_matrix, precision_score
np.random.seed(42)

# column 0: binary labels in {0, 1}; column 1: labels in {2, 3, 4}
y_true = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T

[[0 4]
 [1 4]
 [0 4]
 [0 4]
 [0 2]
 [1 4]
 [0 3]
 [0 2]
 [0 3]
 [1 3]]

# random "predictions" with the same layout as y_true
y_pred = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T

[[1 2]
 [1 2]
 [1 4]
 [1 4]
 [0 4]
 [0 3]
 [1 4]
 [1 3]
 [1 3]
 [0 4]]

confusion_matrix(y_true[:, 0], y_pred[:, 0])

[[1 6]
 [2 1]]

confusion_matrix(y_true[:, 1], y_pred[:, 1])

[[0 1 1]
 [0 1 2]
 [2 1 2]]
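The per-label approach extends to any number of output columns with a loop. As a sketch that runs without scikit-learn, the code below uses a hypothetical numpy helper `confusion_matrix_np` that follows the same convention as `confusion_matrix` (rows are true labels, columns are predicted labels, labels sorted), but it is not the library's implementation. The arrays are the y_true and y_pred values printed above, hard-coded so the result does not depend on the random seed; the two matrices should match the ones shown.

```python
import numpy as np


def confusion_matrix_np(t, p):
    """Hypothetical helper: rows are true labels, columns are predicted labels."""
    labels = np.union1d(t, p)            # sorted union of observed labels
    ti = np.searchsorted(labels, t)      # row index for each true label
    pi = np.searchsorted(labels, p)      # column index for each prediction
    cm = np.zeros((labels.size, labels.size), dtype=int)
    np.add.at(cm, (ti, pi), 1)           # count each (true, predicted) pair
    return labels, cm


# The y_true / y_pred arrays printed above
y_true = np.array([[0, 4], [1, 4], [0, 4], [0, 4], [0, 2],
                   [1, 4], [0, 3], [0, 2], [0, 3], [1, 3]])
y_pred = np.array([[1, 2], [1, 2], [1, 4], [1, 4], [0, 4],
                   [0, 3], [1, 4], [1, 3], [1, 3], [0, 4]])

# One confusion matrix per output column
matrices = [confusion_matrix_np(y_true[:, j], y_pred[:, j])
            for j in range(y_true.shape[1])]
for j, (labels, cm) in enumerate(matrices):
    print(f"output {j}, labels {labels.tolist()}")
    print(cm)
```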

You can also compute precision_score (or, in the same way, recall_score) per label:

precision_score(y_true[:, 0], y_pred[:, 0])

0.142857142857
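To get precision and recall for every class of every output, a minimal numpy sketch can count true positives per label directly. `per_class_precision_recall` is a hypothetical helper, and returning 0.0 when a label is never predicted (or never true) is a choice made for this sketch. For the binary first column, the entry for label 1 should equal the precision_score value above, because precision_score scores label 1 as the positive class by default for binary targets; the macro average is simply the unweighted mean over labels.

```python
import numpy as np


def per_class_precision_recall(t, p):
    """Hypothetical helper: precision and recall for each label of one output."""
    t = np.asarray(t)
    p = np.asarray(p)
    labels = np.union1d(t, p)
    precision, recall = [], []
    for label in labels:
        tp = np.sum((t == label) & (p == label))  # correctly predicted as `label`
        n_pred = np.sum(p == label)               # predicted as `label`
        n_true = np.sum(t == label)               # actually `label`
        precision.append(tp / n_pred if n_pred else 0.0)
        recall.append(tp / n_true if n_true else 0.0)
    return labels, np.array(precision), np.array(recall)


# The y_true / y_pred arrays printed above
y_true = np.array([[0, 4], [1, 4], [0, 4], [0, 4], [0, 2],
                   [1, 4], [0, 3], [0, 2], [0, 3], [1, 3]])
y_pred = np.array([[1, 2], [1, 2], [1, 4], [1, 4], [0, 4],
                   [0, 3], [1, 4], [1, 3], [1, 3], [0, 4]])

for j in range(y_true.shape[1]):
    labels, prec, rec = per_class_precision_recall(y_true[:, j], y_pred[:, j])
    print(f"output {j}: labels={labels.tolist()}")
    print("  precision:", np.round(prec, 3), "macro:", round(float(prec.mean()), 3))
    print("  recall:   ", np.round(rec, 3), "macro:", round(float(rec.mean()), 3))
```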
