I have two files, predictions.csv and target.csv.
Format of predictions.csv:
SampleID,Target
t1,-1.0454370703147253e-05
t2,-0.48161680725663214
t3,8.1420547483708091e-06
.
.
.
t4950,-6.4382307796971309e-05
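Here is a minimal sketch of how `numpy.genfromtxt` reads this layout. It uses an in-memory copy of the header and the three rows shown above; the `io.StringIO` stand-in is an assumption, since the real file is on disk. With only `delimiter=','`, every field is parsed as a float, so the header and the `t1`, `t2`, ... IDs become `nan`. Slicing off the first row and column still leaves a 2-D `(n, 1)` array. Selecting the score column with `skip_header=1` and `usecols=1` should give a flat 1-D array instead.

```python
import io
import numpy as np

# Hypothetical stand-in for predictions.csv: the header and the first
# three rows from the question (the real file has 4950 rows).
text = (
    "SampleID,Target\n"
    "t1,-1.0454370703147253e-05\n"
    "t2,-0.48161680725663214\n"
    "t3,8.1420547483708091e-06\n"
)

# Default parse: every field is converted to float, so the header row and
# the "t1", "t2", ... IDs become nan, and the score slice stays 2-D.
raw = np.genfromtxt(io.StringIO(text), delimiter=",")
sliced = raw[1:, 1:]

# Skipping the header and selecting column 1 yields a 1-D float array.
scores = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1, usecols=1)

print(raw.shape, sliced.shape, scores.shape)
```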
Format of target.csv:
#SampleID,Target [0 or 1],Details [-1 or 4]
1,0,4
2,0,4
3,0,4
.
.
.
4950,0,4
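This sketch offers a quick way to check which column holds the binary labels. It uses a hypothetical in-memory copy of the file: rows 1 to 3 come from the listing above, and rows 4 and 5 are invented so that both Target values appear. `genfromtxt` treats `#` as a comment marker by default, so it skips the header line. That makes column index 1 the Target column and column index 2 the Details column.

```python
import io
import numpy as np

# Hypothetical stand-in for target.csv: rows 1-3 come from the listing,
# rows 4-5 are invented so that Target takes both of its values.
text = (
    "#SampleID,Target [0 or 1],Details [-1 or 4]\n"
    "1,0,4\n"
    "2,0,4\n"
    "3,0,4\n"
    "4,1,-1\n"
    "5,1,4\n"
)

# The "#..." header is dropped as a comment, so every remaining row is data.
data = np.genfromtxt(io.StringIO(text), delimiter=",")

target_values = np.unique(data[:, 1])   # Target column (index 1)
details_values = np.unique(data[:, 2])  # Details column (index 2)

print("Target: ", target_values)
print("Details:", details_values)
```

If the header is accurate, only the Target column should contain exactly the values 0 and 1 on the real file.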
What I tried:
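import numpy
from sklearn import metrics
target_file = "target.csv"
prediction_file = "predictions.csv"
# Every field is parsed as float; non-numeric fields become nan.
# The "#..." header line of target.csv is skipped as a comment.
true = numpy.genfromtxt(target_file,delimiter=',')
scores = numpy.genfromtxt(prediction_file, delimiter=',')
# Drop the header row and the SampleID column of predictions.csv.
scores = scores[1:,1:]
# Keep columns from index 2 onward, i.e. the Details column of target.csv.
true = true[:,2:]
fpr, tpr, thresholds = metrics.roc_curve(true, scores)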
Traceback:
Traceback (most recent call last):
  File "<ipython-input-26-d4232bf9bd64>", line 1, in <module>
    runfile('C:/Users/MyAccount/Documents/Spyder/Connectomics/myauc.py', wdir='C:/Users/MyAccount/Documents/Spyder/Connectomics')
  File "C:\Users\MyAccount\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)
  File "C:\Users\MyAccount\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/MyAccount/Documents/Spyder/Connectomics/myauc.py", line 19, in <module>
    fpr, tpr, thresholds = metrics.roc_curve(true, scores)
  File "C:\Users\MyAccount\Anaconda\lib\site-packages\sklearn\metrics\ranking.py", line 477, in roc_curve
    y_true, y_score, pos_label=pos_label, sample_weight=sample_weight)
  File "C:\Users\MyAccount\Anaconda\lib\site-packages\sklearn\metrics\ranking.py", line 297, in _binary_clf_curve
    raise ValueError("Data is not binary and pos_label is not specified")
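The failing check appears to be `roc_curve`'s label test. Without `pos_label`, it only accepts label sets it can read as negative/positive, such as {0, 1} or {-1, 1}. The column selected by `true[:,2:]` is Details, whose values are -1 and 4 according to the header, and that is neither set. Below is a minimal reproduction with made-up toy values; the exact error wording differs between scikit-learn versions.

```python
import numpy as np
from sklearn import metrics

# Made-up toy data: labels use the Details value set {-1, 4}
details = np.array([4, 4, -1, 4, -1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.9])

try:
    metrics.roc_curve(details, scores)
    raised = False
except ValueError as err:
    # {-1, 4} is neither {0, 1} nor {-1, 1}, so no positive class is inferred
    raised = True
    print("ValueError:", err)

# The same call with 0/1 labels (like the Target column) succeeds.
labels = np.array([0, 0, 1, 0, 1])
fpr, tpr, thresholds = metrics.roc_curve(labels, scores)
```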
How can I compute the AUC?
Edit: added the possible values that the target.csv columns can take.
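This is a sketch of one way to get the AUC, under two assumptions. First, the binary label is the Target column (index 1 of target.csv, per its header). Second, row *i* of both files refers to the same sample, so `t<i>` matches `<i>`. The in-memory files and the rows beyond t3/3 are hypothetical. `metrics.auc(fpr, tpr)` integrates the ROC curve with the trapezoidal rule, and `metrics.roc_auc_score` computes the same value directly.

```python
import io
import numpy as np
from sklearn import metrics

# Hypothetical in-memory copies of the two files. Rows t1-t3 / 1-3 come
# from the question; rows t4-t5 / 4-5 are invented so both classes appear.
# With the real data, pass "target.csv" and "predictions.csv" instead.
target_file = io.StringIO(
    "#SampleID,Target [0 or 1],Details [-1 or 4]\n"
    "1,0,4\n"
    "2,0,4\n"
    "3,0,4\n"
    "4,1,4\n"
    "5,1,4\n"
)
prediction_file = io.StringIO(
    "SampleID,Target\n"
    "t1,-1.0454370703147253e-05\n"
    "t2,-0.48161680725663214\n"
    "t3,8.1420547483708091e-06\n"
    "t4,0.62\n"
    "t5,-0.1\n"
)

# Column 1 of target.csv is the 0/1 Target; the "#" header is a comment.
true = np.genfromtxt(target_file, delimiter=",", usecols=1)
# Column 1 of predictions.csv is the score; skip the "SampleID,Target" line.
scores = np.genfromtxt(prediction_file, delimiter=",", skip_header=1, usecols=1)

# Assumes row i of both files describes the same sample.
fpr, tpr, thresholds = metrics.roc_curve(true, scores)
auc_curve = metrics.auc(fpr, tpr)
auc_direct = metrics.roc_auc_score(true, scores)
print("AUC from ROC curve:", auc_curve)
print("roc_auc_score:     ", auc_direct)
```

Both arrays are 1-D here, so no `pos_label` is needed: the labels are already 0 and 1.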