r - R and random forest: how do caret and pROC handle the positive and negative classes?

Question

Over the past few days I have been analysing the performance of R's random forest implementation and the different tools available for obtaining:

AUC
Sensitivity
Specificity

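To do this, I used two different methods:

roc and coords from the pROC library, to obtain the model's performance at different cutoff points.
confusionMatrix from the caret library, to obtain the model's optimal performance (AUC, accuracy, sensitivity, specificity, ...).

The point is that I have noticed some differences between the two approaches.

I wrote the following code: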

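# Load the libraries quietly
suppressMessages(library(randomForest))
suppressMessages(library(pROC))
suppressMessages(library(caret))

set.seed(100)

# Toy training set: 10 rows of 10 uniform random features with random A/B labels
t_x <- as.data.frame(matrix(runif(100), ncol = 10))
t_y <- factor(sample(c("A", "B"), 10, replace = TRUE), levels = c("A", "B"))

# Toy validation set: 5 rows
v_x <- as.data.frame(matrix(runif(50), ncol = 10))
v_y <- factor(sample(c("A", "B"), 5, replace = TRUE), levels = c("A", "B"))

# Fit the forest, then keep P(class == "A") (column 1) and the hard class predictions
model <- randomForest(t_x, t_y, ntree = 1000, importance = TRUE)
prob.out <- predict(model, v_x, type = "prob")[, 1]
prediction.out <- predict(model, v_x, type = "response")

# ROC curve from pROC, then metrics at thresholds 0, 0.01, ..., 1
mroc <- roc(v_y, prob.out, plot = FALSE)

results <- coords(mroc, seq(0, 1, by = 0.01), input = "threshold", ret = c("sensitivity", "specificity", "ppv", "npv"))

# Confusion matrix from caret for the hard predictions
accuracyData <- confusionMatrix(prediction.out, v_y)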

Comparing the results and accuracyData variables, the sensitivity and specificity appear to be swapped.

That is, the confusionMatrix output is:

Confusion Matrix and Statistics

          Reference
Prediction A B
         A 1 1
         B 2 1

               Accuracy : 0.4             
                 95% CI : (0.0527, 0.8534)
    No Information Rate : 0.6             
    P-Value [Acc > NIR] : 0.913           

                  Kappa : -0.1538         
 Mcnemar's Test P-Value : 1.000           

            Sensitivity : 0.3333          
            Specificity : 0.5000          
         Pos Pred Value : 0.5000          
         Neg Pred Value : 0.3333          
             Prevalence : 0.6000          
         Detection Rate : 0.2000          
   Detection Prevalence : 0.4000          
      Balanced Accuracy : 0.4167          

       'Positive' Class : A

But when I look for that sensitivity and specificity in the coords output, I find them swapped:

     sensitivity specificity       ppv       npv
0.32         0.5   0.3333333 0.3333333 0.5000000

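Apparently, sensitivity and specificity are reversed between coords and confusionMatrix.

Since confusionMatrix identifies the positive class correctly, I assume its interpretation of sensitivity and specificity is the right one.

My question is: is there any way to make coords interpret the positive and negative classes the way I want?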

score 4 · Accepted Answer

If you look at the output of confusionMatrix, you can see this:

       'Positive' Class : A

And if you look at mroc, class B is taken as the positive class:

Data: prob.out in 3 controls (v_y A) < 2 cases (v_y B).
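To see why this swaps the two metrics, here is a small hand computation in base R using the confusion matrix printed in the question. The helper sens_spec is a hypothetical name introduced only for this illustration; it is not part of caret or pROC.

```r
# Confusion matrix from the question: rows = prediction, columns = reference.
cm <- matrix(c(1, 2, 1, 1), nrow = 2,
             dimnames = list(Prediction = c("A", "B"), Reference = c("A", "B")))

# Hypothetical helper: sensitivity and specificity for a chosen positive class.
sens_spec <- function(cm, positive) {
  negative <- setdiff(colnames(cm), positive)
  tp <- cm[positive, positive]  # predicted positive, truly positive
  fn <- cm[negative, positive]  # predicted negative, truly positive
  tn <- cm[negative, negative]  # predicted negative, truly negative
  fp <- cm[positive, negative]  # predicted positive, truly negative
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

sens_spec(cm, "A")  # sensitivity 1/3, specificity 1/2 (the confusionMatrix values)
sens_spec(cm, "B")  # sensitivity 1/2, specificity 1/3 (the coords values)
```

Nothing about the predictions changes between the two calls; only the class that is labelled positive does.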

Basically, pROC reads the levels of your factor as (negative, positive), and caret does the exact opposite. You can specify the levels explicitly in pROC to get the same behaviour:

mroc <- roc(v_y,prob.out,plot=F, levels = c("B", "A"))

Or, depending on which behaviour you prefer, use the positive argument of confusionMatrix:

accuracyData <- confusionMatrix(prediction.out,v_y, positive = "B")
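As a quick sanity check, the sketch below pins the positive class in both packages and compares the results at a single 0.5 cutoff. The data are made up for illustration (they are not the question's data), and the variable names are my own. I also set direction = "<" explicitly, on the assumption that the score is the probability of the positive class; otherwise pROC chooses the direction automatically from the group medians, which may flip on small or poorly separated samples.

```r
library(pROC)
library(caret)

# Hypothetical toy data: true labels and a score meaning P(class == "A").
truth  <- factor(c("A", "A", "A", "A", "A", "B", "B", "B"), levels = c("A", "B"))
prob_a <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1)

# Hard predictions at a 0.5 cutoff.
pred <- factor(ifelse(prob_a > 0.5, "A", "B"), levels = c("A", "B"))

# pROC: levels = c(control, case), so "A" is the positive class here.
# direction = "<" states that controls score lower than cases.
r <- roc(truth, prob_a, levels = c("B", "A"), direction = "<")
roc_vals <- unlist(coords(r, 0.5, input = "threshold",
                          ret = c("sensitivity", "specificity")))

# caret: name the positive class explicitly instead of relying on the default.
cm <- confusionMatrix(pred, truth, positive = "A")
caret_vals <- cm$byClass[c("Sensitivity", "Specificity")]

# Both packages should now report the same pair of values.
all.equal(unname(roc_vals), unname(caret_vals))
```

If the comparison does not return TRUE, the levels or the direction are most likely still inconsistent between the two calls.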
