r - R package caret confusionMatrix with missing categories

Question

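I am using the confusionMatrix function in the R package caret to calculate some statistics for data I have. I have been putting my predictions as well as my actual values into the table function to get the table that confusionMatrix uses: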

table(predicted,actual)

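However, there are multiple possible outcomes (e.g. A, B, C, D), and my predictions do not always cover all of them (e.g. only A, B, D). The output of the table function then omits the missing outcome and looks like this: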

    A    B    C    D
A  n1   n2   n3   n4  
B  n5   n6   n7   n8  
D  n9  n10  n11  n12
# Note how there is no corresponding row for `C`.

The confusionMatrix function can't handle the missing outcome and gives this error:

Error in !all.equal(nrow(data), ncol(data)) : invalid argument type

Is there a way to use the table function differently so that the missing rows appear with zeros, or to use the confusionMatrix function differently so that it treats missing outcomes as zero?

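Note: since I am selecting the test data at random, a category is sometimes missing from the actual results as well, not just from the predictions. I don't believe this changes the solution.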

score 5 · Accepted Answer

First, note that confusionMatrix can be called as confusionMatrix(predicted, actual) in addition to being called with table objects. However, the function throws an error if predicted and actual (both treated as factors) do not have the same number of levels.
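One way to keep both arguments at the same number of levels is to build the factors against a shared level set before tabulating. The following is a minimal sketch, assuming the full set of classes is known in advance; the `classes` vector and the sample data are made up for illustration.

```r
# Assumption: the complete set of possible outcomes is known up front.
classes <- c("A", "B", "C", "D")

# Illustrative data: "C" is never predicted.
predicted <- c("A", "B", "D", "A", "B", "D")
actual    <- c("A", "C", "D", "B", "B", "C")

# Giving both factors the same levels makes table() keep empty rows
# and columns, so the result is always square.
predicted <- factor(predicted, levels = classes)
actual    <- factor(actual,    levels = classes)

cm <- table(predicted, actual)
print(cm)

# With caret installed, the aligned factors could then be passed on, e.g.
# caret::confusionMatrix(predicted, actual)
```

The row for `C` is present and filled with zeros, which removes the mismatch between row and column counts that triggers the error in the question.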

This (and the fact that the caret package threw an error at me because it doesn't get its dependencies right in the first place) is why I'd suggest writing your own function:

# Create a confusion matrix from the given outcomes, whose rows correspond
# to the predicted and the columns to the actual classes.
# Classes are expected to be coded as positive integers (1, 2, ...).
createConfusionMatrix <- function(act, pred) {
  # You've mentioned that neither actual nor predicted may give a complete
  # picture of the available classes, hence:
  numClasses <- max(act, pred)
  # Split the predictions by actual class. A factor with all levels keeps a
  # column even for classes that never occur in `act`.
  byActual <- split(pred, factor(act, levels = seq_len(numClasses)))
  # Count the predictions in each column; unseen classes are counted as 0.
  sapply(byActual, tabulate, nbins = numClasses)
}

# Generate random data since you've not provided an actual example.
actual    <- sample(1:4, 1000, replace=TRUE)
predicted <- sample(c(1L,2L,4L), 1000, replace=TRUE)

print( createConfusionMatrix(actual, predicted) )

That will give you something like this (the counts vary from run to run because the data is random):

      1  2  3  4
[1,] 85 87 90 77
[2,] 78 78 79 95
[3,]  0  0  0  0
[4,] 89 77 82 83
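The function above expects classes coded as positive integers, whereas the question uses labels such as A to D. Here is a rough sketch of one way to bridge the two, assuming the full label set is known; `confusionFromLabels` and `classes` are hypothetical names, not part of the answer.

```r
# Hypothetical helper: count predictions per actual class for labelled data.
# Rows are predicted classes, columns are actual classes.
confusionFromLabels <- function(act, pred, classes) {
  # match() turns each label into its position in `classes` (1, 2, ...).
  actCode  <- match(act, classes)
  predCode <- match(pred, classes)
  n <- length(classes)
  # A factor with all levels keeps a column even for unseen actual classes.
  cols <- split(predCode, factor(actCode, levels = seq_len(n)))
  m <- sapply(cols, tabulate, nbins = n)
  dimnames(m) <- list(predicted = classes, actual = classes)
  m
}

# Illustrative data: "C" is never predicted.
classes   <- c("A", "B", "C", "D")
actual    <- c("A", "B", "C", "D", "A")
predicted <- c("A", "B", "D", "D", "B")

m <- confusionFromLabels(actual, predicted, classes)
print(m)
```

Labelling the dimensions also makes it explicit which axis holds the predictions, which the bare `[1,]` row indices in the output above do not.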
