r - In k-fold-cross validation, do we train algorithm on (k-1) subsets one by one or on combined (k-1) subsets at once?

Question

To be specific, let's say I have 10 subsets (set1, set2, ..., set10) of a training set. To perform 10-fold CV, as I understand it, I should train my algorithm on rbind(set2, set3, ..., set9, set10) and test it on set1. Then I train it on rbind(set1, set3, set4, ..., set10) and test it on set2, and so on. Am I correct?

I have a feeling that we instead train the algorithm on set2, set3, ..., set10 one by one and test each model on set1. That way we get 9 sets of predictions on set1, which we can then average. Which one is the correct way?

Any help would be greatly appreciated.

Thank you.

score 0 · Accepted Answer

Your understanding is correct: hold one set out for testing and combine the remaining sets for training.
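A minimal base-R sketch of the procedure described in the question (not part of the original answer). It assumes a hypothetical data frame `dat` with a numeric response `y` and one predictor `x`, and uses `lm()` with mean squared error as a stand-in for "your algorithm"; substitute your own model and metric.

```r
# 10-fold cross-validation sketch in base R.
# Assumption: `dat` is hypothetical simulated data; lm() stands in for your model.
set.seed(42)
dat <- data.frame(x = runif(100))
dat$y <- 2 * dat$x + rnorm(100, sd = 0.1)

k <- 10
# Randomly assign each row to one of k folds of equal size
fold_id <- sample(rep(1:k, length.out = nrow(dat)))
sets <- split(dat, fold_id)  # sets[[1]] ... sets[[10]] play the role of set1 ... set10

fold_mse <- numeric(k)
for (i in seq_len(k)) {
  test_set <- sets[[i]]
  # Combine the other k - 1 subsets into ONE training set
  train_set <- do.call(rbind, sets[-i])
  fit <- lm(y ~ x, data = train_set)
  pred <- predict(fit, newdata = test_set)
  fold_mse[i] <- mean((test_set$y - pred)^2)
}

# Average the k held-out errors to get the cross-validated estimate
cv_mse <- mean(fold_mse)
print(cv_mse)
```

Each row lands in a test set exactly once, so the loop produces one held-out prediction per observation rather than nine sets of predictions on set1; what gets averaged is the k per-fold errors.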

See the question and the second answer at "10 fold cross-validation".

score 0 · Accepted Answer

The situation resembles the following figure:


As a side note, you will get better results if you make sure the (expected) class prior probabilities are roughly the same in every fold (set1, set2, ..., set10).

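This is called stratified k-fold cross-validation: the folds are chosen so that the mean response value is approximately equal in all folds. For binary classification, this means each fold contains roughly the same proportions of the two class labels.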
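A minimal base-R sketch of stratified fold assignment, assuming hypothetical binary labels (70 rows of class "a", 30 of class "b"). The rows of each class are shuffled and dealt out over the k folds separately, so every fold keeps the overall 70/30 class ratio; the resulting `fold_id` can then drive an ordinary k-fold loop in place of a purely random assignment.

```r
# Stratified fold assignment sketch: folds are drawn separately within each class.
# Assumption: `labels` is a hypothetical binary outcome with a 70/30 split.
set.seed(1)
labels <- factor(rep(c("a", "b"), times = c(70, 30)))

k <- 10
fold_id <- integer(length(labels))
for (cls in levels(labels)) {
  idx <- which(labels == cls)
  # Spread this class's rows evenly over the k folds, in random order
  fold_id[idx] <- sample(rep(1:k, length.out = length(idx)))
}

# Cross-tabulate folds against classes: each fold gets 7 "a" rows and 3 "b" rows
tab <- table(fold_id, labels)
print(tab)
```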
