algorithm - Can you identify this sampling algorithm? (R sample() function)

Question

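I would like to read more about the algorithm R uses for unequal probability sampling, but several hours of searching turned up nothing. I thought it might be an algorithm from The Art of Computer Programming, but I could not confirm that either. The specific function in R's random.c is called ProbSampleNoReplace().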

Given a vector of probabilities prob[], a desired sample size nans, and a vector ans[] to hold the selected items:

For each of the n elements j in prob[], assign an index perm[j]
Sort prob[] in order of probability value, largest first, reordering perm[] in parallel

totalmass = 1
for (h = 0, n1 = n - 1; h < nans; h++, n1--)
    rt = totalmass * rand(in 0:1)
    mass = 0

    // sum the probabilities, largest first, until the sum reaches or exceeds rt
    for (j = 0; j < n1; j++)
        mass += prob[j]
        if rt <= mass then break

    ans[h] = perm[j]
    // reduce totalmass to reflect the removed item
    totalmass -= prob[j]

    // shift the remaining entries down to close the gap left by item j
    for (k = j; k < n1; k++)
        prob[k] = prob[k+1]
        perm[k] = perm[k+1]
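
For anyone who wants to experiment with the steps above, here is a minimal Python sketch of the same loop. It is a transcription of the pseudocode, not R's actual source: the function name `prob_sample_no_replace`, the `rng` parameter, and the up-front normalization of `prob` are my own assumptions, and indices are returned 1-based only to mimic R.

```python
import random


def prob_sample_no_replace(prob, nans, rng=random):
    """Draw nans distinct 1-based indices, weighted by prob, without replacement."""
    n = len(prob)
    if nans > n:
        raise ValueError("sample size exceeds number of items")

    # Sort items by probability, largest first, keeping their original indices.
    order = sorted(range(n), key=lambda i: prob[i], reverse=True)
    total = float(sum(prob))
    p = [prob[i] / total for i in order]   # assumption: normalize so the mass is 1
    perm = [i + 1 for i in order]          # 1-based labels, as in R

    ans = []
    totalmass = 1.0
    for _ in range(nans):
        n1 = len(p) - 1
        rt = totalmass * rng.random()
        mass = 0.0
        j = 0
        # Accumulate probabilities until the running sum reaches rt;
        # if it never does, j ends up at the last remaining item.
        while j < n1:
            mass += p[j]
            if rt <= mass:
                break
            j += 1
        ans.append(perm[j])
        totalmass -= p[j]   # remove the chosen item's mass
        del p[j]            # close the gap left by the chosen item
        del perm[j]
    return ans
```

A call such as `prob_sample_no_replace([0.1, 0.2, 0.3, 0.4], 2)` returns two distinct indices. Each draw is a linear scan over the remaining items, so a full sample costs on the order of n * nans steps after the initial sort; sorting largest first only shortens the average scan.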

score 1 · Accepted Answer

The sample function supports an unequal-probability argument. The intent of your code fragment is not clear to those of us who do not read C.

> table( sample(1:4, 100, repl=TRUE, prob=4:1) )

 1  2  3  4 
46 23 24  7
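
For readers outside R, roughly the same experiment can be sketched in Python with the standard library. `weighted_table` and its `seed` parameter are hypothetical names chosen for illustration. Like the R call above, this samples with replacement, so it illustrates the `prob` argument rather than the no-replacement routine the question asks about.

```python
import random
from collections import Counter


def weighted_table(values, weights, size, seed=None):
    """Sample with replacement using relative weights and tabulate the counts."""
    rng = random.Random(seed)
    draws = rng.choices(values, weights=weights, k=size)
    # Return counts keyed by value, in sorted order, similar to R's table().
    return dict(sorted(Counter(draws).items()))


if __name__ == "__main__":
    print(weighted_table([1, 2, 3, 4], [4, 3, 2, 1], 100))
```

With weights 4:3:2:1 the expected shares are 40%, 30%, 20% and 10%, so in 100 draws the counts should land near 40, 30, 20 and 10, with ordinary sampling noise.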

There is another SO Q&A that may be helpful (found with an SO search on these terms):

random.c ProbSampleNoReplace

Faster weighted sampling without replacement
