algorithm - Algorithm to generate a sequence proportional to specified percentages

Question

Given a map of objects and designated proportions (let's say they add up to 100 to keep things simple):

val ss : Map[String,Double] = Map("A"->42, "B"->32, "C"->26)

How would you generate a sequence such that, for a subset of size n, there are ~42% "A"s, ~32% "B"s and ~26% "C"s? (Obviously, a smaller n will have larger errors.)

(The working language is Scala, but I'm only asking for the algorithm.)

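Update: With a purely random approach there is, for example, a ~16% chance that the sequence begins with AA and a ~11% chance that it begins with BB, whereas for n exactly == (sum of proportions) the distribution should be perfect. So, following @MvG's answer, I implemented it as follows: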

/**
Returns the key whose achieved proportions are most below desired proportions
*/
def next[T](proportions : Map[T, Double], achievedToDate : Map[T,Double]) : T = {
    val proportionsSum = proportions.values.sum
    val desiredPercentages = proportions.mapValues(v => v / proportionsSum)
    //Initially no achieved percentages, so avoid / 0 
    val toDateTotal = if(achievedToDate.values.sum == 0.0){
        1
    }else{
        achievedToDate.values.sum
    }
    val achievedPercentages = achievedToDate.mapValues(v => v / toDateTotal)
    val gaps = achievedPercentages.map{ case (k, v) =>
        val gap = desiredPercentages(k) - v
        (k -> gap)
    }
    val maxUnder = gaps.values.toList.sortWith(_ > _).head
    //println("Max gap is " + maxUnder)
    val gapsForMaxUnder = gaps.mapValues{v => Math.abs(v - maxUnder) < Double.Epsilon }
    val keysByHasMaxUnder = gapsForMaxUnder.map(_.swap)
    keysByHasMaxUnder(true)
}

/**
Stream of most-fair next element 
*/
def proportionalStream[T](proportions : Map[T, Double], toDate : Map[T, Double]) : Stream[T] = {
    val nextS = next(proportions, toDate)
    val tailToDate = toDate + (nextS -> (toDate(nextS) + 1.0))
    Stream.cons(
        nextS,
        proportionalStream(proportions, tailToDate)
    )
}

It is used, for example, like this:

val ss : Map[String,Double] = Map("A"->42, "B"->32, "C"->26)
val none : Map[String,Double] = ss.mapValues(_ => 0.0)
val mySequence = (proportionalStream(ss, none) take 100).toList
println("Desired : " + ss)
println("Achieved : " + mySequence.groupBy(identity).mapValues(_.size))
mySequence.foreach(print)
println

This produces:

Desired : Map(A -> 42.0, B -> 32.0, C -> 26.0)
Achieved : Map(C -> 26, A -> 42, B -> 32)
ABCABCABACBACABACBABACABCABACBACABABCABACABCABACBA
CABABCABACBACABACBABACABCABACBACABABCABACABCABACBA

score 3 · Accepted Answer

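For a deterministic approach, the most obvious solution would probably be this:

Keep track of the number of occurrences of each item in the sequence so far.
For the next item, choose the item for which the difference between the intended and the actual count (or proportion, if you prefer) is largest, but only if the intended count (resp. proportion) is greater than the actual one.
If there is a tie, break it in an arbitrary but deterministic way, e.g. by choosing the alphabetically lowest item.

This approach ensures optimal adherence to the prescribed ratio for every prefix of the infinite sequence generated this way.

Quick & dirty python proof of concept (don't expect any of the variable "names" to make any sense):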

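import sys

p = [0.42, 0.32, 0.26]  # target proportions
c = [0, 0, 0]           # occurrences of each item so far
a = ['A', 'B', 'C']     # item labels
n = 0                   # length of the sequence so far

while n < 70*5:
    n += 1
    # Pick the item whose intended count n*p[i] most exceeds its actual count c[i];
    # the strict '>' means ties go to the lowest index.
    x = 0
    s = n*p[0] - c[0]
    for i in [1, 2]:
        si = n*p[i] - c[i]
        if si > s:
            x = i
            s = si
    sys.stdout.write(a[x])
    if n % 70 == 0:
        sys.stdout.write('\n')  # wrap the output at 70 characters per line
    c[x] += 1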

This produces:

ABCABCABACABACBABCAABCABACBACABACBABCABACABACBACBAABCABCABACABACBABCAB
ACABACBACABACBABCABACABACBACBAABCABCABACABACBABCAABCABACBACABACBABCABA
CABACBACBAABCABCABACABACBABCABACABACBACBAACBABCABACABACBACBAABCABCABAC
ABACBABCABACABACBACBAACBABCABACABACBACBAABCABCABACABACBABCABACABACBACB
AACBABCABACABACBACBAABCABCABACABACBABCAABCABACBACBAACBABCABACABACBACBA
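
The proof of concept above hard-codes three items. As a minimal sketch of the same largest-deficit rule for an arbitrary map of weights, something like the generator below might be easier to adapt. The name `proportional_sequence` and the integer deficit formula are assumptions of this sketch, not part of the answer: it compares n*weight - count*total, which is n*p - c scaled by the total, so integer weights never produce floating-point near-ties, and it breaks ties by taking the alphabetically lowest key.

```python
from itertools import islice


def proportional_sequence(weights):
    """Yield keys forever, always picking the key furthest below its target count.

    `weights` maps each key to a positive weight (hypothetical helper, not from the answer).
    """
    keys = sorted(weights)            # sorted so that ties go to the lowest key
    total = sum(weights.values())
    counts = {k: 0 for k in keys}
    n = 0
    while True:
        n += 1
        # Deficit of key k after n items, scaled by `total`: n*w_k - count_k*total.
        # max() returns the first maximal element, so ties go to the lowest key.
        pick = max(keys, key=lambda k: n * weights[k] - counts[k] * total)
        counts[pick] += 1
        yield pick


first_ten = "".join(islice(proportional_sequence({"A": 42, "B": 32, "C": 26}), 10))
print(first_ten)  # ABCABCABAC
```

Because it is a generator, the caller decides how many items to take, much like the Stream in the question.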

score 3 · Accepted Answer

For every item of the sequence, compute a (pseudo-)random number r uniformly distributed between 0 (inclusive) and 100 (exclusive).

If 0 ≤ r < 42, take A
If 42 ≤ r < (42+32), take B
If (42+32) ≤ r < (42+32+26)=100, take C
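
The thresholds above can be sketched in Python as follows. The function names `pick` and `random_sequence` are made up for illustration, and the weights 42/32/26 are taken from the question; this is one possible reading of the rule, not a definitive implementation.

```python
import random


def pick(r):
    """Map r in [0, 100) to an item using the cumulative thresholds 42 and 42+32."""
    if r < 42:
        return "A"
    if r < 42 + 32:
        return "B"
    return "C"


def random_sequence(n, rng=None):
    """Return n independently drawn items; `rng` is an optional random.Random."""
    rng = rng or random.Random()
    # rng.random() is uniform on [0.0, 1.0), so r is uniform on [0, 100).
    return [pick(rng.random() * 100) for _ in range(n)]


print("".join(random_sequence(20)))
```

Each draw is independent, so short prefixes can deviate noticeably from the target percentages.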

score 1 · Accepted Answer

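The number of each entry in your subset will be the same as in your map, but with a scaling factor applied.

The scaling factor is n/100.

So if n is 50, you would have { Ax21, Bx16, Cx13 }.

Randomize the order to your liking.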
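
A minimal sketch of this scaling idea, assuming the percentages sum to 100 and that rounding to the nearest integer is acceptable (for other values of n the rounded counts may not add up to exactly n). The helper names `scaled_counts` and `scaled_sequence` are hypothetical.

```python
import random


def scaled_counts(proportions, n):
    """Scale each percentage by n/100 and round to a whole number of items."""
    return {k: round(v * n / 100) for k, v in proportions.items()}


def scaled_sequence(proportions, n):
    """Build the items from the scaled counts, then put them in random order."""
    counts = scaled_counts(proportions, n)
    items = [k for k, c in counts.items() for _ in range(c)]
    random.shuffle(items)  # any order will do; shuffle in place
    return items


counts = scaled_counts({"A": 42, "B": 32, "C": 26}, 50)
print(counts)  # {'A': 21, 'B': 16, 'C': 13}
```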

score 0 · Accepted Answer

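This is nondeterministic, but gives a distribution of values close to MvG's. It suffers from the problem that it could give AAA right at the start. I'm posting it here for completeness' sake, given that it shows my disagreement with MvG was misplaced (and I don't expect any upvotes).

Now, if someone has an idea for an expand function that is deterministic and doesn't just duplicate MvG's method (rendering the calc function useless), I'm all ears!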

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>ErikE's answer</title>
</head>
<body>
<div id="output"></div>
<script type="text/javascript">
if (!Array.prototype.each) {
   Array.prototype.each = function(callback) {
      var i, l = this.length;
      for (i = 0; i < l; i += 1) {
         callback(i, this[i]);
      }
   };
}

if (!Array.prototype.sum) {
   Array.prototype.sum = function() {
      var sum = 0;
      this.each(function(i, val) {
         sum += val;
      });
      return sum;
   };
}

function expand(counts) {
   var
      result = "",
      charlist = [],
      l,
      index;
   counts.each(function(i, val) {
      var char = String.fromCharCode(i + 65);
      for ( ; val > 0; val -= 1) {
         charlist.push(char);
      }
   });
   l = charlist.length;
   for ( ; l > 0; l -= 1) {
      index = Math.floor(Math.random() * l);
      result += charlist[index];
      charlist.splice(index, 1);
   }
   return result;
}

function calc(n, proportions) {
   var percents = [],
      counts = [],
      errors = [],
      fnmap = [],
      errorSum,
      worstIndex;

   fnmap[1] = "min";
   fnmap[-1] = "max";

   proportions.each(function(i, val) {
      percents[i] = val / proportions.sum() * n;
      counts[i] = Math.round(percents[i]);
      errors[i] = counts[i] - percents[i];
   });

   errorSum = counts.sum() - n;
   while (errorSum != 0) {
      var adjust = errorSum < 0 ? 1 : -1;
      worstIndex = errors.indexOf(Math[fnmap[adjust]].apply(0, errors));
      counts[worstIndex] += adjust;
      errors[worstIndex] = counts[worstIndex] - percents[worstIndex];
      errorSum += adjust;
   }
   return expand(counts);
}

document.body.onload = function() {
   document.getElementById('output').innerHTML = calc(99, [25.1, 24.9, 25.9, 24.1]);
};
</script>
</body>
</html>
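
For readers who prefer not to run the page, the count-correction step of calc can be sketched in Python as below. The name `adjusted_counts` is hypothetical, and Python's `round` rounds exact halves to the nearest even integer while JavaScript's `Math.round` rounds them up, so results could differ from the page above when an ideal count lands exactly on .5.

```python
def adjusted_counts(n, proportions):
    """Round each ideal count, then nudge the worst offenders until the counts sum to n."""
    total = sum(proportions)
    ideal = [p / total * n for p in proportions]
    counts = [round(x) for x in ideal]
    excess = sum(counts) - n
    while excess != 0:
        step = -1 if excess > 0 else 1
        errors = [c - x for c, x in zip(counts, ideal)]
        # Too many items: shrink the most over-represented count.
        # Too few: grow the most under-represented one.
        worst = errors.index(max(errors) if step < 0 else min(errors))
        counts[worst] += step
        excess += step
    return counts


print(adjusted_counts(99, [25.1, 24.9, 25.9, 24.1]))  # [25, 25, 25, 24]
```

With the inputs used in the page, plain rounding gives 25, 25, 26 and 24 (100 items), so one item is removed from the third entry, which has the largest positive rounding error.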

score 0 · Accepted Answer

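The simplest "deterministic" [in terms of the #elements of each category] solution [IMO] is: add the elements in a predefined order, then shuffle the resulting list.

First, add map(x)/100 * n elements for each element x, choosing how you handle the integer arithmetic so as to avoid being off by one element, and then shuffle the resulting list.

Shuffling a list is simple with the Fisher-Yates shuffle, which is implemented in most languages: for example, Java has Collections.shuffle() and C++ has random_shuffle().

In Java, it is as simple as: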

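import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map.Entry;

int N = 107;
List<String> res = new ArrayList<String>();
for (Entry<String,Integer> e : map.entrySet()) { // map is a predefined Map<String,Integer> of frequencies
    long count = Math.round(e.getValue() / 100.0 * N); // number of copies of this key
    for (int i = 0; i < count; i++) {
        res.add(e.getKey());
    }
}
Collections.shuffle(res);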
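
For languages without a built-in shuffle, the Fisher-Yates shuffle mentioned above can be sketched in a few lines of Python. The function name `fisher_yates_shuffle` is hypothetical, and the example list assumes the frequency map holds the question's 42/32/26, for which the Java loop would add 45, 34 and 28 elements when N = 107.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Shuffle `items` in place so that every permutation is equally likely."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)  # uniform index in 0..i inclusive
        items[i], items[j] = items[j], items[i]
    return items


# Assumed input: counts for weights 42/32/26 with N = 107.
res = ["A"] * 45 + ["B"] * 34 + ["C"] * 28
fisher_yates_shuffle(res)
print("".join(res))
```

In everyday Python code `random.shuffle` would do the same job; the loop is spelled out here only to show the technique.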
