python - Discretizing a probability array in Python

Question

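I have a numpy array containing probability values for the occurrence of a species (in reality it is imported from a GIS raster map), as in the following example: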

import numpy as np
a = np.random.randint(1, 20, 1200).reshape(40, 30)
b = a / a.sum()  # normalize over the whole array so that b.sum() == 1
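Whether the map is normalized with Python's built-in `sum` or with `a.sum()` matters. For a 2-D array the built-in `sum` adds the rows together and returns one total per column, so dividing by it makes each column sum to 1, while `a.sum()` totals every cell and makes the whole map sum to 1. A small check on a toy 2x3 array (the values are only illustrative):

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)    # [[1, 2, 3], [4, 5, 6]]

col_totals = sum(a)                  # built-in sum adds the rows: [5, 7, 9]
grand_total = a.sum()                # numpy sums every element: 21

b_per_column = a / col_totals        # each column sums to 1 (not what the question needs)
b_whole_map = a / grand_total        # the whole array sums to 1

print(col_totals, grand_total)
print(b_per_column.sum(axis=0), b_whole_map.sum())
```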

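Now I want to get a discrete version of that array back. For example, if there are 100 individuals in the area covered by the array (1200 cells), how would they be distributed? Of course, they should be distributed according to the probabilities, i.e. lower values mean a lower likelihood of occurrence. However, since it is all statistics, an individual can still end up in a low-probability cell, and several individuals may occupy the same cell...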

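It is like turning a continuous distribution curve back into a histogram. Just as many different histograms can produce a given distribution curve, the reverse should also hold. So the algorithm I am looking for should produce different discrete values each time it is applied.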

...Is there an algorithm in Python that can do this? I am not very familiar with discretization, so maybe someone can help.

score 3 · Accepted Answer

Use random.choice together with bincount:

np.bincount(np.random.choice(b.size, 100, p=b.flat),
            minlength=b.size).reshape(b.shape)
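A self-contained sketch of the same idea, written against NumPy's newer Generator API (`numpy.random.default_rng`, NumPy 1.17+) instead of the legacy `np.random` functions. Each of the `n` individuals is placed independently in one cell with probability `b`, and `bincount` counts individuals per cell. The helper name `scatter_individuals`, the random stand-in map, and the seed are illustrative assumptions, not part of the answer:

```python
import numpy as np

def scatter_individuals(prob_map, n, rng):
    """Place n individuals into the cells of prob_map, one independent draw each.

    prob_map must be non-negative and sum to 1; returns an integer array of
    counts with the same shape as prob_map.
    """
    p = prob_map.ravel()
    cells = rng.choice(p.size, size=n, p=p)        # flat cell index of each individual
    counts = np.bincount(cells, minlength=p.size)  # number of individuals per cell
    return counts.reshape(prob_map.shape)

rng = np.random.default_rng(42)                    # fixed seed only for reproducibility
a = rng.integers(1, 20, size=(40, 30))             # stand-in for the GIS raster
b = a / a.sum()

counts = scatter_individuals(b, 100, rng)
print(counts.sum(), counts.shape)                  # 100 (40, 30)
```

Calling it again with the same `rng` gives a different arrangement each time, which is the behaviour the question asks for.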

If you don't have NumPy 1.7, you can replace random.choice with:

np.searchsorted(np.cumsum(b), np.random.random(100))

which gives:

np.bincount(np.searchsorted(np.cumsum(b), np.random.random(100)),
            minlength=b.size).reshape(b.shape)
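The `searchsorted` fallback is inverse-transform sampling: the cumulative sum of the flattened probabilities splits [0, 1) into one sub-interval per cell whose length equals that cell's probability, so a uniform random number lands in cell i with probability b.flat[i]. The toy three-cell example below (probabilities chosen to be exact in binary floating point) shows the mapping. Because of rounding, `np.cumsum(b)[-1]` can come out slightly below 1, and a draw above it would get index `b.size` and break the `reshape`. The hypothetical helper `scatter_individuals_cdf` guards against that by rescaling the CDF and using `side='right'`; this is an assumed refinement, not part of the original answer. `rng` defaults to the module-level `np.random`, so it should also work with the legacy functions.

```python
import numpy as np

# Toy three-cell map: the CDF splits [0, 1) into [0, .25), [.25, .75), [.75, 1)
p = np.array([0.25, 0.5, 0.25])
cdf = np.cumsum(p)                                   # [0.25, 0.75, 1.0]
u = np.array([0.1, 0.3, 0.74, 0.9])
print(np.searchsorted(cdf, u, side='right'))         # [0 1 1 2]

def scatter_individuals_cdf(prob_map, n, rng=np.random):
    """searchsorted-based placement of n individuals (inverse-transform sampling).

    Dividing by cdf[-1] forces the last entry to exactly 1.0, and side='right'
    picks the first cell whose cumulative probability exceeds u, so every u in
    [0, 1) lands on a valid cell and zero-probability cells are never chosen.
    """
    cdf = np.cumsum(prob_map.ravel())
    cdf = cdf / cdf[-1]
    cells = np.searchsorted(cdf, rng.random(n), side='right')
    return np.bincount(cells, minlength=cdf.size).reshape(prob_map.shape)
```

Apart from these two guards, this is the same computation as the one-liner above.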

score 2 · Answer

So far, ecatmur's answer seems very reasonable and simple.

I just want to add a more "applied" example. Consider a die with six faces (six numbers). Each number/outcome has a probability of 1/6. Represented as an array, the die looks like this:

b = np.array([[1,1,1],[1,1,1]])/6.0

So rolling the die 100 times (n=100) is simulated by:

n = 100
np.bincount(np.searchsorted(np.cumsum(b), np.random.random(n)), minlength=b.size).reshape(b.shape)
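To illustrate the point made in the question (every run gives a different histogram, yet all come from the same distribution), the sketch below repeats the 100-roll simulation many times with the `searchsorted` approach. The helper name `roll`, the number of repetitions, and the seed are arbitrary choices for illustration. Individual runs differ, while the average count per face approaches n/6, about 16.7.

```python
import numpy as np

rng = np.random.default_rng(0)               # seeded Generator, for reproducibility only
b = np.array([[1, 1, 1], [1, 1, 1]]) / 6.0   # fair six-sided die as a 2x3 array
n = 100                                      # rolls per simulation
runs = 10_000                                # number of repeated simulations (arbitrary)

def roll(b, n, rng):
    """One simulation of n rolls, returned as counts per face in b's shape."""
    cdf = np.cumsum(b)
    cdf = cdf / cdf[-1]                      # force the last entry to exactly 1.0
    faces = np.searchsorted(cdf, rng.random(n), side='right')
    return np.bincount(faces, minlength=b.size).reshape(b.shape)

first, second = roll(b, n, rng), roll(b, n, rng)
print(first)                                 # one possible histogram of 100 rolls
print(second)                                # usually a different one
mean_counts = np.mean([roll(b, n, rng) for _ in range(runs)], axis=0)
print(mean_counts)                           # every entry close to 100 / 6
```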

I think that is a suitable approach for this kind of application. So thanks for your help, ecatmur!

/Johannes

score 1 · Answer

This is similar to a question of mine from earlier this month.

import math
import random

from numpy.random import multinomial


def RandFloats(Size):
    """Return a list of Size random floats that sum to 1 (a random probability vector)."""
    Scalar = 1.0
    RandomVector = [random.random() for i in range(Size)]
    RandomVectorSum = sum(RandomVector)
    return [Scalar * i / RandomVectorSum for i in RandomVector]


def RandIntVec(ListSize, ListSumValue, Distribution='Normal'):
    """
    Inputs:
    ListSize = the size of the list to return
    ListSumValue = the sum of the list values
    Distribution = 'uniform' for a uniform distribution, 'normal' for a normal
        distribution ~ N(0,1) over a +/- 3 sigma range (default), or a list of
        size 'ListSize' for an empirical (arbitrary) distribution: the
        probabilities of each of the ListSize different outcomes. These should
        sum to 1 (however, the last element is always assumed to account for
        the remaining probability, as long as sum(pvals[:-1]) <= 1).
    Output:
    An array of random integers of length 'ListSize' whose sum is 'ListSumValue'.
    """
    if isinstance(Distribution, list):
        if len(Distribution) != ListSize:
            raise ValueError('Cannot create desired vector: Distribution must have ListSize entries')
        Probabilities = Distribution
    elif Distribution.lower() == 'uniform':
        # I do not recommend this! It did not look as random as I had hoped (at least on my computer)
        Probabilities = [1.0 / ListSize for i in range(ListSize)]
    elif Distribution.lower() == 'normal':
        # Normal distribution construction: assume a +/- 3 sigma range. Warning: this may or
        # may not suit your application; change LowSigma and HighSigma to explore another range.
        LowSigma = -3   # -3 sigma
        HighSigma = 3   # +3 sigma
        StepSize = 1 / (float(ListSize) - 1)
        # ListSize evenly spaced z-values from LowSigma to HighSigma
        ZValues = [LowSigma * (1 - i * StepSize) + (i * StepSize) * HighSigma
                   for i in range(int(ListSize))]
        # Standard normal CDF, N(0,1): Phi(z) = 0.5 * erfc(-z / sqrt(2))
        Phi = [0.5 * math.erfc(-z / math.sqrt(2)) for z in ZValues]
        NormalDistro = []
        for i in range(len(ZValues)):
            if i == 0:
                NormalDistro.append(Phi[0])               # left tail below LowSigma
            elif i == len(ZValues) - 1:
                NormalDistro.append(NormalDistro[0])      # last entry: multinomial treats it as the remainder
            else:
                NormalDistro.append(Phi[i] - Phi[i - 1])  # mass between neighbouring z-values
        Probabilities = NormalDistro
    else:
        raise ValueError('Cannot create desired vector')
    # One multinomial draw: ListSumValue items spread over ListSize bins
    return multinomial(ListSumValue, Probabilities, size=1)[0]

ProbabilityDistribution = RandFloats(1200)  # your probability distribution for the 1200-cell array
SizeDistribution = RandIntVec(1200, 100, Distribution=ProbabilityDistribution)  # 1200 cells summing to 100, following that distribution

The two key lines are the last two lines of the code above.
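For the question's map, the list branch of RandIntVec reduces to a single `multinomial` call. The number of individuals per cell after n independent placements follows a multinomial distribution with probabilities b, so sampling it directly gives counts with the same distribution as the bincount answers. A minimal sketch, assuming `b` is a normalized 40x30 array as in the question; the random stand-in map and seed are illustrative, and it uses the Generator method `multinomial` rather than the legacy `numpy.random.multinomial`:

```python
import numpy as np

rng = np.random.default_rng(1)               # illustrative seed
a = rng.integers(1, 20, size=(40, 30))       # stand-in for the GIS probability raster
b = a / a.sum()                              # normalize so the whole map sums to 1

# Counts of 100 individuals over the 1200 cells, drawn in one multinomial step
counts = rng.multinomial(100, b.ravel()).reshape(b.shape)
print(counts.sum(), counts.shape)            # 100 (40, 30)
```

Unlike RandFloats, which generates a random probability vector for demonstration, this uses the map itself, and the reshape restores the raster layout.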
