python - Pythonの多変量正規密度？

Question

多変量正規分布のPDF（確率密度関数）の効率的な計算を可能にするPythonパッケージはありますか？

Numpy / Scipyには含まれていないようで、驚くべきことに、Google検索では有用なものは見つかりませんでした。

score 31 · Accepted Answer

私は自分の目的のために1つ作成したので、共有します。http://en.wikipedia.org/wiki/Multivariate_normal_distributionの非縮退ケースの式で、numpy の「力」を使用して構築され、入力も検証されます。

ここにコードとサンプル実行があります

from numpy import *
import math
# covariance matrix
sigma = matrix([[2.3, 0, 0, 0],
           [0, 1.5, 0, 0],
           [0, 0, 1.7, 0],
           [0, 0,   0, 2]
          ])
# mean vector
mu = array([2,3,8,10])

# input
x = array([2.1,3.5,8, 9.5])

def norm_pdf_multivariate(x, mu, sigma):
    size = len(x)
    if size == len(mu) and (size, size) == sigma.shape:
        det = linalg.det(sigma)
        if det == 0:
            raise NameError("The covariance matrix can't be singular")

        norm_const = 1.0/ ( math.pow((2*pi),float(size)/2) * math.pow(det,1.0/2) )
        x_mu = matrix(x - mu)
        inv = sigma.I        
        result = math.pow(math.e, -0.5 * (x_mu * inv * x_mu.T))
        return norm_const * result
    else:
        raise NameError("The dimensions of the input don't match")

print norm_pdf_multivariate(x, mu, sigma)

score 17 · Accepted Answer

それでも必要な場合、私の実装は

import numpy as np

def pdf_multivariate_gauss(x, mu, cov):
    '''
    Caculate the multivariate normal density (pdf)

    Keyword arguments:
        x = numpy array of a "d x 1" sample vector
        mu = numpy array of a "d x 1" mean vector
        cov = "numpy array of a d x d" covariance matrix
    '''
    assert(mu.shape[0] > mu.shape[1]), 'mu must be a row vector'
    assert(x.shape[0] > x.shape[1]), 'x must be a row vector'
    assert(cov.shape[0] == cov.shape[1]), 'covariance matrix must be square'
    assert(mu.shape[0] == cov.shape[0]), 'cov_mat and mu_vec must have the same dimensions'
    assert(mu.shape[0] == x.shape[0]), 'mu and x must have the same dimensions'
    part1 = 1 / ( ((2* np.pi)**(len(mu)/2)) * (np.linalg.det(cov)**(1/2)) )
    part2 = (-1/2) * ((x-mu).T.dot(np.linalg.inv(cov))).dot((x-mu))
    return float(part1 * np.exp(part2))

def test_gauss_pdf():
    x = np.array([[0],[0]])
    mu  = np.array([[0],[0]])
    cov = np.eye(2) 

    print(pdf_multivariate_gauss(x, mu, cov))

    # prints 0.15915494309189535

if __name__ == '__main__':
    test_gauss_pdf()

将来変更を加える場合に備えて、コードはGitHub にあります

score 8 · Accepted Answer

scipy.stats.norm対角共分散行列の一般的なケースでは、多変量 PDF は、インスタンスによって返された単変量 PDF 値を乗算するだけで取得できます。一般的なケースが必要な場合は、おそらくこれを自分でコーディングする必要があります (難しいことではありません)。

score 3 · Accepted Answer

内部でさまざまな汎用性とさまざまな用途で使用する python パッケージをいくつか知っていますが、それらのいずれかがユーザー向けであるかどうかはわかりません。

たとえば、statsmodels には次の非表示の関数とクラスがありますが、statsmodels では使用されません。

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/miscmodels/try_mlecov.py#L36

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/mv_normal.py#L777

基本的に、迅速な評価が必要な場合は、ユースケースに合わせて書き直してください。

score 2 · Accepted Answer

密度は、numpy 関数とこのページの式を使用して、非常に簡単な方法で計算できます: http://en.wikipedia.org/wiki/Multivariate_normal_distribution。尤度関数 (対数確率) を使用することもできます。これは、大きな次元でアンダーフローする可能性が低く、計算が少し簡単です。どちらも、行列式と行列の逆行列を計算できることだけを含みます。

一方、CDF はまったく別の動物です...

score 0 · Accepted Answer

次のコードは、ベクトルが与えられたときに、ベクトルが多変量正規分布にある可能性を解決するのに役立ちました。

import numpy as np
from scipy.stats import multivariate_normal

すべてのベクトルを含むデータ

d= np.array([[1,2,1],[2,1,3],[4,5,4],[2,2,1]])

入力ベクトルと同じ長さのベクトル形式のデータの平均 (ここでは 3)

mean = sum(d,axis=0)/len(d)

OR
mean=np.average(d , axis=0)
mean.shape

[入力ベクトルの形状X入力ベクトルの形状]の形状を持つベクトルの共変量を見つけるここでは3x3です

cov = 0
for e in d:
  cov += np.dot((e-mean).reshape(len(e), 1), (e-mean).reshape(1, len(e)))
cov /= len(d)
cov.shape

平均と共分散から多変量ガウス分布を準備する

dist = multivariate_normal(mean,cov)

確率分布関数を見つける。

print(dist.pdf([1,2,3]))

3.050863384798471e-05

上記の値は可能性を示します。

python - Pythonの多変量正規密度？

10 に答える 10

すべてのベクトルを含むデータ

入力ベクトルと同じ長さのベクトル形式のデータの平均 (ここでは 3)

[入力ベクトルの形状X入力ベクトルの形状]の形状を持つベクトルの共変量を見つけるここでは3x3です

平均と共分散から多変量ガウス分布を準備する

確率分布関数を見つける。

Related

Reference