python - Histogram-based probability density estimation

Question

How can I create a histogram-based probability density estimate of each of the marginal distributions p(x1) and p(x2) of this data set?

import numpy as np
import matplotlib.pyplot as plt
linalg = np.linalg

N = 100
mean = [1,1]
cov = [[0.3, 0.2],[0.2, 0.2]]
data = np.random.multivariate_normal(mean, cov, N)
L = linalg.cholesky(cov)
# print(L.shape)
# (2, 2)
uncorrelated = np.random.standard_normal((2,N))
data2 = np.dot(L,uncorrelated) + np.array(mean).reshape(2,1)
# print(data2.shape)
# (2, 100)
plt.scatter(data2[0,:], data2[1,:], c='green')    
plt.scatter(data[:,0], data[:,1], c='yellow')
plt.show()

You can use the hist function in Matlab or R for this. How does changing the bin width (or, equivalently, the number of bins) affect the plot and the estimates of p(x1) and p(x2)?

I'm using Python. Is there something similar to Matlab's hist function, and how do I implement it?

score 1 · Accepted Answer

Matlab's hist function is implemented in matplotlib as (you guessed it) matplotlib.pyplot.hist. It takes the number of bins as a parameter and plots the histogram. To compute the histogram without plotting it, use NumPy's numpy.histogram function.
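As a minimal sketch of the non-plotting route, the snippet below calls numpy.histogram on each column of data drawn with the same mean and cov as in the question. The seed, the choice of 20 bins, and the variable names (density_x1, edges_x1, and so on) are assumptions made for illustration only.

```python
import numpy as np

# Same parameters as in the question; the seed is an arbitrary choice
# made only so the sketch is reproducible.
rng = np.random.default_rng(0)
mean = [1, 1]
cov = [[0.3, 0.2], [0.2, 0.2]]
data = rng.multivariate_normal(mean, cov, 100)

# density=True rescales the counts so that the bar areas sum to 1,
# which turns the histogram into a piecewise-constant density estimate.
density_x1, edges_x1 = np.histogram(data[:, 0], bins=20, density=True)
density_x2, edges_x2 = np.histogram(data[:, 1], bins=20, density=True)

# Sanity check: height times bin width, summed over all bins.
total_area_x1 = float(np.sum(density_x1 * np.diff(edges_x1)))
total_area_x2 = float(np.sum(density_x2 * np.diff(edges_x2)))
print(total_area_x1, total_area_x2)
```

The returned edges array always has one more entry than the density array, because it holds both the left and right boundary of every bin.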

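To estimate a probability distribution, you can use the distributions in scipy.stats. The data above was generated from a normal distribution. To fit a normal distribution to this data, use scipy.stats.norm.fit. Below is a code example that plots histograms of the data and fits normal distributions to them.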

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
linalg = np.linalg

N = 100
mean = [1,1]
cov = [[0.3, 0.2],[0.2, 0.2]]
data = np.random.multivariate_normal(mean, cov, N)
L = linalg.cholesky(cov)
# print(L.shape)
# (2, 2)
uncorrelated = np.random.standard_normal((2,N))
data2 = np.dot(L,uncorrelated) + np.array(mean).reshape(2,1)
# print(data2.shape)
# (2, 100)
plt.figure()
plt.scatter(data2[0,:], data2[1,:], c='green')    
plt.scatter(data[:,0], data[:,1], c='yellow')
plt.show()

# Plotting histograms and fitting normal distributions
# density=True normalizes each histogram to unit area so it is comparable
# to the fitted pdf (the older normed argument was removed from matplotlib)
plt.subplot(211)
plt.hist(data[:,0], bins=20, density=True, alpha=0.5, color='green')
plt.hist(data2[0,:], bins=20, density=True, alpha=0.5, color='yellow')
x = np.arange(-1, 3, 0.001)
# norm.fit returns the estimated (mean, standard deviation)
plt.plot(x, norm.pdf(x, *norm.fit(data[:,0])), color='green')
plt.plot(x, norm.pdf(x, *norm.fit(data2[0,:])), color='yellow')
plt.title('Var 1')

plt.subplot(212)
plt.hist(data[:,1], bins=20, density=True, alpha=0.5, color='green')
plt.hist(data2[1,:], bins=20, density=True, alpha=0.5, color='yellow')
x = np.arange(-1, 3, 0.001)
plt.plot(x, norm.pdf(x, *norm.fit(data[:,1])), color='green')
plt.plot(x, norm.pdf(x, *norm.fit(data2[1,:])), color='yellow')
plt.title('Var 2')

plt.tight_layout()
plt.show()
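
The question also asks how the number of bins affects the estimate. One rough way to see it is to compute the density histogram for several bin counts and compare them. The sketch below does this for x1 only, drawing it directly from its marginal N(1, 0.3). The bin counts 5, 20 and 80, the seed, and the results dictionary are assumptions chosen for illustration.

```python
import numpy as np

# The marginal of x1 under the question's mean and cov is normal with
# mean 1 and variance 0.3, so it can be sampled directly.
rng = np.random.default_rng(0)
x1 = rng.normal(loc=1.0, scale=np.sqrt(0.3), size=100)

results = {}
for bins in (5, 20, 80):
    density, edges = np.histogram(x1, bins=bins, density=True)
    results[bins] = (density, edges)
    width = edges[1] - edges[0]
    empty = int(np.sum(density == 0))
    # Report bin width, number of empty bins, and the tallest bar.
    print(f"bins={bins:3d}  width={width:.3f}  "
          f"empty bins={empty}  max density={density.max():.2f}")
```

With few bins the estimate tends to be smooth but coarse. With many bins and only 100 samples, each bin holds few points, so the estimate tends to become spiky. Passing the same bins values to plt.hist(..., density=True) shows the effect visually.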
