python - 多次元信頼区間

Question

私は多数のタプル (par1,par2) を持っています。つまり、実験を複数回繰り返すことで得られた 2 次元パラメーター空間のポイントです。

信頼楕円を計算して視覚化する可能性を探しています (これが正しい用語かどうかはわかりません)。ここに、私が何を意味するかを示すために Web で見つけたプロットの例を示します。

ここに画像の説明を入力

ソース: blogspot.ch/2011/07/classification-and-discrimination-with.html

したがって、原則として、多変量正規分布をデータポイントの2Dヒストグラムに適合させる必要があります。誰かがこれで私を助けることができますか?

score 37 · Accepted Answer

ポイントの分散の2シグマ楕円が必要なように聞こえますか？

もしそうなら、次のようなことを検討してください（ここにある論文のコードから：https ：//github.com/joferkington/oost_paper_code/blob/master/error_ellipse.py ）：

import numpy as np

import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def plot_point_cov(points, nstd=2, ax=None, **kwargs):
    """
    Plots an `nstd` sigma ellipse based on the mean and covariance of a point
    "cloud" (points, an Nx2 array).

    Parameters
    ----------
        points : An Nx2 array of the data points.
        nstd : The radius of the ellipse in numbers of standard deviations.
            Defaults to 2 standard deviations.
        ax : The axis that the ellipse will be plotted on. Defaults to the 
            current axis.
        Additional keyword arguments are pass on to the ellipse patch.

    Returns
    -------
        A matplotlib ellipse artist
    """
    pos = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    return plot_cov_ellipse(cov, pos, nstd, ax, **kwargs)

def plot_cov_ellipse(cov, pos, nstd=2, ax=None, **kwargs):
    """
    Plots an `nstd` sigma error ellipse based on the specified covariance
    matrix (`cov`). Additional keyword arguments are passed on to the 
    ellipse patch artist.

    Parameters
    ----------
        cov : The 2x2 covariance matrix to base the ellipse on
        pos : The location of the center of the ellipse. Expects a 2-element
            sequence of [x0, y0].
        nstd : The radius of the ellipse in numbers of standard deviations.
            Defaults to 2 standard deviations.
        ax : The axis that the ellipse will be plotted on. Defaults to the 
            current axis.
        Additional keyword arguments are pass on to the ellipse patch.

    Returns
    -------
        A matplotlib ellipse artist
    """
    def eigsorted(cov):
        vals, vecs = np.linalg.eigh(cov)
        order = vals.argsort()[::-1]
        return vals[order], vecs[:,order]

    if ax is None:
        ax = plt.gca()

    vals, vecs = eigsorted(cov)
    theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))

    # Width and height are "full" widths, not radius
    width, height = 2 * nstd * np.sqrt(vals)
    ellip = Ellipse(xy=pos, width=width, height=height, angle=theta, **kwargs)

    ax.add_artist(ellip)
    return ellip

if __name__ == '__main__':
    #-- Example usage -----------------------
    # Generate some random, correlated data
    points = np.random.multivariate_normal(
            mean=(1,1), cov=[[0.4, 9],[9, 10]], size=1000
            )
    # Plot the raw points...
    x, y = points.T
    plt.plot(x, y, 'ro')

    # Plot a transparent 3 standard deviation covariance ellipse
    plot_point_cov(points, nstd=3, alpha=0.5, color='green')

    plt.show()

ここに画像の説明を入力してください

score 4 · Accepted Answer

エラーまたは信頼領域の等高線をプロットする上記の例の 1 つを少し変更しました。今では、正しい輪郭が得られると思います。

各データセットに個別に適用する必要があるときに、scoreatpercentile メソッドをジョイントデータセット (青 + 赤の点) に適用していたため、間違った等高線が表示されていました。

変更されたコードは以下にあります。

import numpy
import scipy
import scipy.stats
import matplotlib.pyplot as plt

# generate two normally distributed 2d arrays
x1=numpy.random.multivariate_normal((100,420),[[120,80],[80,80]],400)
x2=numpy.random.multivariate_normal((140,340),[[90,-70],[-70,80]],400)

# fit a KDE to the data
pdf1=scipy.stats.kde.gaussian_kde(x1.T)
pdf2=scipy.stats.kde.gaussian_kde(x2.T)

# create a grid over which we can evaluate pdf
q,w=numpy.meshgrid(range(50,200,10), range(300,500,10))
r1=pdf1([q.flatten(),w.flatten()])
r2=pdf2([q.flatten(),w.flatten()])

# sample the pdf and find the value at the 95th percentile
s1=scipy.stats.scoreatpercentile(pdf1(pdf1.resample(1000)), 5)
s2=scipy.stats.scoreatpercentile(pdf2(pdf2.resample(1000)), 5)

# reshape back to 2d
r1.shape=(20,15)
r2.shape=(20,15)

# plot the contour at the 95th percentile
plt.contour(range(50,200,10), range(300,500,10), r1, [s1],colors='b')
plt.contour(range(50,200,10), range(300,500,10), r2, [s2],colors='r')

# scatter plot the two normal distributions
plt.scatter(x1[:,0],x1[:,1],alpha=0.3)
plt.scatter(x2[:,0],x2[:,1],c='r',alpha=0.3)

score 0 · Accepted Answer

あなたが探しているのは、信頼区間を計算することだと思います。

それについてはよくわかりませんが、出発点として、sherpaアプリケーションでPythonをチェックします。少なくとも、Scipy 2011の講演で、著者は、それを使用して信頼領域を決定および取得できると述べています（ただし、データのモデルが必要になる場合があります）。

シェルパトークのビデオと対応するスライドをご覧ください。

HTH

python - 多次元信頼区間

4 に答える 4

Related

Reference