16

matplotlib で生成している大規模な散布図 (〜 100,000 ポイント) があります。各ポイントにはこの x/y 空間内の位置があり、ポイントの総数の特定のパーセンタイルを含む等高線を生成したいと考えています。

これを行うmatplotlibに関数はありますか? 私はcontour()を調べましたが、このように機能するには独自の関数を作成する必要があります。

ありがとう!

4

2 に答える 2

52

Basically, you're wanting a density estimate of some sort. There multiple ways to do this:

  1. Use a 2D histogram of some sort (e.g. matplotlib.pyplot.hist2d or matplotlib.pyplot.hexbin) (You could also display the results as contours--just use numpy.histogram2d and then contour the resulting array.)

  2. Make a kernel-density estimate (KDE) and contour the results. A KDE is essentially a smoothed histogram. Instead of a point falling into a particular bin, it adds a weight to surrounding bins (usually in the shape of a gaussian "bell curve").

Using a 2D histogram is simple and easy to understand, but fundementally gives "blocky" results.

There are some wrinkles to doing the second one "correctly" (i.e. there's no one correct way). I won't go into the details here, but if you want to interpret the results statistically, you need to read up on it (particularly the bandwidth selection).

At any rate, here's an example of the differences. I'm going to plot each one similarly, so I won't use contours, but you could just as easily plot the 2D histogram or gaussian KDE using a contour plot:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import kde

np.random.seed(1977)

# Generate 200 correlated x,y points
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)
x, y = data.T

nbins = 20

fig, axes = plt.subplots(ncols=2, nrows=2, sharex=True, sharey=True)

axes[0, 0].set_title('Scatterplot')
axes[0, 0].plot(x, y, 'ko')

axes[0, 1].set_title('Hexbin plot')
axes[0, 1].hexbin(x, y, gridsize=nbins)

axes[1, 0].set_title('2D Histogram')
axes[1, 0].hist2d(x, y, bins=nbins)

# Evaluate a gaussian kde on a regular grid of nbins x nbins over data extents
k = kde.gaussian_kde(data.T)
xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))

axes[1, 1].set_title('Gaussian KDE')
axes[1, 1].pcolormesh(xi, yi, zi.reshape(xi.shape))

fig.tight_layout()
plt.show()

enter image description here

One caveat: With very large numbers of points, scipy.stats.gaussian_kde will become very slow. It's fairly easy to speed it up by making an approximation--just take the 2D histogram and blur it with a guassian filter of the right radius and covariance. I can give an example if you'd like.

One other caveat: If you're doing this in a non-cartesian coordinate system, none of these methods apply! Getting density estimates on a spherical shell is a bit more complicated.

于 2013-10-15T21:16:53.377 に答える
2

同じ質問があります。ポイントの一部を含む等高線をプロットする場合は、次のアルゴリズムを使用できます。

2D ヒストグラムを作成する

h2, xedges, yedges = np.histogram2d(X, Y, bibs = [30, 30])

h2 は、いくつかの長方形の点の数である整数を含む 2 次元行列になりました

hravel = np.sort(np.ravel(h2))[-1] #all possible cases for rectangles 
hcumsum = np.sumsum(hravel)

醜いハック、

h2 2d マトリックスのすべての点について、現在分析している点数以上の点数を含む長方形の累積点数を与えます。

hunique = np.unique(hravel)

hsum = np.sum(h2)

for h in hunique:
    h2[h2 == h] = hcumsum[np.argwhere(hravel == h)[-1]]/hsum

h2の等高線をプロットすると、すべての点をある程度含む等高線になります

于 2015-01-28T10:01:39.917 に答える