python - scipy.interpolate.UnivariateSpline does not smooth regardless of parameters

Question

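I am having trouble getting scipy.interpolate.UnivariateSpline to apply any smoothing when interpolating. Based on the function's documentation page and some previous posts, I believe the s parameter should control the smoothing.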

Here is my code:

# Imports
import scipy.interpolate  # a bare "import scipy" does not reliably load the interpolate submodule
import pylab

# Set up and plot actual data
x = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542, 9879.9717114947653, 13738.60563208926, 15113.277958924193]
y = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966, 6056.3852496014106, 7895.2332350173638, 9154.2956175610598]
pylab.plot(x, y, "o", label="Actual")

# Plot estimates using splines with a range of degrees
for k in range(1, 4):
    mySpline = scipy.interpolate.UnivariateSpline(x=x, y=y, k=k, s=2)
    xi = range(0, 15100, 20)
    yi = mySpline(xi)
    pylab.plot(xi, yi, label="Predicted k=%d" % k)

# Show the plot
pylab.grid(True)
pylab.xticks(rotation=45)
pylab.legend( loc="lower right" )
pylab.show()

Here is the result:

[Image: splines with no smoothing]

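I have tried this with a range of s values (0.01, 0.1, 1, 2, 5, 50) and with explicit weights, either all set to the same value (1.0) or randomized. I still cannot get any smoothing, and the number of knots is always the same as the number of data points. In particular, I would like outliers such as the 4th point (7990.4664106277542, 5851.6866463790966) to be smoothed over.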

Is it because I do not have enough data? If so, is there a similar spline function or a clustering technique I could apply to achieve smoothing with this few data points?

score 11 · Accepted Answer

Short answer: you need to choose the value of s more carefully.

The documentation for UnivariateSpline states:

Positive smoothing factor used to choose the number of knots. Number of
knots will be increased until the smoothing condition is satisfied:
sum((w[i]*(y[i]-s(x[i])))**2,axis=0) <= s

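From this one can deduce that, if you do not pass explicit weights, "reasonable" values for the smoothing factor are around s = m * v, where m is the number of data points and v is the variance of the data. In this case, s_good ~ 5e7.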
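As a rough check of the s = m * v estimate, the sketch below computes it for the data in the question and compares the fit with the original s=2. It assumes the population variance (`numpy.var` with its default settings) is what is meant by v; the variable names `tight` and `smooth` are only for illustration.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Data from the question.
x = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542,
     9879.9717114947653, 13738.60563208926, 15113.277958924193]
y = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966,
     6056.3852496014106, 7895.2332350173638, 9154.2956175610598]

# Estimate the scale of s: number of points times the variance of y
# (assumption: population variance, i.e. numpy.var's default ddof=0).
m = len(y)
s_good = m * np.var(y)

# s=2 is tiny compared with the squared scale of y, so the fit is
# pushed towards plain interpolation; s_good allows real smoothing.
tight = UnivariateSpline(x, y, k=3, s=2)
smooth = UnivariateSpline(x, y, k=3, s=s_good)

print("s_good = %.3g" % s_good)
for name, spl in (("s=2", tight), ("s=m*var(y)", smooth)):
    print(name, "knots:", len(spl.get_knots()),
          "residual: %.3g" % spl.get_residual())
```

A value this large sits at the upper end of what makes sense, since it tolerates a residual as big as the total spread of y. In practice a smaller s, still many orders of magnitude above 2, is likely to be a better compromise.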

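EDIT: Sensible values for s of course also depend on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2, where std is the standard deviation associated with the "noise" you want to smooth over.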
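The range above is easy to evaluate once a noise level is chosen. The following is a minimal sketch: `smoothing_range` is a hypothetical helper written for this illustration, not part of SciPy, and `noise_std = 300.0` is an assumed value picked only to show the arithmetic, not an estimate derived from the data.

```python
import math

import numpy as np
from scipy.interpolate import UnivariateSpline


def smoothing_range(m, noise_std):
    """Return (low, high) bounds for s from the rule quoted above.

    Hypothetical helper: m is the number of data points and noise_std
    is the standard deviation of the noise to smooth over.
    """
    spread = math.sqrt(2 * m)
    return (m - spread) * noise_std ** 2, (m + spread) * noise_std ** 2


# Data from the question.
x = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542,
     9879.9717114947653, 13738.60563208926, 15113.277958924193]
y = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966,
     6056.3852496014106, 7895.2332350173638, 9154.2956175610598]

noise_std = 300.0  # assumed noise level, for illustration only
s_low, s_high = smoothing_range(len(x), noise_std)
s_mid = len(x) * noise_std ** 2  # centre of the suggested range

# Fit with the mid-range value and evaluate on a regular grid.
spline = UnivariateSpline(x, y, k=3, s=s_mid)
xi = np.linspace(min(x), max(x), 50)
yi = spline(xi)

print("suggested range for s: %.3g to %.3g" % (s_low, s_high))
print("knots:", len(spline.get_knots()))
```

With only seven points the range is wide relative to its centre, so the choice of noise_std matters far more than where in the range s falls.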

score 1 · Accepted Answer

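I don't know of any library that will do this for you, but I would try a slightly more DIY approach: build the spline with knots placed in between the raw data points, in both x and y. In your particular example, having a single knot between the 4th and 5th points should do the trick, since it removes the huge derivative at around x=8000.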
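One possible reading of this suggestion is to fix the knot positions by hand with `scipy.interpolate.LSQUnivariateSpline`, which fits a least-squares spline for a given list of interior knots. The answer does not name this class, so treat it as an assumption; placing the knot at the midpoint between the 4th and 5th x values is likewise just one reasonable choice.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

# Data from the question.
x = np.array([0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542,
              9879.9717114947653, 13738.60563208926, 15113.277958924193])
y = np.array([0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966,
              6056.3852496014106, 7895.2332350173638, 9154.2956175610598])

# A single interior knot halfway between the 4th and 5th points
# (indices 3 and 4); the midpoint is an assumption of this sketch.
knot = 0.5 * (x[3] + x[4])

# Least-squares cubic spline with user-chosen interior knots. With one
# interior knot there are fewer coefficients than data points, so the
# curve cannot pass through every point and has to smooth.
spline = LSQUnivariateSpline(x, y, t=[knot], k=3)

print("knots:", spline.get_knots())
print("residual: %.3g" % spline.get_residual())
```

Because the knots are fixed rather than chosen by the smoothing factor, the amount of smoothing here is controlled entirely by how many interior knots are supplied and where they sit.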

score 0 · Accepted Answer

I had trouble getting BigChef's answer to run. Here is a variation that works on Python 3.6:

# Imports
import pylab
import scipy.interpolate  # a bare "import scipy" does not reliably load the interpolate submodule
import sklearn.cluster

# Set up original data - note that it's monotonically increasing by X value!
data = {}
data['original'] = {}
data['original']['x'] = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542, 9879.9717114947653, 13738.60563208926, 15113.277958924193]
data['original']['y'] = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966, 6056.3852496014106, 7895.2332350173638, 9154.2956175610598]

# Cluster data, sort it and save
import numpy
inputNumpy = numpy.array([[data['original']['x'][i], data['original']['y'][i]] for i in range(0, len(data['original']['x']))])
meanShift = sklearn.cluster.MeanShift()
meanShift.fit(inputNumpy)
clusteredData = [[pair[0], pair[1]] for pair in meanShift.cluster_centers_]

clusteredData.sort(key=lambda li: li[0])
data['clustered'] = {}
data['clustered']['x'] = [pair[0] for pair in clusteredData]
data['clustered']['y'] = [pair[1] for pair in clusteredData]

# Build a spline using the clustered data and predict
mySpline = scipy.interpolate.UnivariateSpline(x=data['clustered']['x'], y=data['clustered']['y'], k=1)
xi = range(0, int(round(max(data['original']['x']), -3)) + 3000, 20)
yi = mySpline(xi)

# Plot the datapoints
pylab.plot(data['clustered']['x'], data['clustered']['y'], "D", label="Datapoints (%s)" % 'clustered')
pylab.plot(xi, yi, label="Predicted (%s)" %  'clustered')
pylab.plot(data['original']['x'], data['original']['y'], "o", label="Datapoints (%s)" % 'original')

# Show the plot
pylab.grid(True)
pylab.xticks(rotation=45)
pylab.show()
