python - Python command equivalent to MATLAB's quantile

Question

I'm trying to replicate some MATLAB code in Python. I couldn't find an exact equivalent of the MATLAB function quantile. The closest I found was Python's mquantiles.

MATLAB example:

 quantile( [ 8.60789925e-05, 1.98989354e-05 , 1.68308882e-04, 1.69379370e-04],  0.8)

...gives: 0.00016958

The same example in Python:

scipy.stats.mstats.mquantiles( [8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04], 0.8)

...gives: 0.00016912

Does anyone know how to exactly replicate MATLAB's quantile function?

score 5 · Accepted Answer

The documentation for quantile (under More About => Algorithms) gives the exact algorithm it uses. Here is some Python code that does this for a single quantile of a flat array, using bottleneck to do a partial sort:

import numpy as np
import bottleneck as bn

def quantile(a, prob):
    """
    Estimates the prob'th quantile of the values in a data array.

    Uses the algorithm of matlab's quantile(), namely:
        - Remove any nan values
        - Take the sorted data as the (.5/n), (1.5/n), ..., (1-.5/n) quantiles.
        - Use linear interpolation for values between (.5/n) and (1 - .5/n).
        - Use the minimum or maximum for quantiles outside that range.

    See also: scipy.stats.mstats.mquantiles
    """
    a = np.asanyarray(a)
    a = a[np.logical_not(np.isnan(a))].ravel()
    n = a.size

    if prob >= 1 - .5/n:
        return a.max()
    elif prob <= .5 / n:
        return a.min()

    # find the two bounds we're interpolating between:
    # that is, find i such that (i+.5) / n <= prob <= (i+1.5)/n
    t = n * prob - .5
    i = int(np.floor(t))  # integer index; NumPy rejects float indices

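    # partial sort so that the ith element is at position i, with bigger ones
    # to the right and smaller to the left
    # (bn.partition, the bottleneck >= 1.0 replacement for bn.partsort,
    # uses the same kth convention as np.partition)
    a = bn.partition(a, i)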

    if i == t: # did we luck out and get an integer index?
        return a[i]
    else:
        # we'll linearly interpolate between this and the next index
        smaller = a[i]
        larger = a[i+1:].min()
        if np.isinf(smaller):
            return smaller # avoid inf - inf
        return smaller + (larger - smaller) * (t - i)

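I only implemented the 1-D, single-quantile case because that's all I needed. If you need several quantiles, it's probably worth just doing a full sort. To do it along an axis when you know there are no NaNs, you just add an axis argument to the sort and vectorize the linear-interpolation step. Doing it along an axis with NaNs present gets a bit trickier.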
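For the several-quantiles case, a full sort makes the whole thing short. The following is a minimal sketch, not part of the original answer: the function name `matlab_quantiles` is hypothetical, and it swaps bottleneck's partial sort for `np.sort` plus `np.interp`. It assumes a 1-D input and uses the same rule as the docstring: the sorted values sit at positions (k - 0.5)/n, values in between are linearly interpolated, and `np.interp` clamps to the minimum/maximum outside [0.5/n, 1 - 0.5/n].

```python
import numpy as np

def matlab_quantiles(a, probs):
    """Hypothetical helper: MATLAB-style quantiles for one or many probabilities.

    Sketch only: full sort + np.interp instead of bottleneck's partial sort.
    """
    a = np.asarray(a, dtype=float).ravel()
    a = np.sort(a[~np.isnan(a)])          # drop NaNs, then fully sort
    n = a.size
    positions = (np.arange(n) + 0.5) / n  # plotting positions (k - 0.5)/n, k = 1..n
    # np.interp interpolates linearly between positions and, outside
    # [positions[0], positions[-1]], returns a[0] (min) or a[-1] (max)
    return np.interp(probs, positions, a)
```

For example, `matlab_quantiles(data, [0.25, 0.5, 0.8])` returns all three quantiles from a single sort; a scalar `probs` returns a scalar.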

With the quantile function above:

>>> quantile([ 8.60789925e-05, 1.98989354e-05 , 1.68308882e-04, 1.69379370e-04], 0.8)
0.00016905822360000001

and the MATLAB code gave 0.00016905822359999999; the difference is 3e-20 (which is below machine precision at this magnitude).
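To see where 0.000169058... comes from, here is the same arithmetic for the 0.8 quantile written out step by step in plain Python. It is only a hand check of the rule in the docstring above (sorted values at positions (k - 0.5)/n, linear interpolation in between), not a general implementation: it skips the NaN removal and the min/max clamping.

```python
# Hand check of the 0.8 quantile from the question's data.
data = [8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04]
s = sorted(data)
n = len(s)                # 4
prob = 0.8

t = n * prob - 0.5        # 4 * 0.8 - 0.5 = 2.7 (zero-based fractional index)
i = int(t)                # 2: interpolate between s[2] and s[3]
frac = t - i              # roughly 0.7
value = s[i] + (s[i + 1] - s[i]) * frac
print(value)
```

Numerically this is 1.68308882e-04 + 0.7 * (1.69379370e-04 - 1.68308882e-04), which matches both the Python function and the MATLAB result quoted above.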

score 4 · Accepted Answer

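There are only four values in your input vector, which is far too few to get a good approximation of the quantiles of the underlying distribution. The discrepancy is probably because MATLAB and SciPy use different heuristics to estimate quantiles of the sampled distribution.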
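With so few points, the choice of convention dominates the result. As an illustration (not from the original answer), the sketch below evaluates the 0.8 quantile of the same four values under three standard definitions. It assumes NumPy >= 1.22, where `np.quantile` accepts a `method` argument; `"hazen"` is R's type 5 and `"linear"` (the default) is R's type 7.

```python
import numpy as np
from scipy.stats import mstats

data = np.array([8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04])

# Same four samples, three different plotting-position conventions:
default_mq = mstats.mquantiles(data, 0.8)[0]      # SciPy default: alphap = betap = 0.4
hazen = np.quantile(data, 0.8, method="hazen")    # R type 5: positions (k - 0.5)/n
linear = np.quantile(data, 0.8)                   # NumPy default: R type 7

for name, v in [("mquantiles default", default_mq), ("hazen", hazen), ("linear", linear)]:
    print(f"{name:>20}: {v:.10g}")
```

The mquantiles default reproduces the 0.00016912 from the question, the type 5 ("hazen") value matches MATLAB's, and type 7 gives a third answer.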

score 3 · Accepted Answer

A bit late, but:

mquantiles is very flexible; you just need to specify the alphap and betap parameters. Since MATLAB does linear interpolation here, you need to set them to (0.5, 0.5):

In [9]: scipy.stats.mstats.mquantiles( [8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04], 0.8, alphap=0.5, betap=0.5)
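Why (0.5, 0.5) works: the mquantiles documentation defines the plotting position of the k-th sorted value in terms of alphap and betap, and with both set to 1/2 this reduces to the (k - 0.5)/n positions used by MATLAB (the .5/n, 1.5/n, ... in the first answer's docstring). Written out:

```latex
p(k) = \frac{k - \alpha_p}{n + 1 - \alpha_p - \beta_p},
\qquad
\alpha_p = \beta_p = \tfrac{1}{2}
\;\Longrightarrow\;
p(k) = \frac{k - \tfrac{1}{2}}{n}, \quad k = 1, \dots, n
```

For the question's n = 4 this gives positions 0.125, 0.375, 0.625, 0.875, so 0.8 lies 70% of the way from the third to the fourth sorted value.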

Edit: MATLAB says it does linear interpolation, but it appears to compute quantiles by piecewise linear interpolation that is equivalent to R's Type 5 quantile and to scipy's (0.5, 0.5).
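The Type 5 equivalence can be checked directly. This sketch (not from the original answer) compares mquantiles with alphap = betap = 0.5 against NumPy's `method="hazen"`, which is R's Type 5 (this assumes NumPy >= 1.22). It includes probabilities below 0.5/n and above 1 - 0.5/n to show that both clamp to the minimum and maximum, as MATLAB does.

```python
import numpy as np
from scipy.stats import mstats

data = np.array([8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04])
probs = [0.05, 0.25, 0.5, 0.8, 0.99]   # 0.05 and 0.99 fall outside [0.5/n, 1 - 0.5/n]

# mquantiles with R Type 5 plotting positions
mq = np.asarray(mstats.mquantiles(data, probs, alphap=0.5, betap=0.5))

# NumPy's name for the same definition
hz = np.quantile(data, probs, method="hazen")

print(np.allclose(mq, hz))
```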
