python - Implementing R's scale function in pandas in Python?

Question

What is an efficient pandas equivalent of R's scale function? For example:

newdf <- scale(df)

How would this be written in pandas? Is there an elegant way to do it using transform?

score 8 · Accepted Answer

I don't know R, but from reading the documentation it looks like the following would do the job (albeit in a slightly less general way):

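import numpy as np

def scale(y, c=True, sc=True):
    # Work on a copy so the caller's DataFrame is left unchanged.
    x = y.copy()

    if c:
        # Center: subtract each column's mean.
        x -= x.mean()
    if sc and c:
        # Centered: divide by the sample standard deviation.
        x /= x.std()
    elif sc:
        # Not centered: divide by the root mean square, sqrt(sum(x^2) / (n - 1)).
        x /= np.sqrt(x.pow(2).sum().div(x.count() - 1))
    return x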

A more general version would probably need to do some type and length checking.
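As a rough illustration of what such checks might look like, here is a minimal sketch. The name scale_checked, its keyword names, and the specific rules (DataFrame only, numeric columns only, at least two rows) are assumptions made for this example, not part of R's scale or of the answer's function.

```python
import numpy as np
import pandas as pd


def scale_checked(df, center=True, scale=True):
    """Hypothetical stricter variant of scale() with basic input checks."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("expected a pandas DataFrame")

    # Assumption: reject non-numeric columns instead of silently skipping them.
    numeric_cols = df.select_dtypes(include="number").columns
    non_numeric = df.columns.difference(numeric_cols)
    if len(non_numeric) > 0:
        raise TypeError(f"non-numeric columns: {list(non_numeric)}")

    # The n - 1 denominator needs at least two rows.
    if len(df) < 2:
        raise ValueError("need at least two rows to divide by n - 1")

    out = df.astype(float)  # astype returns a new object; the input is untouched
    if center:
        out = out - out.mean()
    if scale and center:
        out = out / out.std()
    elif scale:
        out = out / np.sqrt(out.pow(2).sum().div(out.count() - 1))
    return out


df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 60.0]})
scaled = scale_checked(df)
```

Raising on bad input rather than coercing it is only one possible design; dropping or passing through non-numeric columns would be equally defensible depending on the use case.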

EDIT: Added an explanation of the denominator in the elif sc: clause.

From the R documentation:

 ... If ‘scale’ is
 ‘TRUE’ then scaling is done by dividing the (centered) columns of
 ‘x’ by their standard deviations if ‘center’ is ‘TRUE’, and the
 root mean square otherwise.  If ‘scale’ is ‘FALSE’, no scaling is
 done.

 The root-mean-square for a (possibly centered) column is defined
 as sqrt(sum(x^2)/(n-1)), where x is a vector of the non-missing
 values and n is the number of non-missing values.  In the case
 ‘center = TRUE’, this is the same as the standard deviation, but
 in general it is not.

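The line np.sqrt(x.pow(2).sum().div(x.count() - 1)) computes the root mean square from that definition: it squares x (the pow method), sums down the rows so that each column gets one total, divides by each column's count of non-NaN values (the count method) minus one, and finally takes the square root.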
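To make the individual steps concrete, the following sketch breaks the expression apart on a small DataFrame containing a NaN. The data and the intermediate variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data; column "a" has one missing value.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0], "b": [3.0, 5.0, 7.0, 9.0]})

squared = df.pow(2)       # element-wise squares; NaN stays NaN
col_sums = squared.sum()  # one total per column, NaN skipped by default
n_valid = df.count()      # non-NaN values per column: a -> 3, b -> 4
rms = np.sqrt(col_sums.div(n_valid - 1))

# Applied to centered data, the same formula reproduces the
# sample standard deviation (ddof=1), as the R documentation states.
centered = df - df.mean()
rms_centered = np.sqrt(centered.pow(2).sum().div(centered.count() - 1))
```

On uncentered data rms differs from df.std(), which is why the function needs the separate elif sc: branch.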

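As a side note, the reason I didn't simply compute the RMS after centering is that the std method calls bottleneck for faster computation of that expression in the special case where you want the standard deviation rather than the more general RMS.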

You could instead compute the RMS after centering. It might be worth a benchmark: now that I'm writing this, I'm not actually sure which is faster, and I haven't benchmarked it.
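A benchmark along those lines could be sketched as follows. The data size, the repeat count, and the helper names via_std and via_rms are arbitrary choices for this example, and the timings will depend on the machine, the pandas version, and whether bottleneck is installed, so no result is claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the shape is an arbitrary choice for this sketch.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10_000, 20)))
centered = df - df.mean()


def via_std():
    # Scale the centered data with the built-in sample standard deviation.
    return centered / centered.std()


def via_rms():
    # Scale the centered data with the explicit RMS formula.
    return centered / np.sqrt(centered.pow(2).sum().div(centered.count() - 1))


# Both paths should agree numerically before their timings are compared.
assert np.allclose(via_std(), via_rms())

repeats = 20
for fn in (via_std, via_rms):
    seconds = timeit.timeit(fn, number=repeats)
    print(f"{fn.__name__}: {seconds / repeats:.6f} s per call")
```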
