python - Is scipy.linalg.norm different from sklearn.preprocessing.normalize?

Question

from numpy.random import rand
from sklearn.preprocessing import normalize
from scipy.sparse import csr_matrix
from scipy.linalg import norm

w = (rand(1,10)<0.25)*rand(1,10)
x = (rand(1,10)<0.25)*rand(1,10)
w_csr = csr_matrix(w)
x_csr = csr_matrix(x)
(normalize(w_csr,axis=1,copy=False,norm='l2')*normalize(x_csr,axis=1,copy=False,norm='l2')).todense()

norm(w,ord='fro')*norm(x,ord='fro')

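I am using scipy csr_matrix and want to normalize two matrices using the Frobenius norm and then take their product. However, norm from scipy.linalg and normalize from sklearn.preprocessing seem to evaluate the matrices differently. Technically, since I am computing the same Frobenius norm in both cases above, shouldn't the two expressions evaluate to the same thing? But I get the following answers: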

matrix([[ 0.962341]])

0.4431811178371029

for sklearn.preprocessing and scipy.linalg.norm, respectively. I am really interested in finding out what I am doing wrong.

score 1 · Accepted Answer

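sklearn.preprocessing.normalize divides each row by its norm and returns a matrix with the same shape as the input. scipy.linalg.norm returns the norm of the matrix, a single number. So your two calculations are not equivalent.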
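A minimal sketch of the difference, using a hand-picked 1x4 row (not the random data from the question) so the numbers are easy to check. For a single row, the Frobenius norm equals that row's L2 norm, so both functions use the same norm; they just return different things: normalize gives back the rescaled row, norm gives back the norm itself.

```python
import numpy as np
from scipy.linalg import norm
from scipy.sparse import csr_matrix
from sklearn.preprocessing import normalize

# Illustrative 1x4 row whose L2 norm is exactly 5 (3-4-5 triangle).
w = np.array([[3.0, 0.0, 4.0, 0.0]])
w_csr = csr_matrix(w)

# normalize() returns a sparse matrix of the SAME shape,
# with each row divided by that row's L2 norm.
w_unit = normalize(w_csr, axis=1, norm='l2')
print(w_unit.shape)        # (1, 4)
print(w_unit.toarray())    # values 0.6, 0, 0.8, 0

# scipy.linalg.norm with ord='fro' collapses the whole matrix to one scalar.
w_fro = norm(w, ord='fro')
print(w_fro)               # 5.0

# Multiplying the unit row by that scalar recovers the original row.
print(np.allclose(w_unit.toarray() * w_fro, w))  # True
```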

Note that your code is not correct as written. This line

(normalize(w_csr,axis=1,copy=False,norm='l2')*normalize(x_csr,axis=1,copy=False,norm='l2')).todense()

raises ValueError: dimension mismatch. Both calls to normalize return a matrix of shape (1, 10), so their dimensions are not compatible for a matrix product. What did you do to get matrix([[ 0.962341]])?
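If the goal was a 1x1 result from the two normalized rows (an assumption; the question does not say which product was intended), one option is to transpose the second factor so the shapes become (1, n) @ (n, 1). The result is the cosine similarity w·x / (‖w‖ ‖x‖), a different quantity from the product of norms ‖w‖ ‖x‖ computed by the scipy.linalg.norm line, so the two numbers in the question would not be expected to agree anyway. The vectors below are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import normalize

# Illustrative rows (not the random data from the question).
w = csr_matrix(np.array([[1.0, 0.0, 2.0, 0.0]]))
x = csr_matrix(np.array([[2.0, 0.0, 1.0, 3.0]]))

w_unit = normalize(w, axis=1, norm='l2')
x_unit = normalize(x, axis=1, norm='l2')

# (1, 4) @ (4, 1) -> (1, 1): the dot product of the two unit rows.
cos_sim = (w_unit @ x_unit.T).toarray()

# The same number computed densely: (w . x) / (||w|| * ||x||).
w_d, x_d = w.toarray().ravel(), x.toarray().ravel()
expected = w_d @ x_d / (np.linalg.norm(w_d) * np.linalg.norm(x_d))
print(cos_sim, expected)
```

Using `@` rather than `*` is a deliberate choice: `*` is a matrix product for scipy.sparse matrix classes but an elementwise product for the newer sparse array classes, whereas `@` means matrix product for both.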

Here is a simple function that computes the Frobenius norm of a sparse (e.g. CSR or CSC) matrix:

import numpy as np
def spnorm(a):
    # Square and sum only the stored values; implicit zeros contribute nothing.
    return np.sqrt((a.data ** 2).sum())
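As a cross-check, SciPy also provides scipy.sparse.linalg.norm, which defaults to the Frobenius norm for sparse matrices. The sketch below compares it with spnorm and with the dense scipy.linalg.norm on a small random matrix; the seed, shape, and roughly 30% density are arbitrary choices. spnorm as written assumes real-valued data (complex data would need `abs(a.data) ** 2`).

```python
import numpy as np
from scipy.linalg import norm
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import norm as sparse_norm

def spnorm(a):
    # Square and sum only the stored values; implicit zeros contribute nothing.
    return np.sqrt((a.data ** 2).sum())

# Random 4x6 matrix with roughly 30% nonzero entries.
rng = np.random.default_rng(0)
dense = rng.random((4, 6)) * (rng.random((4, 6)) < 0.3)
a_csr = csr_matrix(dense)

# All three should print the same value (up to floating-point rounding).
print(spnorm(a_csr), sparse_norm(a_csr), norm(dense))
```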

For example:

In [182]: b_csr
Out[182]: 
<3x5 sparse matrix of type '<type 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>

In [183]: b_csr.A
Out[183]: 
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  2.,  0.,  4.,  0.],
       [ 0.,  0.,  0.,  2.,  1.]])

In [184]: spnorm(b_csr)
Out[184]: 5.0990195135927845

In [185]: norm(b_csr.A)
Out[185]: 5.0990195135927845
