python - Matrix multiplication in pandas

Question

I have two DataFrames, x and y, containing numeric data. NumPy's inner works, but pandas' dot does not.

In [63]: x.shape
Out[63]: (1062, 36)

In [64]: y.shape
Out[64]: (36, 36)

In [65]: np.inner(x, y).shape
Out[65]: (1062L, 36L)

In [66]: x.dot(y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-66-76c015be254b> in <module>()
----> 1 x.dot(y)

C:\Programs\WinPython-64bit-2.7.3.3\python-2.7.3.amd64\lib\site-packages\pandas\core\frame.pyc in dot(self, other)
    888             if (len(common) > len(self.columns) or
    889                     len(common) > len(other.index)):
--> 890                 raise ValueError('matrices are not aligned')
    891 
    892             left = self.reindex(columns=common, copy=False)

ValueError: matrices are not aligned

Is this a bug, or am I using pandas incorrectly?

score 40 · Accepted Answer

Not only must the shapes of x and y be correct, but the column names of x must also match the index names of y. Otherwise this code in pandas/core/frame.py raises a ValueError:

if isinstance(other, (Series, DataFrame)):
    common = self.columns.union(other.index)
    if (len(common) > len(self.columns) or
        len(common) > len(other.index)):
        raise ValueError('matrices are not aligned')

If you just want to compute the matrix product without making the column names of x match the index names of y, use the NumPy dot function:

np.dot(x, y)
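A side note on the question: np.inner and np.dot are not interchangeable for 2-D inputs. np.inner sums over the last axis of both arguments, so for matrices it gives a @ b.T rather than a @ b. In the question the shapes come out the same only because y is square. A minimal sketch with made-up 2x2 arrays (a and b are illustrative, not from the question):

```python
import numpy as np

# Small made-up matrices, chosen so the two products differ.
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[0.0, 1.0],
              [2.0, 3.0]])

dot_result = np.dot(a, b)      # matrix product: a @ b
inner_result = np.inner(a, b)  # sums over the last axes: a @ b.T

print(np.array_equal(dot_result, a @ b))
# True
print(np.array_equal(inner_result, a @ b.T))
# True
print(np.array_equal(inner_result, dot_result))
# False
```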

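The reason the column names of x must match the index names of y is that the pandas dot method reindexes x and y, so that if the column order of x and the index order of y do not naturally match, they are made to match before the matrix product is performed: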

left = self.reindex(columns=common, copy=False)
right = other.reindex(index=common, copy=False)

The NumPy dot function does no such thing. It just computes the matrix product based on the values in the underlying arrays.
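To see the difference, here is a small sketch in which y's index holds the same labels as x's columns but in a different order. The labels 'a', 'b', 'c' and 'out' and the numbers are made up for illustration:

```python
import numpy as np
import pandas as pd

# x has columns a, b, c (hypothetical labels).
x = pd.DataFrame([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]], columns=['a', 'b', 'c'])

# y's index has the same labels, but in the order c, a, b.
y = pd.DataFrame([[10.0], [20.0], [30.0]],
                 index=['c', 'a', 'b'], columns=['out'])

# pandas pairs values by label: a*20 + b*30 + c*10
by_label = x.dot(y)
print(by_label['out'].tolist())
# [110.0, 290.0]

# NumPy pairs values by position: a*10 + b*20 + c*30
by_position = np.dot(x.to_numpy(), y.to_numpy())
print(by_position.ravel().tolist())
# [140.0, 320.0]
```

Both calls succeed here because the label sets are identical. They simply disagree about which row of y goes with which column of x.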

Here is an example that reproduces the error:

import pandas as pd
import numpy as np

columns = ['col{}'.format(i) for i in range(36)]
x = pd.DataFrame(np.random.random((1062, 36)), columns=columns)
y = pd.DataFrame(np.random.random((36, 36)))

print(np.dot(x, y).shape)
# (1062, 36)

print(x.dot(y).shape)
# ValueError: matrices are not aligned
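If you would rather stay in pandas and keep the labels on the result, one option is to give y the same index labels as x's columns before calling dot. This sketch assumes row i of y is meant to pair with column i of x. The name y_aligned is made up here:

```python
import numpy as np
import pandas as pd

columns = ['col{}'.format(i) for i in range(36)]
x = pd.DataFrame(np.random.random((1062, 36)), columns=columns)
y = pd.DataFrame(np.random.random((36, 36)))

# Assumption: rows of y are already in the same order as the columns of x,
# so relabelling y's index by position is safe.
y_aligned = y.copy()
y_aligned.index = x.columns

result = x.dot(y_aligned)
print(result.shape)
# (1062, 36)

# Same numbers as the purely positional NumPy product.
print(np.allclose(result.to_numpy(), np.dot(x.to_numpy(), y.to_numpy())))
# True
```

The result is a DataFrame that keeps x's index and y's columns, which np.dot does not give you.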
