I have a table (X, Y), where X is a matrix and Y is a vector of classes. Here is an example:

X = 0 0 1 0 1   and Y = 1
    0 1 0 0 0           1
    1 1 1 0 1           0

I want to use the Mann-Whitney U test to compute feature importance (for feature selection):

import numpy as np
from scipy.stats import mannwhitneyu

# One row per feature: column 0 holds U, column 1 holds the p-value
results = np.zeros((X.shape[1], 2))
for i in range(X.shape[1]):
    u, prob = mannwhitneyu(X[:, i], Y)
    results[i, :] = u, prob

I'm not sure whether this is correct. For some columns of a large table I got large values, such as u = 990.

1 Answer

I don't think the Mann-Whitney U test is a good way to do feature selection. It tests whether two samples come from the same distribution; it tells you nothing about how correlated the two variables are. For example:

>>> import numpy as np
>>> from scipy.stats import mannwhitneyu
>>> a = np.arange(100)
>>> b = np.arange(100)
>>> np.random.shuffle(b)
>>> np.corrcoef(a,b)
   array([[ 1.        , -0.07155116],
          [-0.07155116,  1.        ]])
>>> mannwhitneyu(a, b)
(5000.0, 0.49951259627554112) # result for almost uncorrelated variables
>>> mannwhitneyu(a, a)
(5000.0, 0.49951259627554112) # result for perfectly correlated variables

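Because a and b contain exactly the same values (b is just a shuffled copy of a), they have the same distribution, so we fail to reject the null hypothesis that the distributions are identical, no matter how the values are paired.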
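The same check can be sketched against a recent SciPy, where `mannwhitneyu` returns a result object with `.statistic` and `.pvalue` rather than a plain tuple and defaults to a two-sided test, so the p-value will likely differ from the one printed above. The seed and the variable names `res_ab` and `res_aa` are my own choices for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
a = np.arange(100)
b = rng.permutation(a)          # same values as a, in random order

res_ab = mannwhitneyu(a, b)     # a vs. shuffled copy (nearly uncorrelated)
res_aa = mannwhitneyu(a, a)     # a vs. itself (perfectly correlated)

# The test only looks at the pooled ranks of the two samples,
# so the pairing between elements is ignored entirely.
print(res_ab.statistic, res_aa.statistic)  # both 5000.0
print(res_ab.pvalue == res_aa.pvalue)
```

Note that 5000 is exactly n1 * n2 / 2 = 100 * 100 / 2, the value U takes when neither sample tends to be larger than the other. U ranges from 0 to n1 * n2, so a large U by itself says nothing about importance; it mostly reflects the sample sizes.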

Since the goal of feature selection is to find the features that best explain Y, the Mann-Whitney U test applied this way does not help you.
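To tie this back to the setup in the question, here is a minimal sketch with made-up binary data: one hypothetical feature that reproduces Y exactly and one that is a shuffled copy of it. The labels, the feature names, and the use of `np.corrcoef` as a point of comparison are assumptions for illustration, not a recommendation of a particular selection method.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)        # arbitrary seed
y = np.repeat([0, 1], 50)             # hypothetical labels: 50 zeros, 50 ones
features = {
    "informative": y.copy(),              # identical to y
    "uninformative": rng.permutation(y),  # same values, pairing with y destroyed
}

results = {}
for name, feature in features.items():
    u = mannwhitneyu(feature, y).statistic  # compares distributions only
    r = np.corrcoef(feature, y)[0, 1]       # depends on the pairing with y
    results[name] = (u, r)
    print(name, u, round(r, 3))
```

Both features yield the same U when compared against `y`, while the correlation separates them clearly, which is the distinction the argument above relies on.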

Answered 2014-03-15T15:02:23.643