I don't think that using Mann-Whitney U test is a good way to do feature selection. Mann-Whitney tests whether distributions of the two variable are the same, it tells you nothing about how correlated the variables are. For example:
>>> from scipy.stats import mannwhitneyu
>>> a = np.arange(100)
>>> b = np.arange(100)
>>> np.random.shuffle(b)
>>> np.corrcoef(a,b)
array([[ 1. , -0.07155116],
[-0.07155116, 1. ]])
>>> mannwhitneyu(a, b)
(5000.0, 0.49951259627554112) # result for almost not correlated
>>> mannwhitneyu(a, a)
(5000.0, 0.49951259627554112) # result for perfectly correlated
Because a
and b
have the same distributions we fail to reject the null hypothesis that the distributions are identical.
And since in features selection you are trying find features that mostly explain Y
, Mann-Whitney U does not help you with that.