machine-learning - Chi-squared as a scoring function for regression

Question

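It is mentioned at http://scikit-learn.org/0.9/modules/feature_selection.html that "Beware not to use a regression scoring function with a classification problem."

I am trying to find the best features for a regression problem, and I am using f_regression as the scoring function. However, it consumes a lot of memory: my 8GB machine hangs and eventually raises a memory error.

I used Chi2 as the scoring function for the same problem, and it runs very fast. I would like to know whether the reverse of the warning also holds. If it does not, can I use Chi2 as a scoring function for a regression problem?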

score 1 · Accepted Answer

No, you should not use the Chi2 scoring function, as it has no proven guarantee of being accurate for a regression model. You should either check your f_regression solution or use another approach such as recursive feature elimination or PCA (Principal Component Analysis).

http://en.wikipedia.org/wiki/Principal_component_analysis

I would personally advise PCA; it gives very robust results.
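As a rough illustration of the PCA route, here is a minimal sketch that chains PCA with a linear regressor in scikit-learn. The synthetic data, the choice of 5 components, and the use of LinearRegression are assumptions made for the example, not something taken from the question. Note that PCA builds new components from all input features without looking at y, so it reduces dimensionality rather than selecting original features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic data (an assumption for illustration): 200 samples, 10 features,
# with y depending on the first two features plus a little noise.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Project onto 5 principal components, then fit a regressor on them.
pca_model = make_pipeline(PCA(n_components=5), LinearRegression())
pca_model.fit(X, y)

# The reduced representation the regressor was trained on.
X_reduced = pca_model.named_steps["pca"].transform(X)
predictions = pca_model.predict(X)
```

The number of components is a tuning choice; it would normally be picked by cross-validation on the real data.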

score 1 · Accepted Answer

The χ² test builds a contingency table of n_classes times n_features. In a regression model, there is no notion of n_classes. The only way to make it work would be to bin your y values, do feature selection, then train a regression model on the original y and the reduced feature set. There is no support for this in scikit-learn, so you'll have to program it yourself.
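The workaround described above could be sketched roughly as follows. The helper name select_features_by_binned_chi2, the quantile binning, the bin count, and the synthetic data are all assumptions made for illustration; only chi2, SelectKBest, and LinearRegression are existing scikit-learn pieces. Keep in mind that chi2 requires non-negative feature values.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LinearRegression


def select_features_by_binned_chi2(X, y, n_bins=5, k=2):
    """Hypothetical helper: bin y, then return the indices of the k
    features with the highest chi2 score against the binned target."""
    # Interior quantile edges, so the bins hold roughly equal numbers of samples.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    y_binned = np.digitize(y, edges)  # integer labels 0 .. n_bins - 1
    selector = SelectKBest(chi2, k=k).fit(X, y_binned)
    return selector.get_support(indices=True)


# Synthetic, non-negative features (an assumption for the example).
rng = np.random.RandomState(0)
X = rng.randint(0, 10, size=(200, 6)).astype(float)
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

# Select features using the binned target ...
selected = select_features_by_binned_chi2(X, y, n_bins=5, k=2)

# ... then train the regression model on the original, continuous y.
model = LinearRegression().fit(X[:, selected], y)
```

The binned target is used only for scoring features; the final model still sees the continuous y. The number of bins affects the scores, so it is worth trying a few values.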
