r - Determining the important subgroups of data inputs

Question

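I have a large (10000 x 5001) table representing 10000 samples and 5001 different features of those samples. One of these features is the output variable for each sample. In other words, each sample has 5000 input variables and 1 output variable.

I know that most of these inputs are irrelevant. So what I want to do is determine the subset of input variables that best predicts the output variable. What is the best/easiest way to do this in R?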

score 0 · Accepted Answer

You may want principal component analysis (stats::prcomp) or linear discriminant analysis (MASS::lda).

See this document by Avril Coghlan:

http://little-book-of-r-for-multivariate-analysis.readthedocs.org/en/latest/
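
A minimal sketch of the prcomp suggestion, assuming a numeric output variable. The simulated matrix `X`, the output `y`, the smaller dimensions, and the 80% variance cutoff are all illustrative assumptions, not taken from the question. Note that PCA is unsupervised and returns linear combinations of the inputs rather than a subset of the original columns, so this shows the mechanics of the call more than a full answer to the subset problem.

```r
# Simulated stand-in for the table in the question (smaller, so it runs quickly).
set.seed(1)
n <- 200
p <- 50
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("x", seq_len(p))

# Hypothetical output: only x1 and x2 actually matter.
y <- 2 * X[, 1] - 3 * X[, 2] + rnorm(n)

# PCA on the inputs only; the output variable is left out.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Keep the smallest number of components reaching 80% cumulative variance
# (the 80% threshold is an arbitrary choice for illustration).
k <- which(cumsum(var_explained) >= 0.80)[1]
scores <- pca$x[, seq_len(k), drop = FALSE]

# Regress the output on the retained component scores.
fit <- lm(y ~ ., data = data.frame(y = y, scores))
summary(fit)$r.squared
```

The loadings in `pca$rotation` show how much each original input contributes to each component, which is one way to trace a useful component back to the variables behind it.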

score 0 · Accepted Answer

You might want to check out Weka. In the Explorer, load the data and then go to the Select attributes tab. There you will find several options for finding the most informative attributes/features in your dataset.

score 0 · Accepted Answer

Rather than taking 'random' suggestions, why not go to the CRAN Task View for Cluster Analysis & Finite Mixture Models?
