machine-learning - Performing LSA on training and test sets using the WEKA API

Question

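I need to use Weka and its AttributeSelection algorithm LatentSemanticAnalysis for text classification. My dataset is split into a training set and a test set, and I want to apply LSA to both. I have read several posts about LSA, but I could not work out how to apply it to the two separate sets so that they stay compatible. This is what I have so far, but it runs out of memory: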

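import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.Ranker;
import weka.core.Instances;

// Here AttributeSelection is weka.attributeSelection.AttributeSelection, not the filter class.
AttributeSelection selecter = new AttributeSelection();
weka.attributeSelection.LatentSemanticAnalysis lsa = new weka.attributeSelection.LatentSemanticAnalysis();
Ranker rank = new Ranker();

selecter.setEvaluator(lsa);
selecter.setSearch(rank);
selecter.setRanking(true);

// Build the LSA transformation on the input, then project the same data onto it.
selecter.SelectAttributes(input);
Instances outputData = selecter.reduceDimensionality(input);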

Edit1: In response to @Jose's reply, I have added a new version of the source code. It results in an OutOfMemoryError.

AttributeSelection filter = new AttributeSelection(); // package weka.filters.supervised.attribute!
LatentSemanticAnalysis lsa = new LatentSemanticAnalysis();
Ranker rank = new Ranker();
filter.setEvaluator(lsa);
filter.setSearch(rank);
filter.setInputFormat(train);

train = Filter.useFilter(train, filter);
test = Filter.useFilter(test, filter);

Edit2: The error I am getting:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at weka.core.matrix.Matrix.getArrayCopy(Matrix.java:301)
at weka.core.matrix.SingularValueDecomposition.<init>(SingularValueDecomposition.java:76)
at weka.core.matrix.Matrix.svd(Matrix.java:913)
at weka.attributeSelection.LatentSemanticAnalysis.buildAttributeConstructor(LatentSemanticAnalysis.java:511)
at weka.attributeSelection.LatentSemanticAnalysis.buildEvaluator(LatentSemanticAnalysis.java:416)
at weka.attributeSelection.AttributeSelection.SelectAttributes(AttributeSelection.java:596)
at weka.filters.supervised.attribute.AttributeSelection.batchFinished(AttributeSelection.java:455)
at weka.filters.Filter.useFilter(Filter.java:682)
at test.main(test.java:44)
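
The trace shows the heap running out inside Matrix.getArrayCopy, called from the SingularValueDecomposition constructor, that is, while a matrix is being copied for the SVD. As a rough sanity check, the sketch below estimates how much heap one dense copy of such a matrix would need, assuming the values are held as 8-byte doubles in a dense array (an assumption about Weka's Matrix class, not something the trace itself states). The class name SvdHeapEstimate, its methods, and the instance and attribute counts are hypothetical and only serve as illustration; the real sizes depend on the dataset.

```java
public final class SvdHeapEstimate {

    private SvdHeapEstimate() {
    }

    /** Bytes taken by the values of one dense rows x cols matrix of doubles (array headers ignored). */
    public static long denseMatrixBytes(long rows, long cols) {
        return Math.multiplyExact(Math.multiplyExact(rows, cols), (long) Double.BYTES);
    }

    /** True if the given number of dense copies of the matrix fits into maxHeapBytes. */
    public static boolean fitsInHeap(long rows, long cols, int copies, long maxHeapBytes) {
        return Math.multiplyExact(denseMatrixBytes(rows, cols), (long) copies) <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long instances = 5_000;    // hypothetical number of documents
        long attributes = 20_000;  // hypothetical number of word attributes

        long oneCopy = denseMatrixBytes(instances, attributes);
        long maxHeap = Runtime.getRuntime().maxMemory();

        System.out.printf("One dense copy: %d MiB%n", oneCopy / (1024 * 1024));
        System.out.printf("JVM max heap:   %d MiB%n", maxHeap / (1024 * 1024));
        // Assumes at least two copies are alive at once: the original matrix and the SVD's working copy.
        System.out.println("Two copies fit: " + fitsInHeap(instances, attributes, 2, maxHeap));
    }
}
```

If the estimate is far above the JVM's maximum heap, the usual options are to raise the heap with the standard -Xmx JVM flag or to shrink the matrix (fewer attributes or instances) before LSA is applied.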
