search - Counting word frequency in a Lucene index

Question

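Can someone help me find the frequency of a word across an entire Lucene index?
For example, if document A contains the word (B) three times and document C contains it twice, I need a method that returns 5, the frequency of the word (B) across the whole Lucene index.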

score 9 · Accepted Answer

This has been asked many times before.

score 3 · Accepted Answer

Assuming you are using Lucene 3.x:

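// IndexReader, Term and TermDocs live in org.apache.lucene.index
IndexReader ir = IndexReader.open(dir);
TermDocs termDocs = ir.termDocs(new Term("your_field", "your_word"));
int count = 0;
// Visit every document containing the term and add its in-document frequency
while (termDocs.next()) {
    count += termDocs.freq();
}
// Release the enumeration and the reader when done
termDocs.close();
ir.close();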
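
To make the arithmetic concrete without a Lucene dependency, here is a minimal plain-Java sketch of what that loop adds up. It is a toy model, not Lucene's API: the class `TermFrequencyModel`, its methods, and the whitespace tokenization are all assumptions made for illustration. The varargs overload simply repeats the sum for every field name passed in.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for an index: NOT Lucene's API, all names here are hypothetical.
final class TermFrequencyModel {
    // field -> term -> (docId -> occurrences of the term in that document's field)
    private final Map<String, Map<String, Map<Integer, Integer>>> postings = new HashMap<>();

    // "Indexes" one field of one document by splitting its text on whitespace.
    void addDocument(int docId, String field, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(token, t -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Sums the per-document frequencies of term over each of the given fields.
    int totalFrequency(String term, String... fields) {
        int count = 0;
        for (String field : fields) {
            Map<String, Map<Integer, Integer>> byTerm = postings.get(field);
            Map<Integer, Integer> perDoc = (byTerm == null) ? null : byTerm.get(term);
            if (perDoc == null) {
                continue; // the term never occurs in this field
            }
            for (int freq : perDoc.values()) {
                count += freq;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        TermFrequencyModel index = new TermFrequencyModel();
        index.addDocument(0, "your_field", "B x B y B"); // document A: B three times
        index.addDocument(1, "your_field", "B z B");     // document C: B twice
        System.out.println(index.totalFrequency("B", "your_field")); // prints 5
    }
}
```

The inner map plays the role of the (document, frequency) pairs that the term enumeration walks through; the total is just the sum of its values.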

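A few notes on the Lucene code:

dir is an instance of Lucene's Directory class. It is created differently for a RAM index and for a file-system index; see the Lucene documentation for details.

"your_field" is the field in which to look for the term. If you have several fields, you can run the procedure for each of them. Alternatively, when indexing your files you can create a special field (for example "_content") that holds the concatenated values of all the other fields.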

score 1 · Accepted Answer

Using Lucene 3.4

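A simple way to get the count, though it needs two arrays :-/ Note that read() returns the number of entries (documents) it filled in, not the total number of occurrences, so the values in freqs still have to be summed.

int[] docs = new int[1000];
int[] freqs = new int[1000];
TermDocs td = indexReader.termDocs(term);
int count = 0;
for (int n; (n = td.read(docs, freqs)) > 0; )       // read() returns 0 once exhausted
    for (int i = 0; i < n; i++) count += freqs[i];  // sum per-document frequencies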

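Note: read() advances the enumeration. Once it has consumed every entry (here, when fewer than 1000 documents match), you are already at the end of the enumeration and next() returns false.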

int[] docs = new int[1000];
int[] freqs = new int[1000];
TermDocs td = indexReader.termDocs(term);
int count = td.read(docs, freqs); // number of entries read, not the sum of freqs
while (td.next()) { // false if read() already consumed every entry
}
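
To illustrate the batched-read pattern without depending on Lucene, here is a small plain-Java sketch. `ToyTermDocs` is a hypothetical stand-in that only mimics the behaviour described above (read() fills the arrays and returns the number of entries, next() steps one entry at a time); it is not Lucene's class, and real readers may behave differently in details.

```java
// Toy enumeration over (docId, freq) pairs: NOT Lucene's TermDocs, names are hypothetical.
final class ToyTermDocs {
    private final int[] docIds;
    private final int[] termFreqs;
    private int consumed = 0; // how many entries have been handed out so far

    ToyTermDocs(int[] docIds, int[] termFreqs) {
        this.docIds = docIds;
        this.termFreqs = termFreqs;
    }

    // Copies up to docs.length entries into the arrays; returns how many were copied (0 when exhausted).
    int read(int[] docs, int[] freqs) {
        int n = Math.min(docs.length, docIds.length - consumed);
        System.arraycopy(docIds, consumed, docs, 0, n);
        System.arraycopy(termFreqs, consumed, freqs, 0, n);
        consumed += n;
        return n;
    }

    // Advances by one entry; false once nothing is left.
    boolean next() {
        if (consumed >= docIds.length) {
            return false;
        }
        consumed++;
        return true;
    }

    // Keeps calling read() until it returns 0, summing the frequencies of each batch.
    static int totalFrequency(ToyTermDocs td, int bufferSize) {
        int[] docs = new int[bufferSize];
        int[] freqs = new int[bufferSize];
        int total = 0;
        int n;
        while ((n = td.read(docs, freqs)) > 0) {
            for (int i = 0; i < n; i++) {
                total += freqs[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] ids = {0, 3, 7};
        int[] tf = {3, 2, 4};
        System.out.println(totalFrequency(new ToyTermDocs(ids, tf), 2)); // prints 9

        ToyTermDocs td = new ToyTermDocs(ids, tf);
        System.out.println(td.read(new int[1000], new int[1000])); // prints 3 (entries, not occurrences)
        System.out.println(td.next()); // prints false
    }
}
```

Looping until read() returns 0 keeps the total correct even when more entries match than the buffers can hold.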
