lucene - How do I get terms based on term positions in Lucene?

Question

Here is some code to access terms in a Lucene document:
int docId = hits[i].doc;  
TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");  
TermPositionVector tpvector = (TermPositionVector)tfvector;  
// this part works only if there is one term in the query string,  
// otherwise you will have to iterate this section over the query terms.  
int termidx = tfvector.indexOf(querystr);  
int[] termposx = tpvector.getTermPositions(termidx);  
TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);

My question is: given the termposx array, how can I use those positions to get the terms at other positions in the document?

score 0 · Accepted Answer

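Zincup: termposx contains {7, 19, 34}. What are the terms at positions 8 or 9? How do I access them?

TermPositionVector.getTermPositions() returns an array of the positions at which a term occurs.

The term is identified by its index in the String array of terms returned by getTerms(); indexOf gives you that index for a given term.

So {7, 19, 34} are the positions at which one and the same term occurs.

TermPositionVector lets you look up the positions at which each term occurs, but not the reverse: there is no direct lookup from a position to the term stored there.

Unfortunately, finding the terms at positions 8 and 9 requires iterating over the terms. I will look further into the API and let you know if I find a solution.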
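The iteration mentioned above could look roughly like the sketch below. It is not part of the original answer and deliberately avoids the Lucene classes so that it runs on its own: it assumes you have already copied the result of getTerms() into a String[] and the result of getTermPositions(i) for each index i into an int[][]. The class name PositionIndex, the method invert and the sample data are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class PositionIndex {

    /**
     * Inverts a "term -> positions" mapping into "position -> term".
     * terms[i] is the i-th term and positions[i] holds the positions
     * at which terms[i] occurs (the shape the term vector exposes).
     */
    public static Map<Integer, String> invert(String[] terms, int[][] positions) {
        Map<Integer, String> byPosition = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            if (positions[i] == null) {
                continue; // positions were not stored for this term
            }
            for (int pos : positions[i]) {
                byPosition.put(pos, terms[i]);
            }
        }
        return byPosition;
    }

    public static void main(String[] args) {
        // Hypothetical data standing in for the arrays read from the
        // term vector; in real code they would be filled in a loop over
        // the term indices.
        String[] terms = {"fox", "lazy", "quick"};
        int[][] positions = {{7, 19, 34}, {9}, {8}};

        Map<Integer, String> byPosition = invert(terms, positions);
        System.out.println(byPosition.get(8));
        System.out.println(byPosition.get(9));
    }
}
```

Building the map costs one pass over every position in the field, after which each lookup by position is cheap. A position may have no entry at all, for example where the analyzer dropped a stop word, so callers should expect get() to return null.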
