java - Lucene: Change the default facet delimiter?

Question

First post on this great site!

My goal is to use hierarchical facets to search an index with Lucene. However, my facets need to be delimited by a character other than '/' (in this case, '~'). Example:

Categories
Categories~Category1
Categories~Category2

I have created a class that implements the FacetIndexingParams interface (a copy of DefaultFacetIndexingParams with the DEFAULT_FACET_DELIM_CHAR parameter set to '~').

Paraphrased indexing code (using FSDirectory for both the index and the taxonomy):

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34)
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer)
IndexWriter writer = new IndexWriter(indexDir, config)
TaxonomyWriter taxo = new LuceneTaxonomyWriter(taxDir, OpenMode.CREATE)

Document doc = new Document()
// Add bunch of Fields... hidden for the sake of brevity
List<CategoryPath> categories = new ArrayList<CategoryPath>()
row.tags.split('\\|').each{ tag ->
    def cp = new CategoryPath()
    tag.split('~').each{
        cp.add(it)
    }
    categories.add(cp)
}
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
DocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(taxo, facetIndexingParams)
categoryDocBuilder.setCategoryPaths(categories).build(doc)
writer.addDocument(doc)

// Commit and close both writer and taxo.
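
The tag-splitting step in the indexing code above can be tried in isolation. The following is a minimal plain-Java sketch, assuming tags arrive as one string such as "Categories~Category1|Categories~Category2"; the class name TagSplitDemo and the method parseTags are hypothetical and are not part of Lucene. Each inner list holds the components that the question's code adds to a CategoryPath one by one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TagSplitDemo {

    /**
     * Splits a raw tag string into hierarchical paths.
     * tagSep separates whole tags, levelSep separates the levels inside one tag.
     * Both separators are quoted so that regex metacharacters such as '|' are taken literally.
     */
    public static List<List<String>> parseTags(String tags, char tagSep, char levelSep) {
        List<List<String>> paths = new ArrayList<>();
        for (String tag : tags.split(Pattern.quote(String.valueOf(tagSep)))) {
            String[] levels = tag.split(Pattern.quote(String.valueOf(levelSep)));
            paths.add(Arrays.asList(levels));
        }
        return paths;
    }

    public static void main(String[] args) {
        // prints [[Categories, Category1], [Categories, Category2]]
        System.out.println(parseTags("Categories~Category1|Categories~Category2", '|', '~'));
    }
}
```

Quoting the separators with Pattern.quote plays the same role as the escaped '\\|' in the Groovy code: String.split takes a regular expression, and an unescaped '|' would mean alternation.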

Paraphrased search code:

// Create index and taxonomy readers to get info from index and taxonomy
IndexReader indexReader = IndexReader.open(indexDir)
TaxonomyReader taxo = new LuceneTaxonomyReader(taxDir)
Searcher searcher = new IndexSearcher(indexReader)

QueryParser parser = new QueryParser(Version.LUCENE_34, "content", new StandardAnalyzer(Version.LUCENE_34))
parser.setAllowLeadingWildcard(true)
Query q = parser.parse(query)
TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true)
List<FacetResult> res = null
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
FacetSearchParams facetSearchParams = new FacetSearchParams(facetIndexingParams)
CountFacetRequest cfr = new CountFacetRequest(new CategoryPath(""), 99)
cfr.setDepth(2)
cfr.setSortBy(SortBy.VALUE)
facetSearchParams.addFacetRequest(cfr)
FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxo)

def cp = new CategoryPath("Category~Category1", (char)'~')
searcher.search(DrillDown.query(q, cp), MultiCollector.wrap(tdc, facetsCollector))

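The results always come back as a list of facets in the form "Category/Category1".

I inspected the index with the Luke tool, and the facets do appear to be delimited by the '~' character in the index.

What is the best route to do this? Any help is greatly appreciated!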

score 3 · Accepted Answer

I figured out the issue. Searching and indexing work as they are supposed to. The problem was how I was retrieving the facet results. I was using:

res = facetsCollector.getFacetResults()
res.each{ result ->
    result.getFacetResultNode().getLabel().toString()
}

What I needed to use was:

res = facetsCollector.getFacetResults()
res.each{ result ->
    result.getFacetResultNode().getLabel().toString((char)'~')
}

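The difference is the parameter passed to the toString function! Without it, the label is rendered with '/', regardless of the delimiter used in the index.

Easy to overlook, hard to find.

Hope this helps others.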
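
To make the difference concrete without a Lucene dependency, here is a small plain-Java sketch. SimplePath is a hypothetical stand-in written for illustration; it is not Lucene's CategoryPath. It only mimics the behaviour described in this answer, where the no-argument toString() joins components with '/' and toString(char) joins them with the delimiter passed in.

```java
import java.util.Arrays;
import java.util.List;

public class PathLabelDemo {

    /** Hypothetical stand-in for a facet label; NOT the Lucene class. */
    public static final class SimplePath {
        private final List<String> components;

        public SimplePath(String... components) {
            this.components = Arrays.asList(components);
        }

        /** No-argument form: always joins with '/', whatever delimiter the index uses. */
        @Override
        public String toString() {
            return toString('/');
        }

        /** Explicit form: joins with the delimiter the caller asks for. */
        public String toString(char delimiter) {
            return String.join(String.valueOf(delimiter), components);
        }
    }

    public static void main(String[] args) {
        SimplePath label = new SimplePath("Category", "Category1");
        System.out.println(label.toString());    // prints Category/Category1
        System.out.println(label.toString('~')); // prints Category~Category1
    }
}
```

The stored components are the same in both calls; only the rendering differs, which is why the index looked correct in Luke while the printed labels still used '/'.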
