json - Solr design for searching complex JSON

Question

What would be a good design for searching complex JSON with Solr? For example, I have documents like this:

{
    "books" : [
        {
            "title" : "Some title",
            "author" : "Some author",
            "genres" : [
                "thriller",
                "drama"
             ]
        },
        {
            "title" : "Some other title",
            "author" : "Some author",
            "genres" : [
                "comedy",
                "nonfiction",
                "thriller"
             ]
         }
    ]
 }

A sample query would be: fetch all documents that contain a book whose author is "Some author" and one of whose genres is "drama".

The design I have come up with so far is to declare a dynamicField in schema.xml and (for now) index everything as text, like this:

 <dynamicField name="*" type="text" indexed="true" stored="true"/>

Then I parse the JSON with SolrJ and build a SolrInputDocument containing one field per piece of data. For the JSON example above, the resulting fields/values would be:

books0.title : "Some title"
books0.author : "Some author"
books0.genres0 : "thriller"
books0.genres1 : "drama"
books1.title : "Some other title"
books1.author : "Some author"
books1.genres0 : "comedy"
books1.genres1 : "nonfiction"
books1.genres2 : "thriller"
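
As an illustration of how those field names come about, here is a minimal sketch of the flattening step. It is written in Python with the standard library rather than SolrJ, and the function name `flatten` is hypothetical; it only reproduces the naming scheme shown above (object keys joined with ".", array positions appended as a number).

```python
import json

def flatten(value, prefix=""):
    """Flatten nested JSON into {field_name: scalar} using the naming above."""
    fields = {}
    if isinstance(value, dict):
        for key, child in value.items():
            # Object keys are joined to the parent name with a dot.
            name = f"{prefix}.{key}" if prefix else key
            fields.update(flatten(child, name))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Array positions are appended directly, e.g. books0, genres1.
            fields.update(flatten(child, f"{prefix}{index}"))
    else:
        fields[prefix] = value
    return fields

raw = """
{"books": [
  {"title": "Some title", "author": "Some author",
   "genres": ["thriller", "drama"]},
  {"title": "Some other title", "author": "Some author",
   "genres": ["comedy", "nonfiction", "thriller"]}
]}
"""
flat = flatten(json.loads(raw))
for name, value in flat.items():
    print(f"{name} : {value!r}")
```

Each key in `flat` would become one field on the SolrInputDocument, which is why the number of distinct field names grows with the size of the arrays.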

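At this point I can use the LukeRequestHandler to get every field in the index and build one big Solr query that checks all the fields I am interested in. For the sample query above, the query would check every "books#.author" and "books#.genres#" field. This solution seems inelegant, and with many fields the query could become very large.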
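
To make the size problem concrete, here is a rough sketch of how such a query could be assembled once the field names are known. It assumes the field list has already been fetched (for example from the LukeRequestHandler); the helper `build_query` is hypothetical and does no escaping of special query characters.

```python
import re

def build_query(field_names, author, genre):
    """OR together every author field and every genre field, then AND the two groups."""
    author_fields = [f for f in field_names if re.fullmatch(r"books\d+\.author", f)]
    genre_fields = [f for f in field_names if re.fullmatch(r"books\d+\.genres\d+", f)]
    author_clause = " OR ".join(f'{f}:"{author}"' for f in author_fields)
    genre_clause = " OR ".join(f'{f}:"{genre}"' for f in genre_fields)
    return f"({author_clause}) AND ({genre_clause})"

# Field names as they would exist for a document with a single book.
fields = ["books0.title", "books0.author", "books0.genres0", "books0.genres1"]
query = build_query(fields, "Some author", "drama")
print(query)
```

The query gains one clause per indexed field, so it grows with the largest document in the index. Note also that a query of this shape does not tie the author and the genre to the same book: it would match a document where one book has the author and a different book has the genre.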

It would be handy to be able to use wildcards in field names, but I don't think Solr allows that.

Is there a better way to achieve this, perhaps with some clever combination of "copyField" and "multiValued" in the schema?

score 2 · Accepted Answer

You can index each book entity as its own document.

<field name="id" type="string" indexed="true" stored="true" required="true" />  
<field name="title" type="text_general" indexed="true" stored="true"/>   
<!-- Don't perform stemming on authors - You can use field with lower case, ascii folding for analysis -->   
<field name="authors" type="string" indexed="true" stored="true" multiValued="true"/>  
<field name="genre" type="string" indexed="true" stored="true" multiValued="true"/>
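
A minimal sketch of what "one document per book" could look like on the indexing side, using the field names from the schema above. The function name `books_to_solr_docs`, the `source_id` argument, and the id format are assumptions made for illustration; any scheme that yields a unique id per book would do.

```python
import json

def books_to_solr_docs(source_id, payload):
    """Turn one JSON payload into one flat Solr document per book."""
    docs = []
    for index, book in enumerate(payload.get("books", [])):
        docs.append({
            # Hypothetical id scheme: parent id plus the book's position.
            "id": f"{source_id}-{index}",
            "title": book["title"],
            # The schema declares authors as multiValued, so send a list.
            "authors": [book["author"]],
            "genre": list(book.get("genres", [])),
        })
    return docs

payload = {
    "books": [
        {"title": "Some title", "author": "Some author",
         "genres": ["thriller", "drama"]},
        {"title": "Some other title", "author": "Some author",
         "genres": ["comedy", "nonfiction", "thriller"]},
    ]
}
docs = books_to_solr_docs("doc1", payload)
print(json.dumps(docs, indent=2))
```

With this layout the field names are fixed no matter how many books or genres a payload contains, so no field-name wildcards are needed. If the original parent document must be recoverable, its id would have to be stored on each book as an extra field, which the schema above does not include.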

Use the DisMax parser to search on author and genre. A document is returned when it matches on these fields. You can also use genre for filtering with a filter query (e.g. fq=genre:drama).
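
For the sample query in the question, the request parameters might look roughly like the sketch below. `defType`, `q`, `qf` and `fq` are standard Solr request parameters; the specific values, and the choice to search only the `authors` field, are assumptions. Because `authors` is a `string` field in the schema above, the value has to match exactly.

```python
from urllib.parse import urlencode, parse_qs

params = [
    ("defType", "dismax"),     # use the DisMax query parser
    ("q", '"Some author"'),    # what the user is searching for
    ("qf", "authors"),         # fields DisMax should search
    ("fq", "genre:drama"),     # filter query: restrict results to one genre
]
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the select handler URL of whichever core holds the book documents.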

If you need different search behavior for a field, you can copy it with copyField and apply a different analysis to the copy. For example:

<field name="genre_search" type="text_general" indexed="true" stored="true" multiValued="true"/>

<copyField source="genre" dest="genre_search"/>

score 0 · Accepted Answer

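It may be worth looking at Solr Joins. They are currently only available in 4.0, which is still in alpha, but they may let you model at least some, if not all, of these complex relationships. Performance will not be as good as vanilla Solr without joins, but it may well be perfectly adequate, so you should check.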
