full-text-search - Range and term boosting in ElasticSearch

Question

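I'm struggling to get boosting in ElasticSearch to work the way I want.

Say I have indexed some profiles containing gender, interests and age. Let's say a matching gender is the most relevant criterion, then interests, and the least important criterion is the user's age. I expected the query below to order the matching profiles according to that principle, but when I run it I first get some males, and then I get Anna, the 50-year-old female, before Maria, the female who likes cars. Why doesn't Maria get a higher score than Anna?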

{
    "query": {
        "bool" : {
            "should" : [
                { "term"  : { "gender" : { "term": "male", "boost": 10.0 } } },
                { "term"  : { "likes"  : { "term": "cars", "boost" : 5.0 } } },
                { "range" : { "age"    : { "from" : 50,    "boost" : 1.0 } } }
            ],
            "minimum_number_should_match" : 1
        }
    }    
}

Any hints are much appreciated,

Stein

The curl commands executed are as follows:

$ curl -XPUT http://localhost:9200/users/profile/1 -d '{
    "nickname" : "bob",
    "gender" : "male",
    "age" : 48,
    "likes" : "airplanes"
}'

$ curl -XPUT http://localhost:9200/users/profile/2 -d '{
    "nickname" : "carlos",
    "gender" : "male",
    "age" : 24,
    "likes" : "food"
}'

$ curl -XPUT http://localhost:9200/users/profile/3 -d '{
    "nickname" : "julio",
    "gender" : "male",
    "age" : 18,
    "likes" : "ladies"
}'

$ curl -XPUT http://localhost:9200/users/profile/4 -d '{
    "nickname" : "maria",
    "gender" : "female",
    "age" : 25,
    "likes" : "cars"
}'

$ curl -XPUT http://localhost:9200/users/profile/5 -d '{
    "nickname" : "anna",
    "gender" : "female",
    "age" : 50,
    "likes" : "clothes"
}'

$ curl -XGET http://localhost:9200/users/profile/_search -d '{
    "query": {
        "bool" : {
            "should" : [
                { "term" : { "gender" : { "term": "male", "boost": 10.0 } } },
                { "term" : { "likes" : { "term": "cars", "boost" : 5.0 } } },
                { "range" : { "age" : { "from" : 50, "boost" : 1.0 } } }
            ],
            "minimum_number_should_match" : 1
        }
    }    
}'

score 14 · Accepted Answer

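The boost value is not absolute. It is combined with other factors to determine the relevance of each term.

You have two "genders" (I would assume) but many different "likes". So male is considered almost irrelevant because it occurs so frequently in your data. However, cars may occur only a few times, so it is considered more relevant.

This logic is useful for full-text search, but not for enums, which are essentially intended to be used as filters.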
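To make the effect of term frequency concrete, here is a rough sketch in Python. It assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), with a clause weight proportional to boost × idf², and it assumes all five sample profiles live in a single shard. It ignores query normalization, the coord factor and per-shard statistics, so it will not reproduce the exact scores Elasticsearch returns. The names `classic_idf` and `clause_weights` are made up for this illustration.

```python
import math

def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF as defined by Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Counts taken from the five sample profiles in the question:
# three profiles have gender "male", one profile has likes "cars".
NUM_DOCS = 5
CLAUSES = [
    # (clause, docFreq, boost)
    ("gender:male", 3, 10.0),
    ("likes:cars", 1, 5.0),
]

def clause_weights() -> dict:
    """Simplified per-clause weight: boost * idf^2 (no queryNorm, no coord)."""
    return {
        name: boost * classic_idf(doc_freq, NUM_DOCS) ** 2
        for name, doc_freq, boost in CLAUSES
    }

if __name__ == "__main__":
    for name, weight in clause_weights().items():
        print(f"{name}: {weight:.2f}")
```

Under these assumptions the rarer term cars ends up with a larger weight than male even though its boost is half as large, which is the sense in which the boost is only one factor among several.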

Fortunately, you can disable this functionality on a per-field basis, using omit_term_freq_and_positions and omit_norms.

Try setting up your mapping as follows:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "likes" : {
               "index" : "not_analyzed",
               "omit_term_freq_and_positions" : 1,
               "omit_norms" : 1,
               "type" : "string"
            },
            "gender" : {
               "index" : "not_analyzed",
               "omit_term_freq_and_positions" : 1,
               "omit_norms" : 1,
               "type" : "string"
            },
            "age" : {
               "type" : "integer"
            }
         }
      }
   }
}
'

UPDATE: Fully working example:

Delete the existing index:

curl -XDELETE 'http://127.0.0.1:9200/users/?pretty=1'

Create the index with the new mapping:

curl -XPUT 'http://127.0.0.1:9200/users/?pretty=1'  -d '
{
   "mappings" : {
      "profile" : {
         "properties" : {
            "likes" : {
               "index" : "not_analyzed",
               "omit_term_freq_and_positions" : 1,
               "type" : "string",
               "omit_norms" : 1
            },
            "age" : {
               "type" : "integer"
            },
            "gender" : {
               "index" : "not_analyzed",
               "omit_term_freq_and_positions" : 1,
               "type" : "string",
               "omit_norms" : 1
            }
         }
      }
   }
}
'

Index the test docs:

curl -XPOST 'http://127.0.0.1:9200/users/profile/_bulk?pretty=1'  -d '
{"index" : {"_id" : 1}}
{"nickname" : "bob", "likes" : "airplanes", "age" : 48, "gender" : "male"}
{"index" : {"_id" : 2}}
{"nickname" : "carlos", "likes" : "food", "age" : 24, "gender" : "male"}
{"index" : {"_id" : 3}}
{"nickname" : "julio", "likes" : "ladies", "age" : 18, "gender" : "male"}
{"index" : {"_id" : 4}}
{"nickname" : "maria", "likes" : "cars", "age" : 25, "gender" : "female"}
{"index" : {"_id" : 5}}
{"nickname" : "anna", "likes" : "clothes", "age" : 50, "gender" : "female"}
'

Refresh the index (to make sure that the latest docs are visible to search):

curl -XPOST 'http://127.0.0.1:9200/users/_refresh?pretty=1'

Search:

curl -XGET 'http://127.0.0.1:9200/users/profile/_search?pretty=1'  -d '
{
   "query" : {
      "bool" : {
         "minimum_number_should_match" : 1,
         "should" : [
            {
               "term" : {
                  "gender" : {
                     "boost" : 10,
                     "term" : "male"
                  }
               }
            },
            {
               "term" : {
                  "likes" : {
                     "boost" : 5,
                     "term" : "cars"
                  }
               }
            },
            {
               "range" : {
                  "age" : {
                     "boost" : 1,
                     "from" : 50
                  }
               }
            }
         ]
      }
   }
}
'

Results:

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "nickname" : "bob",
#                "likes" : "airplanes",
#                "age" : 48,
#                "gender" : "male"
#             },
#             "_score" : 0.053500723,
#             "_index" : "users",
#             "_id" : "1",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "carlos",
#                "likes" : "food",
#                "age" : 24,
#                "gender" : "male"
#             },
#             "_score" : 0.053500723,
#             "_index" : "users",
#             "_id" : "2",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "julio",
#                "likes" : "ladies",
#                "age" : 18,
#                "gender" : "male"
#             },
#             "_score" : 0.053500723,
#             "_index" : "users",
#             "_id" : "3",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "anna",
#                "likes" : "clothes",
#                "age" : 50,
#                "gender" : "female"
#             },
#             "_score" : 0.029695695,
#             "_index" : "users",
#             "_id" : "5",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "maria",
#                "likes" : "cars",
#                "age" : 25,
#                "gender" : "female"
#             },
#             "_score" : 0.015511602,
#             "_index" : "users",
#             "_id" : "4",
#             "_type" : "profile"
#          }
#       ],
#       "max_score" : 0.053500723,
#       "total" : 5
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 4
# }

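UPDATE: Alternative approach

Here I present an alternative query which, while more verbose, gives you much more predictable results. It involves using the custom filters score query. First, we filter the documents down to those that match at least one of the conditions. Because we use a constant score query, all documents have an initial score of 1.

The custom filters score then lets us boost each document if it matches a filter: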

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1'  -d '
{
   "query" : {
      "custom_filters_score" : {
         "query" : {
            "constant_score" : {
               "filter" : {
                  "or" : [
                     {
                        "term" : {
                           "gender" : "male"
                        }
                     },
                     {
                        "term" : {
                           "likes" : "cars"
                        }
                     },
                     {
                        "range" : {
                           "age" : {
                              "gte" : 50
                           }
                        }
                     }
                  ]
               }
            }
         },
         "score_mode" : "total",
         "filters" : [
            {
               "boost" : "10",
               "filter" : {
                  "term" : {
                     "gender" : "male"
                  }
               }
            },
            {
               "boost" : "5",
               "filter" : {
                  "term" : {
                     "likes" : "cars"
                  }
               }
            },
            {
               "boost" : "1",
               "filter" : {
                  "range" : {
                     "age" : {
                        "gte" : 50
                     }
                  }
               }
            }
         ]
      }
   }
}
'

You can see that the scores associated with each doc are nice round numbers, which are easy to trace back to the clauses that were matched:

# [Fri Jun  8 21:30:24 2012] Response:
# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "nickname" : "bob",
#                "likes" : "airplanes",
#                "age" : 48,
#                "gender" : "male"
#             },
#             "_score" : 10,
#             "_index" : "users",
#             "_id" : "1",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "carlos",
#                "likes" : "food",
#                "age" : 24,
#                "gender" : "male"
#             },
#             "_score" : 10,
#             "_index" : "users",
#             "_id" : "2",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "julio",
#                "likes" : "ladies",
#                "age" : 18,
#                "gender" : "male"
#             },
#             "_score" : 10,
#             "_index" : "users",
#             "_id" : "3",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "maria",
#                "likes" : "cars",
#                "age" : 25,
#                "gender" : "female"
#             },
#             "_score" : 5,
#             "_index" : "users",
#             "_id" : "4",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "anna",
#                "likes" : "clothes",
#                "age" : 50,
#                "gender" : "female"
#             },
#             "_score" : 1,
#             "_index" : "users",
#             "_id" : "5",
#             "_type" : "profile"
#          }
#       ],
#       "max_score" : 10,
#       "total" : 5
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 20,
#       "total" : 20
#    },
#    "took" : 6
# }
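
The round scores in this response can be traced by hand. The sketch below is a plain Python simulation, not an Elasticsearch client call: it assumes that score_mode "total" sums the boosts of all matching filters and multiplies the sum by the constant base score of 1, which is consistent with the response above. The helper names `total_score` and `ranked` are made up for this illustration.

```python
# The five sample profiles from the question.
PROFILES = [
    {"nickname": "bob", "gender": "male", "age": 48, "likes": "airplanes"},
    {"nickname": "carlos", "gender": "male", "age": 24, "likes": "food"},
    {"nickname": "julio", "gender": "male", "age": 18, "likes": "ladies"},
    {"nickname": "maria", "gender": "female", "age": 25, "likes": "cars"},
    {"nickname": "anna", "gender": "female", "age": 50, "likes": "clothes"},
]

# (boost, predicate) pairs mirroring the "filters" array of the query.
BOOST_FILTERS = [
    (10.0, lambda p: p["gender"] == "male"),
    (5.0, lambda p: p["likes"] == "cars"),
    (1.0, lambda p: p["age"] >= 50),
]

BASE_SCORE = 1.0  # every doc passing the constant_score query starts at 1

def total_score(profile):
    """Sum the boosts of all matching filters; None if nothing matches."""
    matched = [boost for boost, matches in BOOST_FILTERS if matches(profile)]
    if not matched:
        return None  # excluded by the "or" filter, so it is never scored
    return BASE_SCORE * sum(matched)

def ranked():
    """Return (nickname, score) pairs sorted by descending score."""
    scored = [(p["nickname"], total_score(p)) for p in PROFILES]
    scored = [(name, score) for name, score in scored if score is not None]
    # sorted() is stable, so ties keep their original document order.
    return sorted(scored, key=lambda item: -item[1])

if __name__ == "__main__":
    for name, score in ranked():
        print(name, score)
```

Under the same assumption, a profile that matched both gender male and likes cars would score 15, so each score decomposes directly into the clauses that matched.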
