n-gram - ElasticSearch n-gram tokenfilter not finding partial words

Question

I am playing around with ElasticSearch for a new project of mine. I have set the default analyzers to use the ngram tokenfilter. This is my elasticsearch.yml file:

index:
analysis:
    analyzer:
        default_index:
            tokenizer: standard
            filter: [standard, stop, mynGram]
        default_search:
            tokenizer: standard
            filter: [standard, stop]

    filter:
        mynGram:
            type: nGram
            min_gram: 1
            max_gram: 10

I created a new index and added the following document to it:

$ curl -XPUT http://localhost:9200/test/newtype/3 -d '{"text": "one two three four five six"}'
{"ok":true,"_index":"test","_type":"newtype","_id":"3"}

However, when I search using the query text:hree or text:ive or any other partial term, ElasticSearch does not return this document. It returns the document only when I search for an exact term (such as text:two).

I have also tried changing the config file so that default_search uses the ngram token filter as well, but the result was the same. What am I doing wrong here, and how can I fix it?
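
For reference, here is a minimal Python sketch of what an n-gram filter with min_gram 1 and max_gram 10 would be expected to emit for this document. It is not ElasticSearch code: the `ngrams` helper is hypothetical, a plain whitespace split stands in for the standard tokenizer and stop filter, and the order of the grams is not meant to match the real filter. If the filter were applied at index time, `hree` and `ive` should be among the indexed terms, which suggests the configured analyzer is not actually being used.

```python
def ngrams(token, min_gram, max_gram):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# Assumption: whitespace splitting approximates the standard tokenizer here;
# the stop filter is not modelled.
tokens = "one two three four five six".split()
indexed_terms = {gram for token in tokens for gram in ngrams(token, 1, 10)}

print("hree" in indexed_terms)  # True
print("ive" in indexed_terms)   # True
```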

score 10 · Accepted Answer

I'm not sure about the default_* settings. However, applying a mapping that specifies index_analyzer and search_analyzer works:

curl -XDELETE localhost:9200/twitter
curl -XPOST localhost:9200/twitter -d '
{"index": 
  { "number_of_shards": 1,
    "analysis": {
       "filter": {
                  "mynGram" : {"type": "nGram", "min_gram": 2, "max_gram": 10}
                 },
       "analyzer": { "a1" : {
                    "type":"custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "mynGram"]
                    }
                  } 
     }
  }
}'

curl -XPUT localhost:9200/twitter/tweet/_mapping -d '{
    "tweet" : {
        "index_analyzer" : "a1",
        "search_analyzer" : "standard", 
        "date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
        "properties" : {
            "user": {"type":"string", "analyzer":"standard"},
            "message" : {"type" : "string" }
        }
    }}'

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}'

curl -XGET 'localhost:9200/twitter/_search?q=ear'
curl -XGET 'localhost:9200/twitter/_search?q=sea'

curl -XGET localhost:9200/twitter/_mapping
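
To see why the two searches above can match, here is a rough Python model of the asymmetric setup: n-grams are produced only at index time (like analyzer a1), while the query is kept as whole lowercase words (like the standard search analyzer). This is a sketch under simplifying assumptions, not ElasticSearch's implementation: `index_terms`, `search_terms` and `matches` are hypothetical helpers, whitespace splitting stands in for the standard tokenizer, and matching is reduced to a set intersection.

```python
def ngrams(token, min_gram=2, max_gram=10):
    """All substrings of token with length between min_gram and max_gram."""
    return {
        token[start:start + size]
        for size in range(min_gram, min(max_gram, len(token)) + 1)
        for start in range(len(token) - size + 1)
    }


def index_terms(text):
    # Stand-in for analyzer "a1": split on whitespace, lowercase, then n-gram.
    return {gram for token in text.lower().split() for gram in ngrams(token)}


def search_terms(query):
    # Stand-in for the "standard" search analyzer: lowercase whole words only.
    return set(query.lower().split())


def matches(query, text):
    # A document matches if any query term equals one of its indexed terms.
    return bool(search_terms(query) & index_terms(text))


message = "trying out Elastic Search"
print(matches("ear", message))  # True: "ear" is a 3-gram of "search"
print(matches("sea", message))  # True: "sea" is a 3-gram of "search"
print(matches("s", message))    # False: shorter than min_gram (2)
```

Because the query itself is not split into n-grams in this model, a partial term only matches when it appears verbatim among the indexed grams.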

score 1 · Accepted Answer

You should use the Get Mapping API to check whether your mapping has been applied: http://www.elasticsearch.org/guide/reference/api/admin-indices-get-mapping.html

By the way, I was told on the mailing list that a mapping set in elasticsearch.yml is not applied if the index already contains documents. You have to clean the index first.

I tried ngram with ES and it works fine for me.
