elasticsearch - Sort on the basis of document count in ElasticSearch

Question

I am storing user relations in an ES index, for example:

{'id' => 1, 'User_id_1' => '2001', 'relation' => 'friend', 'User_id_2' => '1002'}
{'id' => 2, 'User_id_1' => '2002', 'relation' => 'friend', 'User_id_2' => '1002'}
{'id' => 3, 'User_id_1' => '2002', 'relation' => 'friend', 'User_id_2' => '1001'}
{'id' => 4, 'User_id_1' => '2003', 'relation' => 'friend', 'User_id_2' => '1003'}

I want to get the user_id_2 that has the most friends.

In the case above that is 1002, because 2001 and 2002 are its friends (count = 2).

I just can't figure out the query.

Thanks.

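Edit:

As @imotov suggested, the terms facet is a very good choice, but the problem I have is that there are two indexes.

The first index stores the main documents and the second index stores the relations.

Now the problem is this:

Suppose the main index contains 100 USER documents and only 50 of them have created relations. The relation index therefore contains only those 50 users.

So when I implement the terms facet, it sorts the results and gives the correct output for those users, but the remaining 50 users who have no relations yet are missing. I need them in the final output as well, after the 50 sorted users.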

score 1 · Accepted Answer

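First of all, you need to make sure that the relationships you store in ES are unique. This can be done by replacing the arbitrary IDs with IDs constructed from user_id_1, relation, and user_id_2. You also need to make sure that the analyzer for the user IDs doesn't produce multiple tokens. If the IDs are strings, they have to be indexed as not_analyzed. With these two conditions satisfied, you can simply use a terms facet query on the field user_id_2 over a result list limited by relation:friend. This query retrieves the top user_id_2 IDs sorted by their number of occurrences in the index. All together, it could look something like this: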

curl -XPUT http://localhost:9200/relationships -d '{
    "mappings" : {
        "relation" : {
            "_source" : {"enabled" : false },
            "properties" : {
                "user_id_1": { "type": "string", "index" : "not_analyzed"},
                "relation": { "type": "string", "index" : "not_analyzed"},
                "user_id_2": { "type": "string", "index" : "not_analyzed"}
            }
        }
    }
}'

curl -XPUT http://localhost:9200/relationships/relation/2001-friend-1002 -d '{"user_id_1": "2001", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1002 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1001 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1001"}'
curl -XPUT http://localhost:9200/relationships/relation/2003-friend-1003 -d '{"user_id_1": "2003", "relation":"friend", "user_id_2": "1003"}'
curl -XPOST http://localhost:9200/relationships/_refresh
echo


curl -XGET 'http://localhost:9200/relationships/relation/_search?pretty=true&search_type=count' -d '{
  "query": {
    "term" : {
      "relation" : "friend"
    }
  },
  "facets" : {
      "popular" : {
          "terms" : {
              "field" : "user_id_2"
          }
      }
  }
}'

Note that, due to the distributed nature of facet calculation, the counts reported by the facet query might be lower than the actual number of records when multiple shards are used. See Elasticsearch issue 1832.
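As a cross-check for the counts the facet should report on the sample data, the same tally can be computed locally without Elasticsearch. This is only a sketch: the `relationships` variable and the `count_friends` helper are made up for illustration, and the tie-breaking by user ID is an assumption added here to keep the output deterministic.

```shell
# Sample relationships as "user_id_1 relation user_id_2"
# (the same four documents that are indexed above).
relationships='2001 friend 1002
2002 friend 1002
2002 friend 1001
2003 friend 1003'

# Hypothetical helper: count "friend" relations per user_id_2,
# highest count first, ties ordered by user ID.
count_friends() {
  printf '%s\n' "$1" |
    awk '$2 == "friend" { counts[$3]++ } END { for (id in counts) print counts[id], id }' |
    sort -k1,1nr -k2,2
}

count_friends "$relationships"
```

For the sample data the first line printed is `2 1002`, which matches the expected result from the question.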

Edit:

There are two possible solutions for the edited question. One solution is to use a facet on two fields:

curl -XPUT http://localhost:9200/relationships -d '{
    "mappings" : {
        "relation" : {
            "_source" : {"enabled" : false },
            "properties" : {
                "user_id_1": { "type": "string", "index" : "not_analyzed"},
                "relation": { "type": "string", "index" : "not_analyzed"},
                "user_id_2": { "type": "string", "index" : "not_analyzed"}
            }
        }
    }
}'
curl -XPUT http://localhost:9200/users -d '{
    "mappings" : {
        "user" : {
            "_source" : {"enabled" : false },
            "properties" : {
                "user_id": { "type": "string", "index" : "not_analyzed"}
            }
        }
    }
}'

curl -XPUT http://localhost:9200/users/user/1001 -d '{"user_id": 1001}'
curl -XPUT http://localhost:9200/users/user/1002 -d '{"user_id": 1002}'
curl -XPUT http://localhost:9200/users/user/1003 -d '{"user_id": 1003}'
curl -XPUT http://localhost:9200/users/user/1004 -d '{"user_id": 1004}'
curl -XPUT http://localhost:9200/users/user/1005 -d '{"user_id": 1005}'
curl -XPUT http://localhost:9200/relationships/relation/2001-friend-1002 -d '{"user_id_1": "2001", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1002 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1001 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1001"}'
curl -XPUT http://localhost:9200/relationships/relation/2003-friend-1003 -d '{"user_id_1": "2003", "relation":"friend", "user_id_2": "1003"}'
curl -XPOST http://localhost:9200/relationships/_refresh
curl -XPOST http://localhost:9200/users/_refresh
echo


curl -XGET 'http://localhost:9200/relationships,users/_search?pretty=true&search_type=count' -d '{
    "query": {
        "indices" : {
          "indices" : ["relationships"],
          "query" : {
              "filtered" : {
                  "query" : {
                      "term" : {
                          "relation" : "friend"
                      }
                  },
                  "filter" : {
                      "type" : {
                          "value" : "relation"
                      }
                  }
              }
          },
          "no_match_query" : {
              "filtered" : {
                  "query" : {
                      "match_all" : { }
                  },
                  "filter" : {
                      "type" : {
                          "value" : "user"
                      }
                  }
              }
          }
        }
    },
    "facets" : {
        "popular" : {
          "terms" : {
              "fields" : ["user_id", "user_id_2"]
          }
        }
    }
}'

Another solution is to add a "self" relationship to the relationships index for every user when the user is created. I would prefer the second solution, since it seems to be less complicated.
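A minimal sketch of the second option, assuming the same index, type, and ID scheme as in the commands above. The relation value `self`, the `ES_URL` variable, and the `self_relation_command` helper are assumptions for illustration, not something the answer specifies. The helper only prints the indexing command for each user, so nothing is sent to a cluster until the printed lines are run.

```shell
# Assumed base URL of the cluster, matching the examples above.
ES_URL="http://localhost:9200"

# Hypothetical helper: print the curl command that would index
# a "self" relation for one user, with an ID built from the
# user_id_1, relation, and user_id_2 values.
self_relation_command() {
  user_id="$1"
  printf "curl -XPUT %s/relationships/relation/%s-self-%s -d '%s'\n" \
    "$ES_URL" "$user_id" "$user_id" \
    "{\"user_id_1\": \"$user_id\", \"relation\": \"self\", \"user_id_2\": \"$user_id\"}"
}

# Print one command per user from the users index example.
for id in 1001 1002 1003 1004 1005; do
  self_relation_command "$id"
done
```

With such documents in place, every user appears in user_id_2 at least once, so a terms facet that covers both the `friend` and `self` relations should list users without friends with a count of 1; subtracting one from each count then gives the number of friends. The facet's `size` parameter would also need to be large enough to return all users.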
