python - Aggregating fields with elasticsearch-dsl in Python

Question

Could you show me how to write a Python statement that aggregates (sums and counts) information about my documents?

Script

from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])

s = Search(using=client, index="attendance")
s = s.execute()

for tag in s.aggregations.per_tag.buckets:
    print (tag.key)

Output

File "/Library/Python/2.7/site-packages/elasticsearch_dsl/utils.py", line 106, in __getattr__
'%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Response' object has no attribute 'aggregations'

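What is causing this? Is the "aggregations" keyword wrong? Is there some other package I need to import? And if documents in the "attendance" index have a field called emailAddress, how would I count which documents have a value for that field?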

score 33 · Accepted Answer

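First off, I realized that what I wrote in the question never actually defines an aggregation. I don't find the documentation on how to use this very easy to read, so I'll build on what I wrote above. I've changed the index name to make it a better example.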

from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])

s = Search(using=client, index="airbnb", doc_type="sleep_overs")
response = s.execute()  # runs fine, but the response has no aggregations yet

# invalid! You haven't defined an aggregation.
#for tag in response.aggregations.per_tag.buckets:
#    print(tag.key)

# Let's make an aggregation (on the Search object s, not on a response)
# 'by_house' is a name you choose, 'terms' is a keyword for the type of aggregator
# 'field' is also a keyword, and 'house_number' is a field in our ES index
s.aggs.bucket('by_house', 'terms', field='house_number', size=0)

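Above, we create one bucket per house number, so each bucket is named after its house number. Elasticsearch (ES) always reports how many documents fall into each bucket. size=0 means return all buckets; this is needed because by default ES returns only 10 (or whatever the developer has configured).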

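# This runs the query.
s = s.execute()

# let's see what's in our results

# a terms aggregation has no top-level doc_count; this is the number of
# documents that did not fit into the returned buckets
print(s.aggregations.by_house.sum_other_doc_count)
print(s.hits.total)
print(s.aggregations.by_house.buckets)

for item in s.aggregations.by_house.buckets:
    print(item.doc_count)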

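My mistake earlier was assuming that an Elasticsearch query comes with aggregations by default. You have to define them yourself and then execute the search. The response can then be broken down by the aggregators you named.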
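To make the shape of that response concrete, here is a minimal sketch that walks the raw JSON of a terms aggregation, the same structure elasticsearch-dsl wraps as s.aggregations.by_house.buckets. It assumes the raw dict you would get back from elasticsearch-py's Elasticsearch.search(); the sample response is hypothetical and trimmed to the relevant keys, and bucket_counts is an illustrative helper, not part of any library.

```python
def bucket_counts(response, agg_name):
    """Return {bucket_key: doc_count} for a terms aggregation named agg_name."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


# Hypothetical, trimmed response for the "by_house" terms aggregation.
sample_response = {
    "hits": {"total": 7, "hits": []},
    "aggregations": {
        "by_house": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": 12, "doc_count": 4},
                {"key": 31, "doc_count": 3},
            ],
        }
    },
}

counts = bucket_counts(sample_response, "by_house")
print(counts)  # {12: 4, 31: 3}
```

The per-bucket counts live only inside buckets; the aggregation itself carries sum_other_doc_count (documents outside the returned buckets) rather than a doc_count of its own.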

The equivalent CURL request for the above looks like this. In Sense, you can comment lines out with //.

POST /airbnb/sleep_overs/_search
{
// the size 0 here actually means to not return any hits, just the aggregation part of the result
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {
                "field": "house_number",
// the size 0 here means to return all buckets, not just the default 10
                "size": 0
            }
        }
    }
}
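The request above relies on "size": 0 inside "terms" to mean "all buckets", which only older Elasticsearch releases accept; Elasticsearch 5.0 and later reject it. Here is a minimal sketch of building the same body with an explicit bucket cap instead. The helper name terms_agg_body and the cap of 1000 are assumptions for illustration; choose a cap above the number of distinct values you expect.

```python
import json


def terms_agg_body(agg_name, field, max_buckets):
    """Build a search body that returns only a terms aggregation, no hits.

    max_buckets replaces the old "size": 0 (all buckets) setting, which
    Elasticsearch 5.0+ rejects.
    """
    if max_buckets < 1:
        raise ValueError("max_buckets must be >= 1 on Elasticsearch 5.0+")
    return {
        "size": 0,  # top-level size: return no hits, only aggregations
        "aggs": {
            agg_name: {
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


body = terms_agg_body("by_house", "house_number", 1000)
print(json.dumps(body, indent=2))
```

Note that the two size settings mean different things: the top-level one limits hits, the one inside terms limits buckets.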

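Workaround: someone on the DSL's Git repo told me to forget about translating and use this approach instead. It's simpler, because you can just write the hard parts in CURL syntax and paste them in. That's why I call it a workaround.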

from elasticsearch_dsl import Search
from elasticsearch_dsl.connections import connections

# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])

# how simple: we just paste the CURL request body here
body = {
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {
                "field": "house_number",
                "size": 0
            }
        }
    }
}

s = Search.from_dict(body)
s = s.using(client)
s = s.index("airbnb")
s = s.doc_type("sleep_overs")  # same doc type as in the earlier example

t = s.execute()

for item in t.aggregations.by_house.buckets:
    # item.key will be the house number
    print(item.key, item.doc_count)
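The question also asked how to count documents in "attendance" that have a value for emailAddress, which the answer doesn't cover. Here is a minimal sketch in the same paste-the-body style: an exists query matches documents where the field has a non-null value, and "size": 0 keeps hits out of the response so only the total matters. The helpers exists_count_body and total_hits are hypothetical, and the sample responses are made up.

```python
def exists_count_body(field):
    """Search body matching documents with a non-null value for `field`,
    returning no hits, only the total."""
    return {
        "size": 0,
        "query": {"exists": {"field": field}},
    }


def total_hits(response):
    """Read hits.total from a raw search response.

    Older Elasticsearch versions return an integer; 7.0+ returns an object
    like {"value": 42, "relation": "eq"}.
    """
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


body = exists_count_body("emailAddress")

# Hypothetical, trimmed responses in the two formats.
old_style = {"hits": {"total": 42, "hits": []}}
new_style = {"hits": {"total": {"value": 42, "relation": "eq"}, "hits": []}}
print(total_hits(old_style), total_hits(new_style))  # 42 42
```

With elasticsearch-dsl itself, Search(using=client, index="attendance").filter("exists", field="emailAddress").count() should give the same number. On 7.0+, a raw total may be only a lower bound (relation "gte") unless track_total_hits is enabled.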

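Hope this helps. These days I design everything in CURL and then use Python statements to pull apart the results and get what I need. This is especially useful for multi-level aggregations (sub-aggregations).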
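Because the question asked for sums as well as counts, here is a minimal sketch of that "design the body, then peel the results" approach with one sub-aggregation: a sum nested under each by_house bucket. The numeric field nights, the aggregation name total_nights, the helpers, and the sample response are all hypothetical, chosen only for illustration.

```python
def house_nights_body(max_buckets=1000):
    """Terms aggregation on house_number with a sum sub-aggregation.

    `nights` is a hypothetical numeric field used for illustration.
    """
    return {
        "size": 0,
        "aggs": {
            "by_house": {
                "terms": {"field": "house_number", "size": max_buckets},
                "aggs": {
                    "total_nights": {"sum": {"field": "nights"}}
                },
            }
        },
    }


def peel(response):
    """Flatten by_house buckets into {house_number: (doc_count, total_nights)}."""
    result = {}
    for bucket in response["aggregations"]["by_house"]["buckets"]:
        # sub-aggregation results sit inside each bucket, keyed by name
        result[bucket["key"]] = (
            bucket["doc_count"],
            bucket["total_nights"]["value"],
        )
    return result


# Hypothetical, trimmed response for the body above.
sample = {
    "aggregations": {
        "by_house": {
            "buckets": [
                {"key": 12, "doc_count": 4, "total_nights": {"value": 9.0}},
                {"key": 31, "doc_count": 3, "total_nights": {"value": 5.0}},
            ]
        }
    }
}

print(peel(sample))  # {12: (4, 9.0), 31: (3, 5.0)}
```

In elasticsearch-dsl the same nesting can be written by chaining .metric() onto the bucket, e.g. s.aggs.bucket('by_house', 'terms', field='house_number').metric('total_nights', 'sum', field='nights'); each bucket then exposes bucket.total_nights.value.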
