6

私はelasticsearch操作にelasticsearch-pyを使用しています。

elasticsearch.helpers.bulk複数のレコードを作成または更新しようとしています。

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()

data = [
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 3,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 4,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 5,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 6,
        "doc" : {"name": "test"}
    },
]


print helpers.bulk(es, data)

この操作を実行する方法はありますか?

今はまたは_op_typeとしてのみ与えることができます。レコードが存在しない場合、エラーが発生します。createupdateupdate

Traceback (most recent call last):
  File "/tmp/test.py", line 37, in <module>
    print helpers.bulk(es, data)
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 155, in streaming_bulk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('4 document(s) failed to index.', [{u'update': {u'status': 404, u'_type': u'external', u'_id': u'3', u'error': u'DocumentMissingException[[customer][-1] [external][3]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'4', u'error': u'DocumentMissingException[[customer][-1] [external][4]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'5', u'error': u'DocumentMissingException[[customer][-1] [external][5]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'6', u'error': u'DocumentMissingException[[customer][-1] [external][6]: document missing]', u'_index': u'customer'}}])
4

2 に答える 2

7

According to the _bulk endpoint documentation, you can and should use the index action for this, provided your documents always have the same identifiers.

create is useful when creating documents the first time, and update is more meant for doing partial and/or scripted updates.

You can also not specify any _op_type at all and index will be taken by default.

于 2015-08-21T06:39:35.310 に答える
4

@Valが提案した解決策を試してみましたが、魅力的です。

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()

data = [
    {
        "_index": "customer",
        "_type": "external",
        "_id": 3,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 4,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 5,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 6,
        "doc" : {"name": "test"}
    },
]


print helpers.bulk(es, data)
于 2015-08-21T06:42:20.823 に答える