google-app-engine - Failsafe datastore updates on App Engine

Question

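The App Engine datastore does, of course, have downtime. However, I'd like a "fail-safe" put that is more robust in the face of datastore errors (see the motivation below). The task queue seems like the obvious place to defer writes when the datastore is unavailable. I don't know of any other solution, though (other than shipping the data off to a third party via urlfetch).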

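Motivation: I have an entity which really needs to be put in the datastore; simply showing an error message to the user won't do. For example, some side effect may already have taken place which can't easily be undone (perhaps an interaction with a third-party site).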

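I've come up with a simple wrapper which (I think) provides a reasonable "fail-safe" put (see below). Do you see any problems with this, or have an idea for a more robust implementation? (Note: thanks to the suggestions posted in the answers by Nick Johnson and Saxon Druce, this post has been edited with some improvements to the code.)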

import logging
from google.appengine.api.labs.taskqueue import taskqueue
from google.appengine.datastore import entity_pb
from google.appengine.ext import db
from google.appengine.runtime.apiproxy_errors import CapabilityDisabledError

def put_failsafe(e, db_put_deadline=20, retry_countdown=60, queue_name='default'):
    """Tries to e.put().  On success, 1 is returned.  If this raises a db.Error
    or CapabilityDisabledError, then a task will be enqueued to try to put the
    entity (the task will execute after retry_countdown seconds) and 2 will be
    returned.  If the task cannot be enqueued, then 0 will be returned.  Thus a
    falsey value is only returned on complete failure.

    Note that since the taskqueue payloads are limited to 10kB, if the protobuf
    representing e is larger than 10kB then the put will be unable to be
    deferred to the taskqueue.

    If a put is deferred to the taskqueue, then it won't necessarily be
    completed as soon as the datastore is back up.  Thus it is possible that
    a deferred e.put() will occur *after* other, later puts that returned 1.

    Ensure e's model is imported in the code which defines the task which tries
    to re-put e (so that e can be deserialized).
    """
    try:
        e.put(rpc=db.create_rpc(deadline=db_put_deadline))
        return 1
    except (db.Error, CapabilityDisabledError), ex1:
        try:
            taskqueue.add(queue_name=queue_name,
                          countdown=retry_countdown,
                          url='/task/retry_put',
                          payload=db.model_to_protobuf(e).Encode())
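            logging.info('failed to put to db now, but deferred put to the taskqueue e=%s ex=%s', e, ex1)
            return 2
        except (taskqueue.Error, CapabilityDisabledError), ex2:
            # Both the put and the deferral failed: log both errors before giving up.
            logging.error('failed to put to db and failed to defer the put e=%s ex1=%s ex2=%s', e, ex1, ex2)
            return 0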

Request handler for the task:

from google.appengine.datastore import entity_pb
from google.appengine.ext import db, webapp

# IMPORTANT: This task deserializes entity protobufs.  To ensure that this is
#            successful, you must import any db.Model that may need to be
#            deserialized here (otherwise this task may raise a KindError).

class RetryPut(webapp.RequestHandler):
    def post(self):
        e = db.model_from_protobuf(entity_pb.EntityProto(self.request.body))
        e.put() # failure will raise an exception => the task to be retried

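I don't expect to use this for every put; most of the time, showing an error message is just fine. It is tempting to use it for every put, but I think it could sometimes be more confusing to the user if I tell them that their changes will appear later (and keep showing them the old data until the datastore is back up and the deferred puts execute).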
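
To make the three outcomes concrete, here is a minimal sketch of how a caller might turn put_failsafe's return value into a user-facing message. The constant names, the helper name message_for_put_result, and the message wording are all hypothetical; only the 0/1/2 codes come from the docstring above.

```python
# Return codes as documented in put_failsafe's docstring.
# The constant names are hypothetical, chosen for readability.
PUT_FAILED, PUT_OK, PUT_DEFERRED = 0, 1, 2


def message_for_put_result(result):
    """Map a put_failsafe return value to a message (hypothetical wording)."""
    if result == PUT_OK:
        return "Your changes were saved."
    if result == PUT_DEFERRED:
        # The write is queued; the user may still see old data for a while.
        return "Your changes were received and will appear shortly."
    # Neither the put nor the deferral succeeded.
    return "Sorry, your changes could not be saved. Please try again."
```

A handler would call something like message_for_put_result(put_failsafe(e)) and render the result.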

score 2 · Accepted Answer

Your approach is reasonable, but it has several caveats:

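By default, a put operation will retry until it runs out of time. Since you have a backup strategy, you may want to give up sooner. In that case, supply an rpc parameter with a custom deadline to the put method call.
There's no need to set an explicit countdown. The task queue will retry failing operations for you at increasing intervals.
You don't need to use pickle. Protocol Buffers have a natural string encoding which is much more efficient. See this post for a demonstration of how to use it.
As Saxon points out, task queue payloads are limited to 10 kilobytes, so you may run into trouble with large entities.
Most importantly, this changes the datastore consistency model from "strongly consistent" to "eventually consistent". That is, a put that you enqueued to the task queue could be applied at any time in the future, overwriting any changes that were made in the meantime. Any number of race conditions are possible, which essentially renders transactions useless while there are puts pending on the task queue.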
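
The last caveat can be illustrated with a small pure-Python simulation. No App Engine APIs are used here: `store`, `pending`, and the `version` field are hypothetical stand-ins for the datastore, the task queue, and an application-maintained counter. The sketch shows a deferred put clobbering a later write, and one possible guard that skips stale snapshots. On the real datastore, the read-compare-write in the guard would presumably need to run inside a transaction.

```python
# Pure-Python simulation; all names below are hypothetical stand-ins.
store = {}      # stands in for the datastore: key -> entity dict
pending = []    # stands in for the task queue of deferred puts


def deferred_put(key, entity):
    """Queue a snapshot of the entity, as a failed put would."""
    pending.append((key, dict(entity)))


def run_pending_blind():
    """Apply queued puts unconditionally, like a plain retry handler."""
    while pending:
        key, entity = pending.pop(0)
        store[key] = entity


def run_pending_guarded():
    """Apply a queued put only if it is newer than what is stored."""
    while pending:
        key, entity = pending.pop(0)
        current = store.get(key)
        if current is None or current["version"] < entity["version"]:
            store[key] = entity


# Blind replay: the deferred put (version 1) overwrites the later write.
deferred_put("user:1", {"name": "old", "version": 1})
store["user:1"] = {"name": "new", "version": 2}   # a later put succeeds
run_pending_blind()
blind_result = store["user:1"]["name"]

# Guarded replay: the stale snapshot is dropped instead.
store.clear()
deferred_put("user:1", {"name": "old", "version": 1})
store["user:1"] = {"name": "new", "version": 2}
run_pending_guarded()
guarded_result = store["user:1"]["name"]
```

The guard only helps if every writer maintains the version field, so it is a mitigation under that assumption rather than a restoration of strong consistency.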

score 1 · Accepted Answer

One potential issue is that tasks are limited to 10 kB of data, so this won't work if you have an entity which is larger than that once pickled.
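
A minimal sketch of checking the size before enqueuing, so that oversized entities are detected up front rather than failing inside the task queue call. The helper name fits_in_task_payload and the constant name are hypothetical, and the 10 kB figure is simply the limit quoted in this thread.

```python
# The limit quoted in this thread; the names here are hypothetical.
MAX_TASK_PAYLOAD_BYTES = 10 * 1024


def fits_in_task_payload(payload, limit=MAX_TASK_PAYLOAD_BYTES):
    """Return True if the serialized entity is small enough to enqueue."""
    return len(payload) <= limit


small = b"x" * 512                            # well under the limit
large = b"x" * (MAX_TASK_PAYLOAD_BYTES + 1)   # one byte over the limit
```

In the question's wrapper, the payload to check would be the serialized entity that is passed to taskqueue.add.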
