python - Fastest way to create JSON reflecting a tree structure in Python/Django using mptt

Question

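What is the fastest way in Python (Django) to create JSON based on a Django queryset? Note that parsing it in the template, as proposed here, is not an option.

The background is that I wrote a method that loops over all nodes in a tree, but it is already very slow when converting about 300 nodes. The first (and probably the worst) idea that came to my mind was to build the JSON somehow "manually". See the code below.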

Solution 1:

from django.utils.encoding import smart_str, smart_unicode  # Python 2-era Django helpers

def quoteStr(input):
    return "\"" + smart_str(smart_unicode(input)) + "\""

def createJSONTreeDump(user, node, root=False, lastChild=False):
    indent = ""  # `indent` is not defined in the original snippet; assumed empty here

    # open tag for object
    json = str("\n" + indent + "{" +
               quoteStr("name") + ": " + quoteStr(node.name) + ",\n" +
               quoteStr("id") + ": " + quoteStr(node.pk))

    childrenTag = "children"
    children = node.get_children()
    if children.count() > 0:
        # create children array opening tag
        json += str(",\n" + indent + quoteStr(childrenTag) + ": [")
        for idx, child in enumerate(children):
            # recursive call; the last child is flagged so no "," follows it
            isLast = (idx + 1) == children.count()
            json += createJSONTreeDump(user, child, False, isLast)
        # add children closing tag
        json += "]\n"

    # closing tag for object
    if not lastChild:
        # more children following, add ","
        json += indent + "},\n"
    else:
        # last child, do not add ","
        json += indent + "}\n"
    return json

The tree structure to be rendered is a tree built with mptt, where a call to .get_children() returns all children of a node.

The model looks as simple as this, with mptt taking care of everything else.

class Node(MPTTModel, ExtraManager):
    """
    Representation of a single node
    """ 
    name = models.CharField(max_length=200)
    parent = TreeForeignKey('self', null=True, blank=True, related_name='%(app_label)s_%(class)s_children')

The expected JSON result is used in the template like this: var root = {{ jsonTree|safe }}

Edit: Based on this answer I wrote the following code (definitely the better code), but it only feels slightly faster.

Solution 2:

def serializable_object(node):
    "Recurse into tree to build a serializable object"
    obj = {'name': node.name, 'id': node.pk, 'children': []}
    for child in node.get_children():
        obj['children'].append(serializable_object(child))
    return obj

import json
jsonTree = json.dumps(serializable_object(nodeInstance))

Solution 3:

def serializable_object_List_Comprehension(node):
    "Recurse into tree to build a serializable object"
    obj = {
        'name': node.name,
        'id': node.pk,
        'children': [serializable_object_List_Comprehension(ch) for ch in node.get_children()]
    }
    return obj

Solution 4:

def recursive_node_to_dict(node):
    result = {
        'name': node.name, 'id': node.pk
    }
    children = [recursive_node_to_dict(c) for c in node.get_children()],
    if children is not None:
        result['children'] = children
    return result

from mptt.templatetags.mptt_tags import cache_tree_children
root_nodes = cache_tree_children(root.get_descendants())
dicts = []
for n in root_nodes:
    dicts.append(recursive_node_to_dict(n))
jsonTree = json.dumps(dicts, indent=4)

Solution 5 (uses select_related to pre-fetch, though I am not sure it is used correctly):

def serializable_object_select_related(node):
    "Recurse into tree to build a serializable object, make use of select_related"
    obj = {'name': node.get_wbs_code(), 'wbsCode': node.get_wbs_code(), 'id': node.pk, 'level': node.level, 'position': node.position, 'children': []}
    for child in node.get_children().select_related():
        obj['children'].append(serializable_object_select_related(child))
    return obj

Solution 6 (improved Solution 4, using caching of child nodes):

def recursive_node_to_dict(node):
    return {
        'name': node.name, 'id': node.pk,
         # Notice the use of node._cached_children instead of node.get_children()
        'children' : [recursive_node_to_dict(c) for c in node._cached_children]
    }

Called via:

from mptt.templatetags.mptt_tags import cache_tree_children
subTrees = cache_tree_children(root.get_descendants(include_self=True))
subTreeDicts = []
for subTree in subTrees:
    subTree = recursive_node_to_dict(subTree)
    subTreeDicts.append(subTree)
jsonTree = json.dumps(subTreeDicts, indent=4)
# optional clean-up: remove the [ ] at the beginning and the end, as needed for D3.js
jsonTree = jsonTree[1:-1]

Below are the profiling results, created with cProfile as suggested by MuMind. A Django view is set up to start a stand-alone method profileJSON(), which in turn calls the different solutions to create the JSON output.

def startProfileJSON(request):
    print("startProfileJSON")
    import cProfile
    cProfile.runctx('profileJSON()', globals=globals(), locals=locals())
    print("endProfileJSON")

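Results:

Solution 1: 3350347 function calls (3130372 primitive calls) in 4.969 seconds (details)

Solution 2: 2533705 function calls (2354516 primitive calls) in 3.630 seconds (details)

Solution 3: 2533621 function calls (2354441 primitive calls) in 3.684 seconds (details)

Solution 4: 2812725 function calls (2466028 primitive calls) in 3.840 seconds (details)

Solution 5: 2536504 function calls (2357256 primitive calls) in 3.779 seconds (details)

Solution 6 (improved Solution 4): 2593122 function calls (2299165 primitive calls) in 3.663 seconds (details)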

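Discussion:

Solution 1: own encoding implementation. Bad idea.

Solution 2 + 3: currently the fastest, but still painfully slow.

Solution 4: looks promising because it caches the children, but it performs similarly and currently produces invalid JSON, because the children are put into double []: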

"children": [[]] instead of "children": []

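A minimal illustration of where the double brackets in Solution 4 come from: the trailing comma after the list comprehension turns the value into a one-element tuple, and json serializes a tuple as an array. The variable names below are made up for this example only.

```python
import json

with_comma = [],      # trailing comma: a 1-tuple holding the (empty) list
without_comma = []    # a plain list

# A tuple is never None, so an `is not None` check cannot filter it out.
print(type(with_comma).__name__, with_comma is not None)

dumped_with = json.dumps({'children': with_comma})
dumped_without = json.dumps({'children': without_comma})
print(dumped_with)
print(dumped_without)
```

Dropping the trailing comma, and testing `if children:` rather than `is not None`, avoids the extra level of nesting.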
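Solution 5: using select_related makes no difference, but it is probably used the wrong way, since a node always has a ForeignKey to its parent and we are parsing from root to child.

Update: Solution 6: this looks like the cleanest solution to me, using caching of child nodes. But it only performs similarly to Solutions 2 + 3, which is strange to me.

Does anybody have more ideas for improving performance?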

score 29 · Accepted Answer

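I suspect the biggest slowdown by far is that this performs one database query per node. The JSON rendering is trivial compared to hundreds of round trips to the database.

You should cache the children on each node so that those queries can be done all at once. django-mptt has a cache_tree_children() function you can use for this.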

import json
from mptt.templatetags.mptt_tags import cache_tree_children

def recursive_node_to_dict(node):
    result = {
        'id': node.pk,
        'name': node.name,
    }
    children = [recursive_node_to_dict(c) for c in node.get_children()]
    if children:
        result['children'] = children
    return result

root_nodes = cache_tree_children(Node.objects.all())
dicts = []
for n in root_nodes:
    dicts.append(recursive_node_to_dict(n))

print(json.dumps(dicts, indent=4))
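As a rough, library-free illustration of why fetching everything once helps, the sketch below nests a flat, tree-ordered list of rows in a single pass using a stack. It is not django-mptt's actual implementation. `Row`, `nest_rows`, and the sample data are hypothetical names made up for this example, and the sketch assumes the rows arrive parents-first with a `level` depth attribute.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for rows fetched in one query, in tree order
# (each parent before its descendants), carrying their depth as `level`.
Row = namedtuple('Row', ['pk', 'name', 'level'])

def nest_rows(rows):
    """Group a flat, tree-ordered row list into nested dicts in one pass."""
    roots = []
    stack = []  # (level, dict) pairs for the chain of currently open ancestors
    for row in rows:
        obj = {'id': row.pk, 'name': row.name, 'children': []}
        # Drop every stacked node that is not an ancestor of this row.
        while stack and stack[-1][0] >= row.level:
            stack.pop()
        if stack:
            stack[-1][1]['children'].append(obj)
        else:
            roots.append(obj)
        stack.append((row.level, obj))
    return roots

rows = [
    Row(1, 'root', 0),
    Row(2, 'a', 1),
    Row(3, 'a1', 2),
    Row(4, 'b', 1),
]
tree = nest_rows(rows)
print(json.dumps(tree, indent=4))
```

No per-node lookup happens inside the loop, so the cost is one fetch plus a linear pass over the rows.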

Custom JSON encoding might provide a slight speedup in some scenarios, but I would strongly discourage it, as it requires a lot of code.

score 8 · Accepted Answer

Your updated version looks like it has very little overhead. I think it would be slightly more efficient (and more readable, too!) to use a list comprehension:

def serializable_object(node):
    "Recurse into tree to build a serializable object"
    obj = {
        'name': node.name,
        'children': [serializable_object(ch) for ch in node.get_children()]
    }
    return obj

Beyond that, all you can do is profile it to find the bottlenecks. Write some stand-alone code that loads and serializes your 300 nodes, then run it with

python -m profile serialize_benchmark.py

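(or -m cProfile if that works better).

I can see three different potential bottlenecks:

DB access (.get_children() and .name) -- I am not sure exactly what is going on under the hood, but I have had code like this that ran a DB query for each node, adding a tremendous overhead. If that is your problem, you can probably configure it to do an "eager load" using select_related or something similar.
Function call overhead (e.g. serializable_object itself) -- just make sure ncalls for serializable_object looks like a reasonable number. If I understand your description, it should be in the neighborhood of 300.
Serializing at the end (json.dumps(nodeInstance)) -- not a likely culprit since you said it is only 300 nodes, but if you do see this taking up a lot of execution time, make sure the compiled speedups for JSON are working properly.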
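The ncalls check from the second point can also be done in-process. The following is a minimal sketch, assuming Python 3 and using a hypothetical in-memory `FakeNode` class in place of real MPTT nodes, so no database is involved and only the function call overhead shows up.

```python
import cProfile
import io
import json
import pstats

class FakeNode(object):
    """Hypothetical in-memory stand-in for an MPTT node (no database)."""
    def __init__(self, name, children=()):
        self.name = name
        self._children = list(children)

    def get_children(self):
        return self._children

def serializable_object(node):
    return {
        'name': node.name,
        'children': [serializable_object(ch) for ch in node.get_children()],
    }

# A small tree: 1 root + 3 children + 9 grandchildren = 13 nodes.
root = FakeNode('root', [
    FakeNode('c%d' % i, [FakeNode('g%d%d' % (i, j)) for j in range(3)])
    for i in range(3)
])

profiler = cProfile.Profile()
profiler.enable()
json_tree = json.dumps(serializable_object(root))
profiler.disable()

# Print only the profile rows whose name matches serializable_object;
# its ncalls column should correspond to the number of nodes.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats('cumulative')
stats.print_stats('serializable_object')
report = out.getvalue()
print(report)
```

With real nodes, an ncalls value far above the node count, or a large number of calls into the database layer, would point at the first bottleneck rather than the second.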

If profiling does not tell you much, write a stripped-down version that, say, recursively calls node.name and node.get_children() but does not store the results in a data structure, and see how that compares.

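Update: There are 2192 calls to execute_sql in Solution 3 and 2192 in Solution 5, so I think excessive DB queries are the problem and that select_related did nothing the way it is used above. django-mptt issue #88 (Allow select_related in model methods) suggests that you are using it more or less correctly, but I have my doubts, and get_children vs. get_descendants might make a huge difference.

The copy.deepcopy calls are puzzling, because you are not calling it directly and it does not appear to be called from the MPTT code. What is tree.py?

If you are doing a lot of work with profiling, I highly recommend the really slick tool RunSnakeRun, which shows your profile data in a very convenient grid form so you can make sense of the data more quickly.

Anyway, here is one more attempt at streamlining the DB side of things: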

# Plain dict: built-in dicts cannot be weakly referenced, so they cannot
# be stored as values in a weakref.WeakValueDictionary.
obj_cache = {}

def serializable_object(node):
    root_obj = {'name': node.get_wbs_code(), 'wbsCode': node.get_wbs_code(),
            'id': node.pk, 'level': node.level, 'position': node.position,
            'children': []}
    obj_cache[node.pk] = root_obj
    # don't know if the following .select_related() does anything...
    for descendant in node.get_descendants().select_related():
        # get_descendants supposedly traverses in "tree order", which I think
        # means the parent obj will always be created already
        parent_obj = obj_cache[descendant.parent.pk]    # hope parent is cached
        descendant_obj = {'name': descendant.get_wbs_code(),
            'wbsCode': descendant.get_wbs_code(), 'id': descendant.pk,
            'level': descendant.level, 'position': descendant.position,
            'children': []}
        parent_obj['children'].append(descendant_obj)
        obj_cache[descendant.pk] = descendant_obj
    return root_obj

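Note that this is no longer recursive. It proceeds iteratively through the nodes, in theory after their parents have been visited, and it uses one big call to MPTTModel.get_descendants(), so hopefully that is well optimized and caches .parent, etc. (or maybe there is a more direct way to get at the parent key). It creates each obj with no children initially, then "grafts" all the values onto their parents afterwards.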
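The same grafting idea can be tried without a database. This is only a sketch under stated assumptions: the rows are hypothetical `(pk, parent_pk, name)` tuples listed in tree order, and `build_tree` is a made-up helper name, not part of django-mptt.

```python
import json

# Hypothetical flat rows as (pk, parent_pk, name), listed in tree order so
# that every parent appears before its children.
rows = [
    (1, None, 'root'),
    (2, 1, 'a'),
    (3, 2, 'a1'),
    (4, 1, 'b'),
]

def build_tree(rows):
    by_pk = {}   # pk -> dict already built for that node, local to this call
    roots = []
    for pk, parent_pk, name in rows:
        obj = {'id': pk, 'name': name, 'children': []}
        by_pk[pk] = obj
        if parent_pk is None:
            roots.append(obj)
        else:
            # Relies on tree order: the parent's dict must already exist.
            by_pk[parent_pk]['children'].append(obj)
    return roots

tree = build_tree(rows)
json_tree = json.dumps(tree, indent=4)
print(json_tree)
```

Keeping the lookup table local to the function avoids stale entries between calls. If the rows were not in tree order, the parent lookup would raise KeyError.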

score 0 · Accepted Answer

Organise your data into nested dictionaries or lists, then call json.dumps on it:

import json
data = ['foo', {'bar': ('baz', None, 1.0, 2)}]
json.dumps(data)
