python - Python: Improving performance of HTML generation

Question

I currently support a legacy Python application that generates all of its HTML by creating individual tag objects.

There is a parent TAG class:

class TAG(object):
    def __init__(self, tag="TAG", contents=None, **attributes):
        self.tag = tag
        self.contents = contents
        self.attributes = attributes

Every other tag inherits from TAG:

class H1(TAG):
    def __init__(self, contents=None, **attributes):
        TAG.__init__(self, 'H1', contents, **attributes)
class H2(TAG):
    def __init__(self, contents=None, **attributes):
        TAG.__init__(self, 'H2', contents, **attributes)

The main TAG class has a to_string method that looks roughly like this:

def to_string(self):
    yield '<{}'.format(self.tag)
    for (a, v) in self.attr_g():
        yield ' {}="{}"'.format(a, v)
    if self.NO_CONTENTS:
        yield '/>'
    else :
        yield '>'
        for c in self.contents:
            if isinstance(c, TAG):
                for i in c.str_g():
                    yield i
            else:
                yield c
        yield '</{}>'.format(self.tag)

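Basically, we just write out the result of the to_string method.

The problem shows up on pages that generate so many TAGs that the page is large enough to hurt performance.

Are there any quick wins I can make to improve performance?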

score 3 · Accepted Answer

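Preface: this is a terrible way to generate HTML, but if you are going to do it, you should do it in the best way possible.

One thing Python is exceptionally good at is string formatting. If you are concatenating lots of tiny strings, you are sacrificing performance from the start. Your to_string() method should look more like this: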

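def to_string(self):
    # Build the whole tag with one format call (Python 2: note basestring).
    return """<{tag}{attributes}>{content}</{tag}>""".format(
        tag=self.tag,
        # Each attribute carries its own leading space, so a tag without
        # attributes renders as <H1> and one with them as <H1 id="x">.
        attributes=''.join(' %s="%s"' % (attr, val) for
                           attr, val in self.attributes.items()),
        content=''.join(
            (n if isinstance(n, basestring) else n.to_string()) for
            n in self.contents))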

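Note a few things I did there:

This is Python, not Java. Stack frames are expensive, so minimize function and method calls.
If you don't need a function to abstract a property, don't write one. In other words, you don't need attr_g (except perhaps to do escaping, but you can do that when the data is put in instead).
Do all of your string formatting on the same string. Running a separate format operation for each tiny string and then yielding it to be concatenated is a big waste.
Don't use a generator for this. Every time you yield, you are mucking around with the instruction pointer, which will inherently slow things down.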
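
To check whether these points pay off on your own pages, it is worth measuring rather than assuming. The following is a minimal sketch, written for Python 3 (so `str` stands in for `basestring`); `GenTag`, `FmtTag`, and `build` are hypothetical names made up for this comparison, not part of the original application. It first checks that both versions render identical markup, then times them with `timeit`.

```python
import timeit


class GenTag(object):
    """Hypothetical stand-in for the generator-based to_string()."""

    def __init__(self, tag, contents=(), **attributes):
        self.tag = tag
        self.contents = contents
        self.attributes = attributes

    def to_string(self):
        yield '<{}'.format(self.tag)
        for a, v in self.attributes.items():
            yield ' {}="{}"'.format(a, v)
        yield '>'
        for c in self.contents:
            if isinstance(c, GenTag):
                for piece in c.to_string():
                    yield piece
            else:
                yield c
        yield '</{}>'.format(self.tag)


class FmtTag(object):
    """Hypothetical stand-in for the single-format to_string()."""

    def __init__(self, tag, contents=(), **attributes):
        self.tag = tag
        self.contents = contents
        self.attributes = attributes

    def to_string(self):
        return '<{tag}{attributes}>{content}</{tag}>'.format(
            tag=self.tag,
            attributes=''.join(
                ' %s="%s"' % (a, v) for a, v in self.attributes.items()),
            content=''.join(
                c if isinstance(c, str) else c.to_string()
                for c in self.contents))


def build(cls, rows=200):
    # An assumed test page: one DIV holding `rows` P tags.
    return cls('DIV', [cls('P', ['row %d' % i], id='r%d' % i)
                       for i in range(rows)])


gen_page = build(GenTag)
fmt_page = build(FmtTag)

# Timing is only meaningful if both versions produce the same markup.
assert ''.join(gen_page.to_string()) == fmt_page.to_string()

if __name__ == '__main__':
    print('generator:', timeit.timeit(
        lambda: ''.join(gen_page.to_string()), number=200))
    print('format   :', timeit.timeit(fmt_page.to_string, number=200))
```

The numbers depend on the interpreter version and on how deeply your real pages nest, so run it against a representative page before committing to a rewrite.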

Other pointers:

You are inheriting from object, so use the super() function.

Don't waste code writing constructors just to declare the tag type:

class TAG(object):
    def __init__(self, contents=None, **attributes):
        self.contents = contents
        self.attributes = attributes

class H1(TAG):
    tag = 'H1'

class H2(TAG):
    tag = 'H2'
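
Putting the two pointers together might look like the sketch below. It is written for Python 3 (`str` instead of `basestring`), and the `A` subclass with its required `href` argument is a hypothetical example, added only to show where `super()` comes in when a subclass really does need its own constructor.

```python
class TAG(object):
    tag = 'TAG'  # subclasses override this class attribute

    def __init__(self, contents=None, **attributes):
        # Normalize contents so to_string() can always iterate over a list.
        if contents is None:
            contents = []
        elif isinstance(contents, str):
            contents = [contents]
        self.contents = contents
        self.attributes = attributes

    def to_string(self):
        return '<{tag}{attributes}>{content}</{tag}>'.format(
            tag=self.tag,
            attributes=''.join(
                ' %s="%s"' % (a, v) for a, v in self.attributes.items()),
            content=''.join(
                c if isinstance(c, str) else c.to_string()
                for c in self.contents))


class H1(TAG):
    tag = 'H1'  # no constructor needed


class A(TAG):
    tag = 'A'

    def __init__(self, href, contents=None, **attributes):
        # Hypothetical subclass with a required attribute: delegate to
        # the parent constructor through super() instead of TAG.__init__.
        super(A, self).__init__(contents, href=href, **attributes)


print(H1('Title').to_string())
print(A('/home', 'Home').to_string())
```

The `super(A, self)` spelling is used because it works on both Python 2 new-style classes and Python 3.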

If you are doing a lot of this, you might have some success with StringIO objects. They let you build your tags and .write() them in. You can think of them as the equivalent of a .NET StringBuffer or a Java StringBuilder.
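
A rough sketch of the StringIO idea, assuming Python 3 (`io.StringIO`, `str`) and the class-attribute `tag` layout; `write_to` is a hypothetical method name introduced here, not something from the original code. Each tag writes its pieces into one shared buffer, and the string is materialized once at the end.

```python
import io


class TAG(object):
    tag = 'TAG'

    def __init__(self, contents=None, **attributes):
        if contents is None:
            contents = []
        elif isinstance(contents, str):
            contents = [contents]
        self.contents = contents
        self.attributes = attributes

    def write_to(self, buf):
        """Hypothetical helper: append this tag's markup to `buf`."""
        write = buf.write  # bind once to avoid repeated attribute lookups
        write('<%s' % self.tag)
        for a, v in self.attributes.items():
            write(' %s="%s"' % (a, v))
        write('>')
        for c in self.contents:
            if isinstance(c, str):
                write(c)
            else:
                c.write_to(buf)  # children share the same buffer
        write('</%s>' % self.tag)

    def to_string(self):
        buf = io.StringIO()
        self.write_to(buf)
        return buf.getvalue()


class H1(TAG):
    tag = 'H1'


class H2(TAG):
    tag = 'H2'


tree = H1(['This is some ', H2('test input', id='abcd'),
           ' and trailing text'])
print(tree.to_string())
```

Because `write_to` accepts any object with a `.write()` method, the same code could write to an open file or a response stream instead of an in-memory buffer. Whether it beats a single format call depends on the page, so treat it as something to benchmark.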

score 1 · Accepted Answer

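@mattbasta has the right idea here. However, I want to propose something a little different: implement to_string using cElementTree.TreeBuilder. I don't know whether ElementTree's very fast serialization will win out over the overhead of creating an ElementTree.

Here is a wonky TAG class with a to_string_b() method that makes use of some micro-optimizations and uses a TreeBuilder to build the tree. (A possibly important difference between to_string() and TreeBuilder is that TreeBuilder will always escape the output for XML, whereas yours will not.)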

import xml.etree.cElementTree as ET

class TAG(object):
    def __init__(self, tag="TAG", contents=None, **attributes):
        self.tag = tag
        # this is to ensure that `contents` always has a uniform
        # type.
        if contents is None:
            self.contents = []
        else:
            if isinstance(contents, basestring):
                # I suspect the calling code passes in a string as contents
                # in the common case, so this means that each character of
                # the string will be yielded one-by-one. let's avoid that by
                # wrapping in a list.
                self.contents = [contents]
            else:
                self.contents = contents
        self.attributes = attributes

    def to_string(self):
        yield '<{}'.format(self.tag)
        for (a, v) in self.attributes.items():
            yield ' {}="{}"'.format(a, v)
        if self.contents is None:
            yield '/>'
        else :
            yield '>'
            for c in self.contents:
                if isinstance(c, TAG):
                    for i in c.to_string():
                        yield i
                else:
                    yield c
            yield '</{}>'.format(self.tag)

    def to_string_b(self, builder=None):
        global isinstance, basestring
        def isbasestring(c, isinstance=isinstance, basestring=basestring):
            # some inlining
            return isinstance(c, basestring)
        if builder is None:
            iamroot = True
            builder = ET.TreeBuilder()
        else:
            iamroot = False #don't close+flush the builder
        builder.start(self.tag, self.attributes)
        if self.contents is not None:
            for c in self.contents:
                if (isbasestring(c)):
                    builder.data(c)
                else:
                    for _ in c.to_string_b(builder):
                        pass
        builder.end(self.tag)
        # this is a yield *ONLY* to preserve the interface
        # of to_string()! if you can change the calling
        # code easily, use return instead!
        if iamroot:
            yield ET.tostring(builder.close())


class H1(TAG):
    def __init__(self, contents=None, **attributes):
        TAG.__init__(self, 'H1', contents, **attributes)
class H2(TAG):
    def __init__(self, contents=None, **attributes):
        TAG.__init__(self, 'H2', contents, **attributes)    

tree = H1(["This is some ", H2("test input", id="abcd", cls="efgh"), " and trailing text"])

print ''.join(tree.to_string())
print ''.join(tree.to_string_b())
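
The escaping difference mentioned above can be seen in isolation with a small sketch. This one assumes Python 3, where `xml.etree.cElementTree` no longer exists (it was removed in Python 3.9) and `xml.etree.ElementTree` picks up the C accelerator on its own; the sample text is made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

text = '1 < 2 & '  # sample content containing characters special to XML

# Plain string formatting passes the text through untouched.
plain_out = '<H1>%s<H2 id="abcd">test input</H2></H1>' % text

# TreeBuilder plus ET.tostring escapes text nodes during serialization.
builder = ET.TreeBuilder()
builder.start('H1', {})
builder.data(text)
builder.start('H2', {'id': 'abcd'})
builder.data('test input')
builder.end('H2')
builder.end('H1')
root = builder.close()
xml_out = ET.tostring(root, encoding='unicode')

print(plain_out)
print(xml_out)

# To get the same result from the formatting approach, escape the data
# when it goes in, for example with html.escape.
escaped_out = '<H1>%s<H2 id="abcd">test input</H2></H1>' % escape(text)
assert escaped_out == xml_out
```

If the existing pages rely on passing raw markup through as text content, switching to TreeBuilder would change the output, so check a few real pages before swapping it in.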
