python - Simple loop over all elements of an etree object?

Question

I have a function that returns a list from an etree element, but it doesn't look into nested elements.

<elem>
    <variable id="getthis">
        <!-- / -->
    </variable>
    <if>
        <variable id="alsoGetThis">
            <!-- Keep looping through all elements -->
        </variable>
    </if>
</elem>

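(I'm working with valid XML.)

Currently the variable inside `<if>` is ignored, so how can I loop over every level of the tree? I assume this is a simple task, but I may be wrong. (I'm new to Python and don't always think like a programmer.)

The Python function that collects the variables: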

def collect_vars(self, elem):
    elemVars = []
    if elem.tag == 'variable':
        elemVars.append(elem.attrib['id'])
    elif e in elem == 'variable': # don't want to be doing these
        elemVars.append(e.attrib['id'])
    return elemVars

So what I ultimately need is `elemVars`, a list containing all the variable IDs within a given `<elem>`.

score 4 · Accepted Answer

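Consider learning XPath and using the `xpath` member of lxml. Suppose your XML tree is called `t`, as if you had run the following (with `etree` imported via `from lxml import etree`):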

>>> s = """<elem>
    <variable id="getthis">
        <!-- / -->
    </variable>
    <if>
        <variable id="alsoGetThis">
            <!-- Keep looping through all elements -->
        </variable>
    </if>
</elem>
"""
>>> t = etree.fromstring(s)

Then you can find all elements in the tree:

>>> t.xpath("//*")
[<Element elem at 0x2809b40>, <Element variable at 0x2809be0>, <Element if at 0x2809af0>, <Element variable at 0x2809c80>]

and all `variable` elements:

>>> t.xpath("//variable")
[<Element variable at 0x2809be0>, <Element variable at 0x2809c80>]

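`xpath` returns a list of the elements that satisfy the XPath expression you specify, each represented as an element tree, from which you can read the attributes you want: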

>>> [x.attrib["id"] for x in t.xpath("//variable")]
['getthis', 'alsoGetThis']
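If lxml is not installed, the standard library's `xml.etree.ElementTree` supports a limited subset of XPath through `findall`. The following is a minimal sketch under that assumption; the names `xml_text`, `root`, and `ids` are chosen here for illustration, and the XML is a trimmed copy of the question's sample. Note the relative form `.//variable` rather than `//variable`.

```python
import xml.etree.ElementTree as ET

xml_text = """<elem>
    <variable id="getthis"/>
    <if>
        <variable id="alsoGetThis"/>
    </if>
</elem>"""

root = ET.fromstring(xml_text)

# ".//variable" matches `variable` elements at any depth below the root.
ids = [node.get("id") for node in root.findall(".//variable")]
print(ids)  # ['getthis', 'alsoGetThis']
```

The full `xpath` method shown above is specific to lxml; `findall` only covers simple path expressions like this one.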

score 1 · Accepted Answer

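The problem you're running into is that you aren't visiting every node in the file. You're only visiting the children of the `elem` element, not the children of those children. To illustrate, run the following (I edited your XML so that it is valid):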

import xml.etree.ElementTree as etree

xml_string = """<elem>
    <variable id="getthis" />
    <if>
        <variable id="alsoGetThis" />
    </if>
    </elem>"""

e = etree.fromstring(xml_string)

for node in e:
    print(node)

The result is

<Element variable at 7f53fbdf1cb0>
<Element if at 7f53fbdf1cf8>

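So you are not visiting the `variable` node that is a child of `if`. You need to visit each node in the XML file recursively, i.e. your function `collect_vars` needs to call itself. I'll post some code in a moment to illustrate this.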
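As a rough sketch of what "calling itself" could look like, here is one possible recursive rewrite of `collect_vars` that keeps the question's accumulator-list style. It is written as a plain function (the `self` parameter from the question is dropped, which is an assumption about how it is used), and `root` plus the trimmed XML are illustrative.

```python
import xml.etree.ElementTree as etree

def collect_vars(elem):
    """Return the `id` of every `variable` element at or below `elem`."""
    elem_vars = []
    if elem.tag == "variable" and "id" in elem.attrib:
        elem_vars.append(elem.attrib["id"])
    # Iterating over an element yields only its direct children,
    # so recurse into each child to reach the deeper levels.
    for child in elem:
        elem_vars.extend(collect_vars(child))
    return elem_vars

root = etree.fromstring("""<elem>
    <variable id="getthis"/>
    <if>
        <variable id="alsoGetThis"/>
    </if>
</elem>""")

print(collect_vars(root))  # ['getthis', 'alsoGetThis']
```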

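Edit: as promised, here is some code to get all `id` attributes from an element tree. Rather than using an accumulator as Niek de Klein did, I used a generator. This has a number of advantages. For example, it returns the `id`s one at a time, so you can stop processing at any point, for instance when a particular `id` is encountered. That saves you from walking the rest of the tree.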

def get_attrs(element, tag, attr):
    """Yield attribute `attr` of every `tag` element at or below `element`."""

    # If an element has any children (nested elements), loop through them:
    if len(element):
        for node in element:
            # Recursively call this function, yielding each result:
            for attribute in get_attrs(node, tag, attr):
                yield attribute

    # Then check whether the element itself is of type `tag` and has
    # attribute `attr`; if so, yield the value of that attribute.
    if element.tag == tag:
        if attr in element.attrib:
            yield element.attrib[attr]

ids = list(get_attrs(e, 'variable', 'id'))

print(ids)

This gives the result

 ['getthis', 'alsoGetThis']
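To make the "stop at any point" advantage concrete, here is a small sketch that breaks out of the loop as soon as a particular `id` is seen. It leans on the standard library's `Element.iter(tag)`, which lazily walks an element and all of its descendants in document order; the helper name `iter_attr`, the extra `third` variable in the XML, and the `visited` list are assumptions made for illustration.

```python
import xml.etree.ElementTree as etree

def iter_attr(element, tag, attr):
    """Lazily yield `attr` from every `tag` element at or below `element`."""
    for node in element.iter(tag):
        if attr in node.attrib:
            yield node.attrib[attr]

root = etree.fromstring("""<elem>
    <variable id="getthis"/>
    <if>
        <variable id="alsoGetThis"/>
    </if>
    <variable id="third"/>
</elem>""")

visited = []
for value in iter_attr(root, "variable", "id"):
    visited.append(value)
    if value == "getthis":
        break  # the remaining elements are never visited

print(visited)  # ['getthis']
```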
