python - xml.etree.ElementTree ノードの深さを取得

Question

XML:

<?xml version="1.0"?>
<pages>
    <page>
        <url>http://example.com/Labs</url>
        <title>Labs</title>
        <subpages>
            <page>
                <url>http://example.com/Labs/Email</url>
                <title>Email</title>
                <subpages>
                    <page/>
                    <url>http://example.com/Labs/Email/How_to</url>
                    <title>How-To</title>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Labs/Social</url>
                <title>Social</title>
            </page>
        </subpages>
    </page>
    <page>
        <url>http://example.com/Tests</url>
        <title>Tests</title>
        <subpages>
            <page>
                <url>http://example.com/Tests/Email</url>
                <title>Email</title>
                <subpages>
                    <page/>
                    <url>http://example.com/Tests/Email/How_to</url>
                    <title>How-To</title>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Tests/Social</url>
                <title>Social</title>
            </page>
        </subpages>
    </page>
</pages>

コード：

// rexml is the XML string read from a URL
from xml.etree import ElementTree as ET
tree = ET.fromstring(rexml)
for node in tree.iter('page'):
    for url in node.iterfind('url'):
        print url.text
    for title in node.iterfind('title'):
        print title.text.encode("utf-8")
    print '-' * 30

出力：

http://example.com/article1
Article1
------------------------------
http://example.com/article1/subarticle1
SubArticle1
------------------------------
http://example.com/article2
Article2
------------------------------
http://example.com/article3
Article3
------------------------------

Xml は、サイトマップの構造のようなツリーを表します。

私は一日中ドキュメントと Google を行ったり来たりしていましたが、エントリのノードの深さを取得するのが難しいことがわかりません。

私は子コンテナのカウントを使用しましたが、それは最初の親に対してのみ機能し、リセット方法がわからないため壊れます。しかし、これはおそらくハック的なアイデアです。

望ましい出力:

0
http://example.com/article1
Article1
------------------------------
1
http://example.com/article1/subarticle1
SubArticle1
------------------------------
0
http://example.com/article2
Article2
------------------------------
0
http://example.com/article3
Article3
------------------------------

score 10 · Accepted Answer

Python ElementTreeAPI は、XML ツリーの深さ優先トラバーサル用の反復子を提供します。残念ながら、これらの反復子は呼び出し元に深さ情報を提供しません。

ただし、各要素の深さ情報も返す深さ優先イテレータを作成できます。

import xml.etree.ElementTree as ET

def depth_iter(element, tag=None):
    stack = []
    stack.append(iter([element]))
    while stack:
        e = next(stack[-1], None)
        if e == None:
            stack.pop()
        else:
            stack.append(iter(e))
            if tag == None or e.tag == tag:
                yield (e, len(stack) - 1)

これは、親リンクをたどって深さを決定するよりも効率的であることに注意してください ( を使用する場合lxml) 。O(n)O(n log n)

score 4 · Accepted Answer

使用済みlxml.html。

import lxml.html

rexml = ...

def depth(node):
    d = 0
    while node is not None:
        d += 1
        node = node.getparent()
    return d

tree = lxml.html.fromstring(rexml)
for node in tree.iter('page'):
    print depth(node)
    for url in node.iterfind('url'):
        print url.text
    for title in node.iterfind('title'):
        print title.text.encode("utf-8")
    print '-' * 30

python - xml.etree.ElementTree ノードの深さを取得

6 に答える 6

Related

Reference