python - Reading XML with Python minidom and iterating over each node

Question

I have an XML structure that looks like the following, but on a much larger scale:

<root>
    <conference name='1'>
        <author>
            Bob
        </author>
        <author>
            Nigel
        </author>
    </conference>
    <conference name='2'>
        <author>
            Alice
        </author>
        <author>
            Mary
        </author>
    </conference>
</root>

To process it, I used the following code:

dom = parse(filepath)
conference=dom.getElementsByTagName('conference')
for node in conference:
    conf_name=node.getAttribute('name')
    print conf_name
    alist=node.getElementsByTagName('author')
    for a in alist:
        authortext= a.nodeValue
        print authortext

However, the author text that gets printed is None. I tried fiddling with variations like the one below, but that just breaks my program.

authortext=a[0].nodeValue

The correct output should be:

1
Bob
Nigel
2
Alice
Mary

But what I get is:

1
None
None
2
None
None

Any suggestions on how to tackle this problem?

score 24 · Accepted Answer

The node you read authortext from is of type 1 (ELEMENT_NODE); to get a string you normally need a TEXT_NODE. This will work:

a.childNodes[0].nodeValue
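To make the node types concrete, here is a minimal sketch. It assumes a cut-down, hypothetical XML string shaped like the one in the question. It checks that the author element has no value of its own and that the text sits in its first child node.

```python
from xml.dom import minidom

# Hypothetical sample mirroring the structure in the question.
XML = "<root><conference name='1'><author>\n    Bob\n</author></conference></root>"

dom = minidom.parseString(XML)
author = dom.getElementsByTagName("author")[0]

# The element itself carries no value; its text lives in a child Text node.
element_is_element = author.nodeType == author.ELEMENT_NODE
element_value = author.nodeValue  # None for element nodes

text_node = author.childNodes[0]
text_is_text = text_node.nodeType == text_node.TEXT_NODE

# The raw value keeps the newlines and indentation from the file, so strip it.
author_name = text_node.nodeValue.strip()

print(element_is_element, element_value, text_is_text, author_name)
```

Note that the raw nodeValue includes the whitespace around the name, which is why the sketch calls strip().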

score 6 · Accepted Answer

Element nodes don't have a nodeValue. You have to look at the Text nodes inside them. If you know there is always exactly one text node inside, you can use element.firstChild.data (data is the same as nodeValue for a text node).

Be careful: if there is no text content, there is no child Text node and element.firstChild is None, so the .data access fails.
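One way to guard against the empty-element case is a small helper like the sketch below. The name first_text and its default argument are hypothetical, not part of minidom.

```python
from xml.dom import minidom


def first_text(element, default=""):
    """Hypothetical helper: return the data of the first child if it is a
    Text node, otherwise return `default`."""
    child = element.firstChild
    if child is None or child.nodeType != child.TEXT_NODE:
        return default
    return child.data


# The second author element is empty, so it has no child nodes at all.
dom = minidom.parseString("<root><author>Bob</author><author></author></root>")
names = [first_text(a) for a in dom.getElementsByTagName("author")]
print(names)
```

Returning a default instead of raising is a design choice. If an empty author should count as an error in your data, raising an exception may be the better fit.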

A quick way to get the content of the direct child text nodes:

text= ''.join(child.data for child in element.childNodes if child.nodeType==child.TEXT_NODE)
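As a usage sketch, the expression can be wrapped in a function and run on mixed content. The function name direct_text and the nick element are made up for illustration. Only text directly under the element is collected; text inside nested elements is skipped.

```python
from xml.dom import minidom


def direct_text(element):
    """Join the data of the element's direct Text children only."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


# Hypothetical mixed content: text, a nested element, then more text.
dom = minidom.parseString("<author>Bob <nick>Bobby</nick> Smith</author>")
direct = direct_text(dom.documentElement)
print(repr(direct))
```

The text inside the nested nick element is not part of the result, because that Text node is a grandchild rather than a direct child.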

DOM Level 3 Core gives you the textContent property, which you can use to get the text recursively from inside an Element, but minidom doesn't support it (some other Python DOM implementations do).
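Since minidom lacks textContent, a recursive helper can approximate it. The sketch below is an assumption about how one might write it, not an official API. The name text_content is hypothetical, and it skips comments and processing instructions, which textContent also leaves out of an element's text.

```python
from xml.dom import minidom


def text_content(node):
    """Hypothetical stand-in for DOM Level 3 textContent: concatenate the
    text of all descendants, depth first."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(
        text_content(child)
        for child in node.childNodes
        if child.nodeType
        not in (child.COMMENT_NODE, child.PROCESSING_INSTRUCTION_NODE)
    )


# Hypothetical mixed content with a nested element and a comment.
dom = minidom.parseString(
    "<author>Bob <nick>Bobby</nick><!-- note --> Smith</author>"
)
full_text = text_content(dom.documentElement)
print(full_text)
```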

score 2 · Accepted Answer

Since there is always exactly one text data value per author, you can use element.firstChild.data.

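from xml.dom.minidom import parseString

# `document` is assumed to be a string holding the XML from the question
dom = parseString(document)
conferences = dom.getElementsByTagName("conference")

# Each conference here is an element node
for conference in conferences:
    conference_name = conference.getAttribute("name")
    print("")
    print(conference_name.upper() + " - ")

    authors = conference.getElementsByTagName("author")
    for author in authors:
        # firstChild is the Text node that holds the author's name
        print("   " + author.firstChild.data)

    print("")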

score 2 · Accepted Answer

Quick access:

node.getElementsByTagName('author')[0].childNodes[0].nodeValue

answered 2013-09-06T15:46:15.993

score 0 · Accepted Answer

I fiddled with it a bit, and here is what I got to work:

# ...
authortext= a.childNodes[0].nodeValue
print authortext

which leads to the following output:

C:\temp\py>xml2.py
1
Bob
Nigel
2
Alice
Mary

I can't explain exactly why you have to go through the childNode to get the inner text, but at least this gives you what you were looking for.
