python - Script can't access the content inside a tag

Question

I am trying to parse a large XML file.

It has the following structure:

    <merchandiser>
        <header></header>
        <product>
            <name></name>
            <URL>
                <info>
                </info>
                <product>
                </product>
            </URL>
        </product>

        ............

        <product>
            <name></name>
            <URL>
                <info>
                </info>
                <product>
                </product>
            </URL>
        </product>
    </merchandiser>

I am using iterparse() from the python-lxml library:

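    from lxml import etree

    xmlfile = "merchandiser.xml"  # hypothetical path to the large XML file

    for event, element in etree.iterparse(xmlfile, tag='product'):
        # iterparse reports "end" events by default, so every <product> arrives here
        if element.tag == "product" and event == "end":
            # only read top-level products (direct children of <merchandiser>)
            if element.getparent().tag == 'merchandiser':
                print(element.xpath('./URL/product/text()'))
                print(element.xpath('./URL/info/text()'))
        element.clear()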
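A minimal, self-contained reproduction, assuming a tiny hypothetical sample with the same shape as the file above (the element text and the product_end_order helper are made up for illustration). It records each <product> as iterparse(tag='product') yields it, together with its parent's tag. Both the inner and the outer <product> match the filter, and the inner one (parent URL) is yielded first, because its closing tag is read first.

```python
import io

from lxml import etree

# Hypothetical sample with the same shape as the question's file:
# an outer <product> whose <URL> child contains an inner <product>.
SAMPLE = b"""<merchandiser>
  <header></header>
  <product>
    <name>outer-1</name>
    <URL>
      <info>info-1</info>
      <product>inner-1</product>
    </URL>
  </product>
</merchandiser>"""


def product_end_order(data):
    """Return (parent tag, stripped text) for each <product>, in the order iterparse yields it."""
    order = []
    # With the default events=("end",), an element is yielded when its closing tag is read.
    for _event, element in etree.iterparse(io.BytesIO(data), tag="product"):
        text = (element.text or "").strip()
        order.append((element.getparent().tag, text))
    return order


for parent_tag, text in product_end_order(SAMPLE):
    print(parent_tag, repr(text))
```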

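The script prints the text inside the <info> tag, but fails to print the text inside the inner <product> tag.

I think this is because the inner and outer tags have the same name.

Please tell me what I'm doing wrong.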

score 1 · Accepted Answer

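Your for loop iterates over all product elements and calls clear() on each of them, which removes all of their text and subelements. Because you print on the end event of the outer product element, and the inner product element's end event has already fired (and been cleared) by then, you are deleting the inner product's text before you print it.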
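A minimal sketch of the fix this implies, assuming the same kind of hypothetical sample (the data and the read_products helper name are illustrative, not from the question): skip inner <product> elements without clearing them, and only read, then clear, the top-level ones whose parent is <merchandiser>. Clearing the outer element also discards its inner <product>, so memory is still released per record.

```python
import io

from lxml import etree

# Hypothetical sample: two top-level products, each with an inner <product>.
SAMPLE = b"""<merchandiser>
  <header></header>
  <product>
    <name>outer-1</name>
    <URL>
      <info>info-1</info>
      <product>inner-1</product>
    </URL>
  </product>
  <product>
    <name>outer-2</name>
    <URL>
      <info>info-2</info>
      <product>inner-2</product>
    </URL>
  </product>
</merchandiser>"""


def read_products(source):
    """Yield (inner product texts, info texts) for each top-level <product>."""
    for _event, element in etree.iterparse(source, tag="product"):
        parent = element.getparent()
        if parent is None or parent.tag != "merchandiser":
            # Inner <product>: its end event arrives before the outer one's,
            # so leave it intact instead of clearing it here.
            continue
        inner = element.xpath("./URL/product/text()")
        info = element.xpath("./URL/info/text()")
        yield inner, info
        # The whole record has been read; now it is safe to free it.
        element.clear()
        # Also drop already-processed siblings so <merchandiser> does not keep growing.
        while element.getprevious() is not None:
            del parent[0]


for inner, info in read_products(io.BytesIO(SAMPLE)):
    print(inner, info)
```

The design choice is simply to move clear() inside the top-level branch: the filter tag='product' still matches both levels, but only the outer level is ever cleared.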

score 0 · Accepted Answer

This XPath expression, ./URL/product/text(), looks for text in a product tag that is inside a URL tag, but it does not look in a product tag that is inside a product tag inside URL.

Consider also using ./URL/product/product/text() or //product/text().
