python - Parsing a specific element from XML using Python/ET

Question

I have XML along the following lines:

<?xml version="xxx"?>
<doc:document xmlns:doc="some value 1...">
    <rdf:RDF xmlns:rdf="some value 2...">
        <rdf:Description rdf:about="some value...">
            <dct:format xmlns:dct="http://someurl/">some value 3</dct:format>
            <dct:title xmlns:dct="http://someurl/">some text of interest to me</dct:title>
        </rdf:Description>
    </rdf:RDF>
</doc:document>

How can I get "some text of interest to me" using Python / ETree?

Thanks for any help!

score 1 · Accepted Answer

You need to look up the title element with its namespace specified:

tree.find('.//dct:title', namespaces={'dct': 'http://purl.org/dc/terms/'})

You have to pass the namespaces mapping on every search, so you can also define the mapping once up front and reuse it:

nsmap = {
    'dct': 'http://purl.org/dc/terms/',
    'doc': 'http://www.witbd.org/xmlns/common/document/',
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
}

tree.find('.//dct:title', namespaces=nsmap)

For your sample document (with the namespaces restored), this gives:

>>> tree.find('.//dct:title', namespaces=nsmap)
<Element '{http://purl.org/dc/terms/}title' at 0x105ec4690>
>>> tree.find('.//dct:title', namespaces=nsmap).text
'some text of interest to me'
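
For reference, here is a minimal self-contained sketch of the lookup above. It assumes the elided namespace URIs in the question are the ones used in nsmap; the names SAMPLE_XML, NSMAP and get_title are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document with the namespace URIs filled in (assumed to match nsmap above).
SAMPLE_XML = """\
<doc:document xmlns:doc="http://www.witbd.org/xmlns/common/document/">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about="some value...">
            <dct:format xmlns:dct="http://purl.org/dc/terms/">some value 3</dct:format>
            <dct:title xmlns:dct="http://purl.org/dc/terms/">some text of interest to me</dct:title>
        </rdf:Description>
    </rdf:RDF>
</doc:document>
"""

# Prefix-to-URI mapping, defined once and reused for every search.
NSMAP = {
    'dct': 'http://purl.org/dc/terms/',
    'doc': 'http://www.witbd.org/xmlns/common/document/',
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
}


def get_title(xml_text):
    """Return the text of the first dct:title element, or None if there is none."""
    root = ET.fromstring(xml_text)
    title = root.find('.//dct:title', namespaces=NSMAP)
    return None if title is None else title.text


title_text = get_title(SAMPLE_XML)
```

Checking for None before reading .text avoids an AttributeError when the document has no matching element.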

You can also put the namespace URI directly into the XPath expression:

tree.find('.//{http://purl.org/dc/terms/}title')

This is what using a prefix together with the namespaces map does internally anyway.
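
As a quick check that the prefixed form and the {uri}tag form resolve to the same element, here is a small sketch. The one-element document and the variable names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

DCT = 'http://purl.org/dc/terms/'

# Tiny illustrative document containing a single namespaced element.
root = ET.fromstring(
    '<root><dct:title xmlns:dct="%s">some text of interest to me</dct:title></root>' % DCT
)

# Lookup using a prefix plus a namespaces mapping.
by_prefix = root.find('.//dct:title', namespaces={'dct': DCT})

# Lookup using the fully qualified {uri}tag form, no mapping needed.
by_uri = root.find('.//{%s}title' % DCT)

same_element = by_prefix is by_uri
qualified_tag = by_prefix.tag
```

The element's tag attribute always holds the fully qualified {uri}tag form, whichever way the element was found.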
