python - Problem parsing XML in Python using xml.etree.ElementTree

Question

I have the following XML, generated from some HTTP responses:

<?xml version="1.0" encoding="UTF-8"?>
<Response rid="1000" status="succeeded" moreData="false">
  <Results completed="true" total="25" matched="5" processed="25">
      <Resource type="h" DisplayName="Host" name="tango">
          <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
             <PerfData attrId="cpuUsage" attrName="Usage">
                <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="36.00"/>
                <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="86.00"/>
                <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="29.00"/>
             </PerfData>
          <Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango">
              <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
              <PerfData attrId="cpuUsage" attrName="Usage">
                 <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="6.00"/>
              </PerfData>
          </Resource>
      </Resource>
  </Results>
</Response>

If you look closely, there is another tag with the same name nested inside the outer one.

So the high-level XML structure is:

<Resource>
    <Resource>
    </Resource>
</Resource>

Python's ElementTree seems to parse only the outer element. Here is my code:

pattern = re.compile(r'(<Response.*?</Response>)',
                     re.VERBOSE | re.MULTILINE)

for match in pattern.finditer(data):
    contents = match.group(1)
    responses = xml.fromstring(contents)

    for results in responses:
        result = results.tag

        for resources in results:
            resource = resources.tag
            temp = {}
            temp = resources.attrib
            print temp

This produces the following output (temp):

{'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}

How can I get the attributes of the inner element?

score 2 · Accepted Answer

Do not parse XML with regular expressions! It will not work. Use an XML parsing library such as lxml instead.
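As a minimal sketch of what "use a parser" can look like with only the standard library: xml.etree.ElementTree can parse the whole response in one call, and Element.iter() visits matching elements at any depth, so the nested Resource is reached without a regex. The SAMPLE string is a trimmed stand-in for the response in the question, and the helper name resource_attributes is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the response in the question (assumed well-formed).
SAMPLE = """\
<Response rid="1000" status="succeeded" moreData="false">
  <Results completed="true" total="25" matched="5" processed="25">
    <Resource type="h" DisplayName="Host" name="tango">
      <Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango"/>
    </Resource>
  </Results>
</Response>
"""


def resource_attributes(xml_text):
    """Return the attribute dict of every <Resource>, at any nesting depth."""
    root = ET.fromstring(xml_text)
    # Element.iter() walks the whole subtree in document order,
    # so the outer resource comes before the one nested inside it.
    return [dict(elem.attrib) for elem in root.iter("Resource")]


if __name__ == "__main__":
    for attrs in resource_attributes(SAMPLE):
        print(attrs)
```

This flattens the hierarchy, which is enough if you only need the attributes; it does not record which resource contains which.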

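Edit: the lxml code example below fetches only the top-level resources, then loops over them and tries to fetch their "sub-resources". This was added after a request from the OP in the comments.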

from lxml import etree

content = '''
YOUR XML HERE
'''

root = etree.fromstring(content)

# search for all "top level" resources
resources = root.xpath("//Resource[not(ancestor::Resource)]")
for resource in resources:
    # copy resource attributes in a dict
    mashup = dict(resource.attrib)
    # find child resource elements
    subresources = resource.xpath("./Resource")
    # if we find only one resource, add it to the mashup
    if len(subresources) == 1:
        mashup['resource'] = dict(subresources[0].attrib)
    # else... no idea what the OP wants...

    print(mashup)

This prints (key order may vary with the Python version):

{'resource': {'DisplayName': 'VM', 'type': 'vm', 'name': 'charlie', 'baseHost': 'tango'}, 'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}
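If a resource can contain more than one sub-resource, or sub-resources can nest further, a recursive variant may be easier to live with. The following is only a sketch using the standard library's xml.etree.ElementTree rather than lxml; the function names and the 'resources' key are hypothetical, and it assumes the top-level Resource elements sit directly under Results, as in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the response in the question (assumed well-formed).
SAMPLE = """\
<Response rid="1000" status="succeeded" moreData="false">
  <Results completed="true" total="25" matched="5" processed="25">
    <Resource type="h" DisplayName="Host" name="tango">
      <Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango"/>
    </Resource>
  </Results>
</Response>
"""


def resource_tree(elem):
    """Convert a <Resource> element into a dict, nesting its child resources."""
    node = dict(elem.attrib)
    # findall("Resource") matches direct children only, so each level of
    # nesting is handled by one recursive call.
    children = [resource_tree(child) for child in elem.findall("Resource")]
    if children:
        node["resources"] = children  # hypothetical key name
    return node


def top_level_resources(xml_text):
    """Return one nested dict per <Resource> directly under <Results>."""
    root = ET.fromstring(xml_text)
    return [resource_tree(r) for r in root.findall("./Results/Resource")]


if __name__ == "__main__":
    print(top_level_resources(SAMPLE))
```

ElementTree's limited XPath has no ancestor axis, which is why this version selects top-level resources by their path under Results instead of using not(ancestor::Resource).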
