python - Using XPath in ElementTree

Question

My XML file looks like this:

<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice>
          <Amount>2260</Amount>
        </ListPrice>
      </ItemAttributes>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemSearchResponse>

All I want to do is extract the ListPrice.

This is the code I am using:

>> from elementtree import ElementTree as ET
>> fp = open("output.xml","r")
>> element = ET.parse(fp).getroot()
>> e = element.findall('ItemSearchResponse/Items/Item/ItemAttributes/ListPrice/Amount')
>> for i in e:
>>    print i.text
>>
>> e
>>

It outputs nothing at all. I also tried:

>> e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')

No difference.

What am I doing wrong?

score 69 · Accepted Answer

You have two problems.

1) element holds only the root element, not the whole document; it is of type Element, not ElementTree.

2) If you keep the namespace in your XML, your search string has to use that namespace.
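Both points can be checked directly. The following is a small illustrative sketch, assuming Python 3's standard xml.etree.ElementTree and a trimmed copy of the question's XML inlined as a hypothetical XML constant; it prints the two object types and the namespaced root tag, and shows that the un-prefixed path matches nothing.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed, inlined copy of the question's XML (illustrative only)
XML = b"""<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items><Item><ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes></Item></Items>
</ItemSearchResponse>"""

tree = ET.parse(io.BytesIO(XML))  # ElementTree wrapping the whole document
root = tree.getroot()             # just the root Element

print(type(tree).__name__)  # ElementTree
print(type(root).__name__)  # Element
print(root.tag)             # the tag carries the namespace URI in {braces}

# Without the namespace, the path matches nothing
print(root.findall("Items/Item/ItemAttributes/ListPrice/Amount"))  # []
```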

To fix problem #1:

You need to change:

element = ET.parse(fp).getroot()

to:

element = ET.parse(fp)

To fix problem #2:

If you remove the xmlns attribute from the XML document, it looks like this:

<?xml version="1.0"?>
<ItemSearchResponse>
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice>
          <Amount>2260</Amount>
        </ListPrice>
      </ItemAttributes>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemSearchResponse>

With this document, you can use the following search string:

e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')

Full code:

from xml.etree import ElementTree as ET

# ET.parse accepts a filename and returns an ElementTree for the whole document
element = ET.parse("output.xml")
e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
for i in e:
    print(i.text)

Alternative fix for problem #2:

Otherwise, you have to include the xmlns in the search string for every element.

Full code:

from xml.etree import ElementTree as ET

element = ET.parse("output.xml")

namespace = "{http://webservices.amazon.com/AWSECommerceService/2008-08-19}"
# Prefix every step of the path with the namespace in {uri} form
e = element.findall('{0}Items/{0}Item/{0}ItemAttributes/{0}ListPrice/{0}Amount'.format(namespace))
for i in e:
    print(i.text)

Both print:

2260

score 8 · Answer

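from xml.etree import ElementTree as ET
tree = ET.parse("output.xml")
# The root tag looks like "{uri}ItemSearchResponse"; extract the uri between the braces
namespace = tree.getroot().tag[1:].split("}")[0]
# Search the whole tree for the first Amount element in that namespace
amount = tree.find(".//{%s}Amount" % namespace).text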
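The .//{%s}Amount search returns the first Amount in document order, which happens to be the ListPrice one here; the Offers section contains an Amount as well. A variation anchored on ListPrice makes that choice explicit. This is a sketch with the question's XML inlined as a hypothetical XML constant:

```python
import xml.etree.ElementTree as ET

# Inlined copy of the question's XML (illustrative only)
XML = """<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
    <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
  </Item></Items>
</ItemSearchResponse>"""

root = ET.fromstring(XML)
# root.tag looks like "{uri}ItemSearchResponse"; keep the text between the braces
namespace = root.tag[1:].split("}")[0]

# Anchor on the parent tag so the other Amount element can never be picked up by accident
list_price = root.find(".//{%s}ListPrice/{%s}Amount" % (namespace, namespace)).text
offer_price = root.find(".//{%s}Price/{%s}Amount" % (namespace, namespace)).text
print(list_price, offer_price)  # 2260 1853
```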

Also consider using lxml; it is much faster.

from lxml import etree as ET

score 7 · Answer

ElementTree takes namespaces into account, so every element in your XML has a name like {http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items.

So include the namespace in your search:

search = '{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Item/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ItemAttributes/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ListPrice/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Amount'
element.findall( search )

This gives you the element corresponding to 2260.
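Typing the URI five times is error-prone; the same search string can be built from the plain step names. A small sketch, assuming Python 3's xml.etree.ElementTree, with NS_URI and the inlined XML constant introduced here for illustration:

```python
import xml.etree.ElementTree as ET

NS_URI = "http://webservices.amazon.com/AWSECommerceService/2008-08-19"

# Inlined copy of the question's XML (illustrative only)
XML = """<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
  </Item></Items>
</ItemSearchResponse>"""

root = ET.fromstring(XML)
# Every tag is stored in "{uri}local-name" form
print(root[0].tag)  # {http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items

# Qualify each plain step as "{uri}name" and join them into the long search string
steps = ["Items", "Item", "ItemAttributes", "ListPrice", "Amount"]
search = "/".join("{%s}%s" % (NS_URI, step) for step in steps)
amounts = [e.text for e in root.findall(search)]
print(amounts)  # ['2260']
```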

score 6 · Answer

I ended up stripping the xmlns from the raw XML like this:

import re

def strip_ns(xml_string):
    # Remove default-namespace declarations such as xmlns="..." from the raw text
    return re.sub(r'xmlns="[^"]+"', '', xml_string)

Obviously you have to be very careful with this, but it worked for me.
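A variation on the same idea (not the poster's code) leaves the raw text alone and strips the {uri} part from each tag after parsing, so a regex cannot accidentally touch text content that happens to look like xmlns="...". A minimal sketch, assuming Python 3's xml.etree.ElementTree and an inlined XML constant for illustration:

```python
import xml.etree.ElementTree as ET

# Inlined copy of the question's XML (illustrative only)
XML = """<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
  </Item></Items>
</ItemSearchResponse>"""

root = ET.fromstring(XML)
for el in root.iter():
    # Only rewrite namespace-qualified string tags ("{uri}name" -> "name")
    if isinstance(el.tag, str) and el.tag.startswith("{"):
        el.tag = el.tag.split("}", 1)[1]

amounts = [a.text for a in root.findall("Items/Item/ItemAttributes/ListPrice/Amount")]
print(amounts)  # ['2260']
```

Like the regex, this throws away namespace information, so it is only safe when the document uses a single namespace.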
