python - XML parsing in Python with xml.dom.minidom - extracting items from a list

Question

I have a long XML document; it is actually an eBay listing retrieved through the eBay API. I am trying to extract the following structure from that XML DOM.

I've posted only the problematic segment. If you need to see the whole file, let me know and I can upload it somewhere else or attach it as an image.

<ItemSpecifics>
<NameValueList>
<Name>Room</Name>
<Value>Living Room</Value>
</NameValueList>
<NameValueList>
<Name>Type</Name>
<Value>Sofa Set</Value>
</NameValueList>
<NameValueList>...</NameValueList>
<NameValueList>
<Name>Upholstery Fabric</Name>
<Value>Microfiber</Value>
</NameValueList>
<NameValueList>
<Name>Color</Name>
<Value>Beiges</Value>
</NameValueList>
<NameValueList>
<Name>Style</Name>
<Value>Contemporary</Value>
</NameValueList>
<NameValueList>
<Name>MPN</Name>
<Value>F7615, F7616, F7617, F7618, F7619, F7620</Value>
</NameValueList>
</ItemSpecifics>

Another eBay item has this DOM structure:

<ItemSpecifics>
<NameValueList>
<Name>Brand</Name>
<Value>Nikon</Value>
</NameValueList>
<NameValueList>
<Name>Model</Name>
<Value>D3100</Value>
</NameValueList>
<NameValueList>
<Name>MPN</Name>
<Value>9798</Value>
</NameValueList>
<NameValueList>
<Name>Type</Name>
<Value>Digital SLR</Value>
</NameValueList>
<NameValueList>
<Name>Megapixels</Name>
<Value>14.2 MP</Value>
</NameValueList>
<NameValueList>
<Name>Optical Zoom</Name>
<Value>3.1x</Value>
</NameValueList>
<NameValueList>
<Name>Screen Size</Name>
<Value>3"</Value>
</NameValueList>
<NameValueList>
<Name>Color</Name>
<Value>Black</Value>
</NameValueList>
</ItemSpecifics>

But when I try to extract the elements above, I get the following error:

   attID=att.attributes.getNamedItem('Name').nodeValue
AttributeError: 'NoneType' object has no attribute 'nodeValue'

This is what I get right after parsing the response:

[<DOM Element: NameValueList at 0x103398878>, <DOM Element: NameValueList at 0x103398ab8>, <DOM Element: NameValueList at 0x103398cf8>, <DOM Element: NameValueList at 0x103398f38>, <DOM Element: NameValueList at 0x1033b31b8>, <DOM Element: NameValueList at 0x1033b33f8>, <DOM Element: NameValueList at 0x1033b3638>, <DOM Element: NameValueList at 0x1033b3878>]

And this is what I get inside the for loop, just before the error occurs:

<DOM Element: NameValueList at 0x103398878>

Here is my code:

  result = {}
  attributeSet=response.getElementsByTagName('NameValueList')
  print attributeSet
  attributes={}
  for att in attributeSet:
    print att
    attID=att.attributes.getNamedItem('Name').nodeValue
    attValue=getSingleValue(att,'Value')
    attributes[attID]=attValue
  result['attributes']=attributes
  return result

This is my XML request method:

def sendRequest(apicall,xmlparameters):
  connection = httplib.HTTPSConnection(serverUrl)
  connection.request("POST", '/ws/api.dll', xmlparameters, getHeaders(apicall))
  response = connection.getresponse()
  data = None  # stays None if the request fails, instead of raising UnboundLocalError
  if response.status != 200:
    print "Error sending request:" + response.reason
  else:
    data = response.read()
  connection.close()  # close the connection on both the success and the error path
  return data

score 3 · Accepted Answer

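attributes.getNamedItem() gives you the element's attributes, not its children. The <NameValueList> element has no attributes at all; it only has the child elements <Name> and <Value>. You need to either loop over the elements contained in <NameValueList>, or use .getElementsByTagName('Name') and .getElementsByTagName('Value') to get the individual subnodes.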
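For completeness, here is a minimal sketch of the minidom route (written for Python 3), run against a trimmed copy of the sofa listing from the question. `text_of` is a hypothetical helper standing in for the question's `getSingleValue`, whose definition isn't shown; it assumes every `<NameValueList>` contains exactly one `<Name>` and one `<Value>`.

```python
from xml.dom import minidom

# Trimmed copy of the first listing from the question.
SAMPLE = """<ItemSpecifics>
  <NameValueList><Name>Room</Name><Value>Living Room</Value></NameValueList>
  <NameValueList><Name>Type</Name><Value>Sofa Set</Value></NameValueList>
  <NameValueList><Name>Color</Name><Value>Beiges</Value></NameValueList>
</ItemSpecifics>"""


def text_of(parent, tag):
    """Hypothetical helper: text content of the first <tag> element under parent."""
    element = parent.getElementsByTagName(tag)[0]
    # An element's text may be split across several adjacent text nodes, so join them.
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)


def extract_specifics(xml_text):
    doc = minidom.parseString(xml_text)
    attributes = {}
    for nvl in doc.getElementsByTagName("NameValueList"):
        # <Name> is a child element, not an XML attribute, so
        # nvl.attributes.getNamedItem("Name") would return None here.
        attributes[text_of(nvl, "Name")] = text_of(nvl, "Value")
    return attributes


print(extract_specifics(SAMPLE))
```

This prints `{'Room': 'Living Room', 'Type': 'Sofa Set', 'Color': 'Beiges'}`.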

However, use the ElementTree API instead. It is far more Pythonic and easier to use than the XML DOM API.

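from xml.etree import ElementTree as ET

# data is the raw XML string returned by sendRequest()
root = ET.fromstring(data)
results = {}
# './/' searches the whole tree; a bare 'NameValueList' only matches
# direct children of the root element. If the response declares a default
# namespace, each tag must be prefixed with '{namespace-uri}'.
for nvl in root.findall('.//NameValueList'):
    name = nvl.findtext('Name')
    value = nvl.findtext('Value')
    results[name] = value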
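If the response declares a default namespace on its root element, as eBay Trading API responses generally do, an unqualified tag name such as `NameValueList` matches nothing. The sketch below (Python 3) passes a prefix-to-URI map to `findall`/`findtext`. The URI `urn:ebay:apis:eBLBaseComponents` and the `GetItemResponse`/`Item` envelope are assumptions for illustration; copy the actual `xmlns` value from your own response.

```python
from xml.etree import ElementTree as ET

# Assumed default namespace; check the xmlns attribute on your response's root element.
NS = {"e": "urn:ebay:apis:eBLBaseComponents"}

# Hypothetical envelope around part of the camera listing from the question.
SAMPLE = """<GetItemResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <Item>
    <ItemSpecifics>
      <NameValueList><Name>Brand</Name><Value>Nikon</Value></NameValueList>
      <NameValueList><Name>Model</Name><Value>D3100</Value></NameValueList>
      <NameValueList><Name>Screen Size</Name><Value>3"</Value></NameValueList>
    </ItemSpecifics>
  </Item>
</GetItemResponse>"""


def item_specifics(xml_text, ns=NS):
    root = ET.fromstring(xml_text)
    results = {}
    # './/' searches at any depth; the 'e:' prefix is resolved through the ns map.
    for nvl in root.findall(".//e:NameValueList", ns):
        name = nvl.findtext("e:Name", namespaces=ns)
        value = nvl.findtext("e:Value", namespaces=ns)
        results[name] = value
    return results


print(item_specifics(SAMPLE))
```

This prints `{'Brand': 'Nikon', 'Model': 'D3100', 'Screen Size': '3"'}`, whereas `root.findall('.//NameValueList')` without the namespace map returns an empty list for this input.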
