python - How do I access the link data of an XML node in Python?

Question

I get the following kind of data back in XML format (many rooms are returned; this is one example of the returned data):

<?xml version="1.0" encoding="UTF-8"?>
<rooms>
    <total-results>1</total-results>
    <items-per-page>1</items-per-page>
    <start-index>0</start-index>
    <room>
        <id>xxxxxxxx</id>
        <etag>5</etag>
        <link rel="http://schemas.com.mysite.building" title="building" href="https://mysite.me.myschool.edu:8443/ess/scheduleapi/v1/buildings/yyyyyyyyy"/>
        <name>1.306</name>
        <status>active</status>
        <link rel="self" title="self" href="https://mysite.me.myschool.edu:8443/ess/scheduleapi/v1/rooms/aaaaaaaaa"/>
    </room>
</rooms>

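It seems that when nodeType == node.TEXT_NODE I can get at the data (so I can see that I have room 1.306). I can also get at the nodeName "link", but what I really need to know is whether the room is in one of my acceptable buildings, so I need to reach the rest of that line and look at the yyyyyyyyy. Can someone please advise?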

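OK, @vezult, here is what I finally came up with using ElementTree, as you suggested (working code!). It is probably not the most Pythonic (or ElementTree-ic?) way to do this, but it seems to work. I am excited to now have access to the .tag, .attrib, and .text of every piece of my XML. I look forward to any advice on how to make it better.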

# We start out knowing our room name and our building id.  However, the same room can exist in many buildings.
# Examine the rooms we've received and get the id of the one with our name that is also in our building.

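# Query the API for a list of rooms and read the raw XML response.

import urllib2  # Python 2 standard library
from xml.etree import ElementTree

request = build_request(resourceUrl)  # build_request is defined elsewhere (not shown)
u = urllib2.urlopen(request.to_url())
mydata = u.read()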

root = ElementTree.fromstring(mydata)
print 'tree root', root.tag, root.attrib, root.text
for child in root:
    if child.tag == 'room':   
        for child2 in child:
            # the id tag comes before the name tag, so hold on to it
            if child2.tag == "id":
                hold_id = child2.text
            # the building link comes before the room name, so hold on to it
            if child2.tag == 'link':                            # if this is a link
                if "building" in child2.attrib['href']:         # and it's a building link
                    hold_link_data = child2.attrib['href']
            if child2.tag == 'name':
                if (out_bldg in hold_link_data and  # the building link we're looking at has our building in it  
                    (in_rm == child2.text)):        # and this room name is our room name
                    out_rm = hold_id
                    break  # get out of for-loop
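
A more compact variant of the loop above, offered as a sketch rather than a definitive rewrite. It looks up each room's children by name with find and findtext, so it does not depend on the id and building link appearing before the name. It is written in Python 3 syntax; find_room_id and SAMPLE are illustrative names that are not part of the original code, and the sample XML is trimmed from the data shown earlier.

```python
from xml.etree import ElementTree

# Trimmed copy of the sample response, used only for illustration.
SAMPLE = """<rooms>
    <room>
        <id>xxxxxxxx</id>
        <link rel="http://schemas.com.mysite.building" title="building" href="https://mysite.me.myschool.edu:8443/ess/scheduleapi/v1/buildings/yyyyyyyyy"/>
        <name>1.306</name>
        <link rel="self" title="self" href="https://mysite.me.myschool.edu:8443/ess/scheduleapi/v1/rooms/aaaaaaaaa"/>
    </room>
</rooms>"""


def find_room_id(xml_text, room_name, building_id):
    """Return the <id> text of the first room matching name and building, else None."""
    root = ElementTree.fromstring(xml_text)
    for room in root.findall('room'):
        # findtext returns None when the child element is missing
        if room.findtext('name') != room_name:
            continue
        # Pick the link whose title attribute is "building"
        link = room.find('link[@title="building"]')
        if link is not None and building_id in link.get('href', ''):
            return room.findtext('id')
    return None


out_rm = find_room_id(SAMPLE, '1.306', 'yyyyyyyyy')
print(out_rm)
```

Returning from the function also ends the search at the first match, whereas the break in the loop above only leaves the inner for-loop.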

score 3 · Accepted Answer

Your example does not show which library you are using, so I am assuming the standard Python ElementTree module. If that is the case, do the following:

from xml.etree import ElementTree

tree = ElementTree.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<rooms>
    <total-results>1</total-results>
    <items-per-page>1</items-per-page>
    <start-index>0</start-index>
    <room>
        <id>xxxxxxxx</id>
        <etag>5</etag>
        <link rel="http://schemas.com.mysite.building" title="building" href="https://mysite.me.myschool.edu:8443/ess/scheduleapi/v1/buildings/yyyyyyyyy" />
        <name>1.306</name>
        <status>active</status>
        <link rel="self" title="self" href="https://mysite.me.myschool.edu:8443/ess/scheduleapi/v1/rooms/aaaaaaaaa" />
    </room>
</rooms>
""")

# Select the building link element of every room in the example XML
for node in tree.findall('./room/link[@title="building"]'):
    # the 'attrib' attribute is a dictionary containing the node attributes
    print(node.attrib['href'])
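
Once the href is available, checking whether the room is in an acceptable building comes down to pulling the id off the end of the URL. The following is a minimal sketch in Python 3 syntax. It assumes the building id is always the last path segment of the href, as in the sample data. ALLOWED_BUILDINGS, ROOM_XML, and building_id_from_href are hypothetical names introduced for illustration.

```python
from urllib.parse import urlparse
from xml.etree import ElementTree

# Hypothetical set of acceptable building ids; replace with real values.
ALLOWED_BUILDINGS = {'yyyyyyyyy'}

# A single room element trimmed from the sample data.
ROOM_XML = """<room>
    <id>xxxxxxxx</id>
    <link rel="http://schemas.com.mysite.building" title="building" href="https://mysite.me.myschool.edu:8443/ess/scheduleapi/v1/buildings/yyyyyyyyy"/>
    <name>1.306</name>
</room>"""


def building_id_from_href(href):
    """Return the last path segment of the link URL (assumed to be the building id)."""
    path = urlparse(href).path
    return path.rstrip('/').rsplit('/', 1)[-1]


room = ElementTree.fromstring(ROOM_XML)
link = room.find('link[@title="building"]')
building_id = building_id_from_href(link.get('href'))
is_allowed = building_id in ALLOWED_BUILDINGS
print(building_id, is_allowed)
```

Comparing the extracted id for equality avoids the false positives that a substring test on the whole URL could produce when one id happens to be contained in another.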
