python - Python XML BeautifulSoup: get the text of child nodes

Question

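In a previous project I scraped data from XML tag attributes, but I can't figure out how to get the text of child XML nodes. The program reads IDs from a text file, plugs them into a URL, and parses the result. The XML looks like this: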

<Article>
    <Sometag Owner="Steve" Status="online">
        <ID Version="1">231119634</PMID>
        <DateCreated>
            <Year>2012</Year>
            <Month>10</Month>
            <Day>10</Day>
        </DateCreated>

I want to get the text from the Year, Month, and Day child tags of DateCreated.

So far I have the following, without any luck:

    link = "http://somelink.com/"+line.rstrip('\n')+"?id=xml&format=text"
    args = (curlLink + ' -L ' + link + ' -o c:\\temp.txt --proxy-ntlm -x http://myproxy:80 -k -U:') 
    sp = subprocess.Popen(args) #run curl
    sp.wait() #Wait for it to finish before proceeding
    xml_string = open(r'C:\temp.txt', 'r').read() #read in the temporary file
    os.remove(r'C:\temp.txt') # clean up
    soup = BeautifulSoup(xml_string)
    result = soup.find('DateCreated')
    if result is not None:
        date = result.children.get_text()
        g.write(date +"\n")

score 3 · Accepted Answer

There are several ways to get the information out of the data:

    year = int(date.Year.text)
    month = int(date.Month.text)
    day = int(date.Day.text)

Alternatively, date.text gives you the text content as a string. Which one to use depends on what you actually need.
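For reference, here is a minimal, self-contained sketch of both approaches. It assumes the bs4 package and a cut-down, well-formed copy of the DateCreated fragment from the question (the xml_string literal is made up for illustration). It also assumes date is the tag returned by find(). Note that result.children in the question is an iterator over the child nodes, not a tag, so it has no get_text() method. One caveat: Python's built-in html.parser lowercases tag names, so the lookups below use lowercase names. With the lxml-based "xml" parser the original case is preserved and date.Year works as written.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed fragment modelled on the XML in the question.
xml_string = """
<DateCreated>
    <Year>2012</Year>
    <Month>10</Month>
    <Day>10</Day>
</DateCreated>
"""

# html.parser needs no extra install but lowercases tag names,
# hence "datecreated" rather than "DateCreated".
soup = BeautifulSoup(xml_string, "html.parser")
date = soup.find("datecreated")

if date is not None:
    # Attribute access on a tag returns its first descendant tag with that name.
    year = int(date.year.text)
    month = int(date.month.text)
    day = int(date.day.text)

    # get_text() joins the text of all descendants; strip=True drops the
    # whitespace-only strings that sit between the child tags.
    joined = date.get_text("-", strip=True)
    print(year, month, day, joined)
```

Plain date.text would also include the newlines and indentation between the child tags, which is why the sketch uses get_text() with strip=True when a single string is wanted.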
