python - Extracting an attribute value with XPath in Python

Question

I have the following HTML:

<table>
<tbody>
<tr>
<td align="left" valign="top" style="padding: 0 10px 0 60px;">
<img src="/files/39.jpg" width="64" height="64">
</td>
<td align="left" valign="middle"><h1>30 Rock</h1></td>
</tr>
</tbody>
</table>

I need to extract the value of the src attribute of the <img> element using Python and lxml. Here is what I tried:

import lxml.html
import urllib

# make HTTP request to site
page = urllib.urlopen("http://my.url.com")
# read the downloaded page
doc = lxml.html.document_fromstring(page.read())

txt1 = doc.xpath('/html/body/table[2]/tbody/tr/td[1]/img')

When I print txt1, I only get an empty list, []. How can I fix this?

score 3 · Accepted Answer

Use the following XPath:

//img/@src

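It extracts the src attribute of every img element in the document. An expression that begins with // searches from the document root; write .//img/@src to search only the descendants of the context node.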
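As a minimal sketch of how this expression could be used with lxml, the snippet below parses the markup from the question as an inline string instead of downloading a page. The names SAMPLE_HTML and extract_img_srcs are hypothetical and used only for illustration. Why the original absolute path returned nothing cannot be determined from the question alone; one common cause is that a path copied from a browser includes elements, such as tbody or a second table, that are absent from the raw HTML the server sends.

```python
import lxml.html

# Sample markup copied from the question; a real page would be downloaded instead.
SAMPLE_HTML = """
<table>
<tbody>
<tr>
<td align="left" valign="top" style="padding: 0 10px 0 60px;">
<img src="/files/39.jpg" width="64" height="64">
</td>
<td align="left" valign="middle"><h1>30 Rock</h1></td>
</tr>
</tbody>
</table>
"""


def extract_img_srcs(html_text):
    """Return the src value of every <img> element found in html_text."""
    doc = lxml.html.document_fromstring(html_text)
    # Selecting an attribute node (@src) makes lxml return string-like values
    # rather than element objects; str() converts them to plain strings.
    return [str(src) for src in doc.xpath('//img/@src')]


srcs = extract_img_srcs(SAMPLE_HTML)
print(srcs)  # ['/files/39.jpg']
```

Because the expression does not depend on the exact position of the table in the page, it keeps working if the surrounding layout changes. If the page contains several images, the result is a list with one entry per img element, so a narrower expression may be needed to pick out a single one.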
