python - Unable to parse HTML using the lxml XPath parser

Question

I'm trying to parse the reviews from this page: http://www.amazon.co.uk/product-reviews/B00143ZBHY

I'm using the following approach.

Code

# html: a variable that contains the exact HTML of the page above
from lxml import etree
tree = etree.HTML(html)
r = tree.xpath(".//*[@id='productReviews']/tbody/tr/td[1]/div[9]/text()[4]")
print len(r)
print r[0].tag

Output

0
Traceback (most recent call last):
  File "c.py", line 37, in <module>
    print r[0].tag
IndexError: list index out of range

P.S.: When I use the same XPath in the XPath Checker add-on for Firefox, it works without any trouble. Here, however, I get no results. Please help!

score 7 · Accepted Answer

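Try removing /tbody from the XPath: there is no <tbody> in #productReviews. Browsers such as Firefox add a <tbody> to tables in the DOM they build even when the page source omits it, which is why the XPath Checker add-on matches the path while lxml, which parses the raw source, does not.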

import urllib2
html = urllib2.urlopen("http://www.amazon.co.uk/product-reviews/B00143ZBHY").read()
from lxml import etree
tree = etree.HTML(html)
r = tree.xpath(".//*[@id='productReviews']/tr/td[1]/div[9]/text()[4]")
print r[0]

Output:

bought this as replacement for the original cover which came with my greenhouse and which ripped in the wind.  so far this seems a good replacement although for some reason it seems slightly too small for my greenhouse so that i cant zip both sides of the front at the same time.  seems sturdier and thicker than the cover i had before so hoping it lasts a bit longer!
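
The effect of the /tbody step can be reproduced without fetching the page. The following is a minimal sketch in Python 3, assuming a small made-up table (SAMPLE_HTML and its review text are hypothetical stand-ins, not the real Amazon markup) whose source, like the page in the question, contains no <tbody>:

```python
from lxml import etree

# Hypothetical stand-in for the real page: the table source has no <tbody>.
SAMPLE_HTML = """
<html><body>
<table id="productReviews">
  <tr><td><div>first review</div><div>second review</div></td></tr>
</table>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# Path as a browser tool would report it, including the <tbody> step
# that exists only in the browser's DOM.
with_tbody = tree.xpath(
    ".//*[@id='productReviews']/tbody/tr/td[1]/div[2]/text()")

# Same path without /tbody, matching the markup that lxml parsed.
without_tbody = tree.xpath(
    ".//*[@id='productReviews']/tr/td[1]/div[2]/text()")

# Variant using the descendant axis (//tr), which matches whether or not
# a <tbody> sits between the table and its rows.
either = tree.xpath(
    ".//*[@id='productReviews']//tr/td[1]/div[2]/text()")

print(with_tbody)
print(without_tbody)
print(either)
```

With this sample, the first query comes back empty while the other two return the text of the second div. The //tr form may be the safer choice when the same XPath has to work both in a browser tool and in lxml. Note also that text() results are strings rather than elements, so they are printed directly instead of through .tag, as the answer's code does.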
