python - How to use the Scrapy Request class

Question

Can someone help me make a request using the Scrapy Request class?

I tried this, but it doesn't work:

from scrapy.selector import HtmlXPathSelector
from scrapy.http.request import Request
url = 'http://www.fetise.com'
a = Request(url)
hxs = HtmlXPathSelector(a)

The error is:

Traceback (most recent call last):
  File "sa.py", line 83, in <module>
    hxs = HtmlXPathSelector(a)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/lxmlsel.py", line 31, in __init__
    _root = LxmlDocument(response, self._parser)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/lxmldocument.py", line 27, in __new__
    cache[parser] = _factory(response, parser)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/lxmldocument.py", line 13, in _factory
    body = response.body_as_unicode().strip().encode('utf8') or '<html/>'
AttributeError: 'Request' object has no attribute 'body_as_unicode'

I know about callbacks. What I actually want is to first scrape URLs from the site and then use them as start URLs.

score 1 · Accepted Answer

Try this:

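import urllib
from scrapy.selector import HtmlXPathSelector
from pprint import pprint

url = 'http://www.fetise.com'
# Download the page yourself (Python 2 urllib), then pass the HTML text to the selector
data = urllib.urlopen(url).read()
hxs = HtmlXPathSelector(text=data)

# Collect the href of every link in the nested category menu
lista = hxs.select('//ul[@class="categoryMenu"]/li/ul/li/a/@href').extract()

# Prefix the site root onto links that don't already contain it, then add the sale page
acb = ["http://www.fetise.com/" + i if "http://www.fetise.com/" not in i else i
       for i in lista] + [u"http://www.fetise.com/sale"]

pprint(acb)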

And this is the output:

[u'http://www.fetise.com/apparel/shirts',
 u'http://www.fetise.com/apparel/tees',
 u'http://www.fetise.com/apparel/tops-and-tees',
 u'http://www.fetise.com/accessories/belts',
 u'http://www.fetise.com/accessories/cufflinks',
 u'http://www.fetise.com/accessories/jewellery',
 u'http://www.fetise.com/accessories/lighters',
 u'http://www.fetise.com/accessories/others',
 u'http://www.fetise.com/accessories/sunglasses',
 u'http://www.fetise.com/accessories/ties-cufflinks',
 u'http://www.fetise.com/accessories/wallets',
 u'http://www.fetise.com/accessories/watches',
 u'http://www.fetise.com/footwear/boots',
 u'http://www.fetise.com/footwear/casual',
 u'http://www.fetise.com/footwear/flats',
 u'http://www.fetise.com/footwear/heels',
 u'http://www.fetise.com/footwear/loafers',
 u'http://www.fetise.com/footwear/sandals',
 u'http://www.fetise.com/footwear/shoes',
 u'http://www.fetise.com/footwear/slippers',
 u'http://www.fetise.com/footwear/sports',
 u'http://www.fetise.com/innerwear/boxers',
 u'http://www.fetise.com/innerwear/briefs',
 u'http://www.fetise.com/personal-care/deos',
 u'http://www.fetise.com/personal-care/haircare',
 u'http://www.fetise.com/personal-care/perfumes',
 u'http://www.fetise.com/personal-care/personal-care',
 u'http://www.fetise.com/personal-care/shavers',
 u'http://www.fetise.com/apparel/tees/gifts-for-her',
 u'http://www.fetise.com/footwear/sandals/gifts-for-her',
 u'http://www.fetise.com/footwear/shoes/gifts-for-her',
 u'http://www.fetise.com/footwear/heels/gifts-for-her',
 u'http://www.fetise.com/footwear/flats/gifts-for-her',
 u'http://www.fetise.com/footwear/ballerinas/gifts-for-her',
 u'http://www.fetise.com/footwear/loafers/gifts-for-her',
 u'http://www.fetise.com/sale']
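The answer above is Python 2 code that relies on Scrapy's HtmlXPathSelector. As a rough, non-authoritative sketch of the same idea for readers on Python 3 without Scrapy, the extraction can be done with only the standard library: `html.parser` stands in for the XPath selector (the match for `//ul[@class="categoryMenu"]/li/ul/li/a/@href` is hand-coded and only handles that one path), and `urllib.parse.urljoin` replaces the string concatenation. One practical difference: `"http://www.fetise.com/" + "/apparel/tees"` would produce a double slash, while `urljoin` resolves both `apparel/tees` and `/apparel/tees` correctly. The names `CategoryLinkParser`, `extract_category_urls`, and the sample markup are hypothetical; the real page's HTML is assumed, not verified.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never get an end tag, so they must not stay on the stack
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class CategoryLinkParser(HTMLParser):
    """Hypothetical helper: collects hrefs matching
    //ul[@class="categoryMenu"]/li/ul/li/a/@href.

    An <a> matches only when the open elements directly above it are
    ul.categoryMenu > li > ul > li (direct children, as in the XPath).
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, class attribute) for each currently open element
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        self.stack.append((tag, attrs.get("class")))
        if tag == "a" and attrs.get("href") and self._in_category_submenu():
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (tolerates unclosed tags)
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def _in_category_submenu(self):
        tags = [t for t, _ in self.stack[-5:]]
        return tags == ["ul", "li", "ul", "li", "a"] and self.stack[-5][1] == "categoryMenu"


def extract_category_urls(html, base="http://www.fetise.com/"):
    parser = CategoryLinkParser()
    parser.feed(html)
    parser.close()
    # urljoin resolves relative hrefs against base and leaves absolute URLs unchanged
    return [urljoin(base, href) for href in parser.hrefs]


# Invented sample markup standing in for the real page
SAMPLE_HTML = """
<ul class="categoryMenu">
  <li>Apparel
    <ul>
      <li><a href="apparel/shirts">Shirts</a></li>
      <li><a href="/apparel/tees">Tees</a></li>
      <li><a href="http://www.fetise.com/footwear/boots">Boots</a></li>
    </ul>
  </li>
</ul>
<a href="/not-in-menu">ignored: outside the category menu</a>
"""

urls = extract_category_urls(SAMPLE_HTML) + ["http://www.fetise.com/sale"]
print(urls)
```

The sale page is appended by hand, mirroring the original answer. Inside Scrapy itself, the XPath selector remains the simpler tool; the point of this sketch is only the link extraction plus `urljoin` step.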

score 0 · Accepted Answer

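A Request object only describes what to fetch; it has no body until it has been downloaded, which is why HtmlXPathSelector fails on it. The documentation indicates that you should instead pass a callback, which is called when the request completes and receives the Response object.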

From the documentation:

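Passing additional data to callback functions

The callback of a request is a function that will be called when the response of that request is downloaded. The callback function will be called with the downloaded Response object as its first argument.

Example: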

from scrapy.http import Request

# Both methods belong to a Spider subclass
def parse_page1(self, response):
    return Request("http://www.example.com/some_page.html",
                   callback=self.parse_page2)

def parse_page2(self, response):
    # this would log http://www.example.com/some_page.html
    self.log("Visited %s" % response.url)
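The asker's goal (scrape URLs from one page, then crawl each of them) fits this callback model: the first callback extracts the links and returns a new Request for each, with a second callback that handles those pages. To make the control flow concrete, here is a toy, stdlib-only simulation. `FakeRequest`, `FakeResponse`, `run`, and the in-memory `PAGES` "site" are all hypothetical stand-ins, not Scrapy APIs, and the loop is sequential, whereas Scrapy's real engine downloads concurrently and does not guarantee processing order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a Scrapy Request: what to fetch and who handles it."""
    url: str
    callback: Callable


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a downloaded Response."""
    url: str
    body: str


def run(start_requests, fetch):
    """Toy sequential engine: fetch each request's URL, then call its callback
    with the resulting response. Callbacks may yield more FakeRequests
    (queued for download) or any other value (collected as an item)."""
    queue = deque(start_requests)
    items = []
    while queue:
        request = queue.popleft()
        response = FakeResponse(request.url, fetch(request.url))
        for result in request.callback(response) or ():
            if isinstance(result, FakeRequest):
                queue.append(result)
            else:
                items.append(result)
    return items


# Hypothetical in-memory "site" so the sketch runs offline
PAGES = {
    "http://www.example.com/": "http://www.example.com/a http://www.example.com/b",
    "http://www.example.com/a": "page A",
    "http://www.example.com/b": "page B",
}


def parse_index(response):
    # Step 1: scrape URLs from the first page and request each one
    for url in response.body.split():
        yield FakeRequest(url, callback=parse_page)


def parse_page(response):
    # Step 2: runs once per URL found in step 1, with that page's response
    yield (response.url, response.body)


items = run([FakeRequest("http://www.example.com/", parse_index)], lambda url: PAGES[url])
print(items)
```

In an actual Scrapy spider the same shape would be a `parse` method that yields `Request(url, callback=self.parse_page)` for every extracted link, rather than trying to feed those URLs back in as `start_urls`.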
