
I am trying to follow the Scrapy tutorial, but I am using Amazon as my example instead of dmoz, which the online tutorial uses.

I noticed that running the following command results in a 301 redirect:

scrapy shell "https://www.amazon.com/gp/product/B00KLTPUU0"

2016-07-31 23:36:13 [scrapy] INFO: Scrapy 1.1.1 started (bot: scrapybot)
2016-07-31 23:36:13 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-07-31 23:36:13 [scrapy] INFO: Enabled extensions:['scrapy.extensions.telnet.TelnetConsole','scrapy.extensions.corestats.CoreStats']
2016-07-31 23:36:13 [scrapy] INFO: Enabled downloader middlewares:['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware','scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware','scrapy.downloadermiddlewares.useragent.UserAgentMiddleware','scrapy.downloadermiddlewares.retry.RetryMiddleware','scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware','scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware','scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware','scrapy.downloadermiddlewares.redirect.RedirectMiddleware','scrapy.downloadermiddlewares.cookies.CookiesMiddleware','scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware','scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-07-31 23:36:13 [scrapy] INFO: Enabled spider middlewares:['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware','scrapy.spidermiddlewares.offsite.OffsiteMiddleware','scrapy.spidermiddlewares.referer.RefererMiddleware','scrapy.spidermiddlewares.urllength.UrlLengthMiddleware','scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-07-31 23:36:13 [scrapy] INFO: Enabled item pipelines:[]
2016-07-31 23:36:13 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-07-31 23:36:13 [scrapy] INFO: Spider opened
2016-07-31 23:36:17 [scrapy] DEBUG: Crawled (301) <GET https://www.amazon.com/gp/product/B00KLTPUU0> (referer: None)
2016-07-31 23:36:17 [root] DEBUG: Using default logger
2016-07-31 23:36:17 [root] DEBUG: Using default logger
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7fb02b791f50>
[s]   item       {}
[s]   request    <GET https://www.amazon.com/gp/product/B00KLTPUU0>
[s]   response   <301 https://www.amazon.com/gp/product/B00KLTPUU0>
[s]   settings   <scrapy.settings.Settings object at 0x7fb02b7919d0>
[s]   spider     <DefaultSpider 'default' at 0x7fb02b117510>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

Can anyone explain why I get a 301 redirect for this Amazon page when using Scrapy?
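
For anyone debugging the same thing: a 301 response normally carries a `Location` header naming the redirect target, and in the Scrapy shell it should be readable with `response.headers.get('Location')`. The snippet below is only a minimal sketch in plain Python of how such a header resolves against the requested URL. The helper name `redirect_target` and the `/hypothetical/target` header value are my own assumptions for illustration; they are not taken from the actual Amazon response.

```python
from urllib.parse import urljoin

# HTTP status codes that signal a redirect with a Location header.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status, headers):
    """Return the absolute redirect URL, or None if there is none.

    `headers` is a plain dict here; this helper is hypothetical and
    only mimics what one would read off a real response object.
    """
    if status not in REDIRECT_STATUSES:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    # Location may be relative, so resolve it against the request URL.
    return urljoin(request_url, location)


url = "https://www.amazon.com/gp/product/B00KLTPUU0"
# Assumed header value, NOT what Amazon actually returned.
print(redirect_target(url, 301, {"Location": "/hypothetical/target"}))
```

Passing the resolved URL to `fetch()` in the shell would then request the redirect target directly.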
