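I'm having a problem with Scrapy. For some reason it never enters my parse method, and I can't figure out why. I've tried various options without success.
Here is what my code looks like. Specifically, it has two print statements, and the one inside the parse() method is never executed.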
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy import log
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from comments.items import CustomerReview
import re

class AppidSpider(BaseSpider):
    name = "appid"
    allowed_domains = ["itunes.apple.com"]
    start_urls = [
        "http://itunes.apple.com/us/genre/ios/id36?mt=8"
    ]
    rules = [Rule(SgmlLinkExtractor(), follow=True, callback='parse')]
    print "---> THIS IS TEST 1"

    def parse(self, response):
        print " ----> THIS IS TEST 2"
        # ... More code afterwards
And here is the output. As you can see, TEST 2 is never printed.
$ scrapy crawl appid
2012-07-05 13:41:02+0000 [scrapy] INFO: Scrapy 0.14.4 started (bot: comments)
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
---> THIS IS TEST 1
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled item pipelines: FilterWordsPipeline
2012-07-05 13:41:02+0000 [appid] INFO: Spider opened
2012-07-05 13:41:02+0000 [appid] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2012-07-05 13:41:02+0000 [appid] DEBUG: Crawled (200) <GET http://itunes.apple.com/us/genre/ios/id36?mt=8> (referer: None)
2012-07-05 13:41:02+0000 [appid] INFO: Closing spider (finished)
2012-07-05 13:41:02+0000 [appid] INFO: Dumping spider stats:
{'downloader/request_bytes': 222,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 9927,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2012, 7, 5, 13, 41, 2, 694678),
'scheduler/memory_enqueued': 1,
'start_time': datetime.datetime(2012, 7, 5, 13, 41, 2, 604025)}
2012-07-05 13:41:02+0000 [appid] INFO: Spider closed (finished)
2012-07-05 13:41:02+0000 [scrapy] INFO: Dumping global stats:
{'memusage/max': 95318016, 'memusage/startup': 95318016}
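
One observation that may help narrow this down: TEST 1 shows up in the log before the spider is even opened. That would be consistent with how Python treats statements placed directly in a class body, which run once when the class is defined (at import time), whereas a method body runs only when something calls it. The snippet below is a minimal sketch of just that difference, written for Python 3 and without Scrapy. `DemoSpider`, `events`, and the string passed to `parse()` are hypothetical names made up for the illustration. It does not explain why Scrapy skips parse() in the spider above.

```python
events = []  # records the order in which things actually run


class DemoSpider(object):  # hypothetical stand-in, not a real Scrapy class
    name = "demo"

    # A statement at class level runs once, while the class body is
    # being executed (that is, when the module is imported).
    events.append("class body executed")

    def parse(self, response):
        # A method body runs only when something calls parse().
        events.append("parse called with %r" % (response,))
        return response


print(events)  # at this point only the class-body entry is present

spider = DemoSpider()
spider.parse("fake response")  # stand-in for a real response object
print(events)  # now the parse() entry has been added as well
```

So seeing TEST 1 only shows that the module was imported. It says nothing about whether the framework ever invoked parse() for the downloaded page.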