このサイトの CrawlSpider を作成しようとしています: http://www.shams-stores.com/shop/index.php これは私のコードです:
import urlparse
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from project.items import Product
import re
class ShamsStoresSpider(CrawlSpider):
name = "shamsstores2"
domain_name = "shams-stores.com"
CONCURRENT_REQUESTS = 1
start_urls = ["http://www.shams-stores.com/shop/index.php"]
rules = (
#categories
Rule(SgmlLinkExtractor(restrict_xpaths=('//div[@id="categories_block_left"]/div/ul/li/a'), unique=False), callback='process', follow=True),
)
def process(self,response):
print response
これは、スクレイピー クロール shamsstores2 を使用したときに得られる応答です。
2013-11-05 22:56:36+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6081
2013-11-05 22:56:41+0200 [shamsstores2] DEBUG: Crawled (200) <GET http://www.shams-stores.com/shop/index.php> (referer: None)
2013-11-05 22:56:42+0200 [shamsstores2] DEBUG: Redirecting (301) to <GET http://www.shams-stores.com/shop/index.php?id_category=14&controller=category&id_lang=1> from <GET http://www.shams-stores.com/shop/index.php?controller=category&id_category=14&id_lang=1>
2013-11-05 22:56:42+0200 [shamsstores2] DEBUG: Filtered duplicate request: <GET http://www.shams-stores.com/shop/index.php?id_category=14&controller=category&id_lang=1> - no more duplicates will be shown (see DUPEFILTER_CLASS)
2013-11-05 22:56:43+0200 [shamsstores2] DEBUG: Redirecting (301) to <GET http://www.shams-stores.com/shop/index.php?id_category=13&controller=category&id_lang=1> from <GET http://www.shams-stores.com/shop/index.php?controller=category&id_category=13&id_lang=1>
2013-11-05 22:56:43+0200 [shamsstores2] DEBUG: Redirecting (301) to <GET http://www.shams-stores.com/shop/index.php?id_category=12&controller=category&id_lang=1> from <GET http://www.shams-stores.com/shop/index.php?controller=category&id_category=12&id_lang=1>
2013-11-05 22:56:43+0200 [shamsstores2] DEBUG: Redirecting (301) to <GET http://www.shams-stores.com/shop/index.php?id_category=10&controller=category&id_lang=1> from <GET http://www.shams-stores.com/shop/index.php?controller=category&id_category=10&id_lang=1>
2013-11-05 22:56:43+0200 [shamsstores2] DEBUG: Redirecting (301) to <GET http://www.shams-stores.com/shop/index.php?id_category=9&controller=category&id_lang=1> from <GET http://www.shams-stores.com/shop/index.php?controller=category&id_category=9&id_lang=1>
2013-11-05 22:56:44+0200 [shamsstores2] DEBUG: Redirecting (301) to <GET http://www.shams-stores.com/shop/index.php?id_category=8&controller=category&id_lang=1> from <GET http://www.shams-stores.com/shop/index.php?controller=category&id_category=8&id_lang=1>
2013-11-05 22:56:44+0200 [shamsstores2] DEBUG: Redirecting (301) to <GET http://www.shams-stores.com/shop/index.php?id_category=7&controller=category&id_lang=1> from <GET http://www.shams-stores.com/shop/index.php?controller=category&id_category=7&id_lang=1>
2013-11-05 22:56:44+0200 [shamsstores2] DEBUG: Redirecting (301) to <GET http://www.shams-stores.com/shop/index.php?id_category=6&controller=category&id_lang=1> from <GET http://www.shams-stores.com/shop/index.php?controller=category&id_category=6&id_lang=1>
2013-11-05 22:56:44+0200 [shamsstores2] INFO: Closing spider (finished)
ルールから抽出されたリンクにヒットし、それらのリンクが他のリンクにリダイレクトされ、関数: プロセスを実行せずに停止します。ベース スパイダーを使用してこれを修正できますが、修正してまだクロールスパイダーを使用することはできますか?