python - Python Scrapy allowed_domains attribute

Translated from: https://stackoverflow.com/questions/26204198 2014-10-05T15:36:45.797

39 views

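I'm learning Scrapy by writing a simple spider that collects post information from Stack Overflow questions.

I set allowed_domains = ["http://stackoverflow.com/questions/"] on the spider, and its parse() method only returns requests whose URLs have the form "http://stackoverflow.com/questions/%d/" % no

I thought that would work, but maybe I'm misunderstanding allowed_domains. Every request returned by parse() seems to be filtered out by allowed_domains; the spider only works when I remove allowed_domains. Could someone explain why? Sorry if this is a trivial question.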

class StackOverFlowPost(scrapy.Spider):
    startNo = 26200877
    endNo = 26200880
    curNo = 26200877
    name = "stackOverFlowPost"
    start_urls = ["http://stackoverflow.com/questions/%d/" % startNo ]
    allowed_domains = ["http://stackoverflow.com/questions"]
    baseUrl = "http://stackoverflow.com/questions/%d/"

    def parse(self, response):
        itemObj = items.StackOverFlowItem()

        # getting items information from the page
        ...
        yield itemObj

        StackOverFlowPost.curNo += 1
        nextPost = StackOverFlowPost.baseUrl % StackOverFlowPost.curNo  

        yield scrapy.Request(nextPost, callback = self.parse)
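
One likely explanation: Scrapy's documentation describes allowed_domains as a list of domain names (for example "stackoverflow.com"), not URLs, and the offsite filter compares each entry with the host of the outgoing request. The sketch below is a simplified, standard-library approximation of that comparison, not Scrapy's actual implementation; the helper name is_allowed and the variable names are hypothetical and exist only for illustration.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Hypothetical helper: True if the URL's host equals an allowed
    entry or is a subdomain of one. A simplification of offsite filtering."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# A follow-up URL built the same way as in the spider above.
next_post = "http://stackoverflow.com/questions/%d/" % 26200878

# Entry written as a URL, as in the question: the host "stackoverflow.com"
# never equals the full string "http://stackoverflow.com/questions".
url_entry_result = is_allowed(next_post, ["http://stackoverflow.com/questions"])

# Entry written as a bare domain name: the host matches.
domain_entry_result = is_allowed(next_post, ["stackoverflow.com"])

print(url_entry_result, domain_entry_result)
```

If this reading is right, writing the attribute as allowed_domains = ["stackoverflow.com"] should let the follow-up requests through. Restricting the crawl to the /questions/ path would then have to be done some other way, since a domain entry carries no path.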
