python - Scrapy: Connection refused

Question

When I try to test my Scrapy installation, I get the following error:

$ scrapy shell http://www.google.es
2011-02-16 10:54:46+0100 [scrapy] INFO: Scrapy 0.12.0.2536 started (bot: scrapybot)
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpProxyMiddleware, HttpCompressionMiddleware, DownloaderStats
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled item pipelines: 
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-02-16 10:54:46+0100 [default] INFO: Spider opened
2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 1 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 2 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] DEBUG: Discarding <GET http://www.google.es> (failed 3 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] ERROR: Error downloading <http://www.google.es>: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionRefusedError'>: Connection was refused by other side: 111: Connection refused.
    ]
2011-02-16 10:54:47+0100 [scrapy] ERROR: Shell error
    Traceback (most recent call last):
    Failure: scrapy.exceptions.IgnoreRequest: Connection was refused by other side: 111: Connection refused.

2011-02-16 10:54:47+0100 [default] INFO: Closing spider (shutdown)
2011-02-16 10:54:47+0100 [default] INFO: Spider closed (shutdown)

Versions:

Scrapy 0.12.0.2536
Python 2.6.6
OS: Ubuntu 10.10

Edit: I can reach the site from a browser, with wget, and with telnet google.es 80, and the error happens with every site.

score 10 · Accepted Answer

Mission 1: Scrapy sends a user agent that contains "bot". Sites may also block requests based on the user agent.

Try overriding USER_AGENT in settings.py.

For example: USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.7'

Mission 2: Try adding a delay between requests so the traffic looks like it comes from a human.

DOWNLOAD_DELAY = 0.25

Mission 3: If nothing else works, install Wireshark and compare the request headers (or POST data) that Scrapy sends with the ones your browser sends.
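If installing Wireshark is not an option, a rough substitute for plain HTTP is a tiny local server that echoes back whatever headers it receives. This is a different technique from packet capture, and it only shows what a client sends to your own machine. The sketch below uses only the Python 3 standard library; the names HeaderEchoHandler, make_echo_server and fetch_echoed_headers are made up for this example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HeaderEchoHandler(BaseHTTPRequestHandler):
    """Answer every GET with the request's own headers, encoded as JSON."""

    def do_GET(self):
        # Lower-case the names so output from different clients lines up.
        headers = {name.lower(): value for name, value in self.headers.items()}
        body = json.dumps(headers, indent=2, sort_keys=True).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_echo_server(port=0):
    """Bind to 127.0.0.1; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), HeaderEchoHandler)


def fetch_echoed_headers(url, user_agent):
    """Request `url` with the given User-Agent and return the echoed headers."""
    # An empty ProxyHandler keeps this local request away from any configured proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Short self-check: start the server, send one request, print what arrived.
    server = make_echo_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    print(fetch_echoed_headers(url, "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.7"))
    server.shutdown()
    server.server_close()
```

To compare real clients, run make_echo_server(8000).serve_forever() instead of the self-check (port 8000 is an arbitrary choice), then open http://127.0.0.1:8000/ once with scrapy shell and once in the browser and diff the two outputs.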

score 1 · Accepted Answer

You probably have a network connectivity problem.

First, check your Internet connection.

If you access the net through a proxy server, you need to add some code to your Scrapy project ( http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware )
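As a rough sketch of what that code could look like: according to the current Scrapy docs, HttpProxyMiddleware reads the http_proxy / https_proxy environment variables and the request's 'proxy' meta key; older releases such as 0.12 may behave differently. The middleware class name and the proxy address below are placeholders, not something from the question. The example is written for Python 3.

```python
import urllib.request


def detect_env_proxies():
    """Return the proxies the standard library finds in the environment
    (http_proxy, https_proxy, ...), as a dict of scheme -> URL."""
    return urllib.request.getproxies()


class FixedProxyMiddleware:
    """Hypothetical downloader middleware that sends every request through one proxy."""

    # Placeholder address; replace it with your real proxy.
    proxy_url = "http://proxy.example.com:3128"

    def process_request(self, request, spider):
        # Only set the proxy if the request does not already carry one.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # let the request continue through the remaining middlewares


if __name__ == "__main__":
    # An empty dict means no proxy variables are set in this shell.
    print(detect_env_proxies())
```

To use the class you would list it in DOWNLOADER_MIDDLEWARES in settings.py, under whatever module path your project uses. If detect_env_proxies() already prints the right proxy, the environment variables alone may be enough.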

In any case, try upgrading your Scrapy version.

score 0 · Accepted Answer

I got this error too. It turned out that the port I was trying to reach was blocked by a firewall. My server blocked ports by default unless they were whitelisted.
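A quick way to check for this from the machine that runs Scrapy is to attempt a plain TCP connection to the same host and port. The sketch below uses only the Python 3 standard library; check_port is a name made up for this example, and the result strings are just labels I picked.

```python
import socket


def check_port(host, port, timeout=5.0):
    """Try a TCP connection to host:port and return a short status string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something actively rejected the connection (errno 111 on Linux).
        return "refused"
    except socket.timeout:
        # No answer at all, which is typical when a firewall silently drops packets.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and so on.
        return "error: %s" % exc
```

For example, check_port("www.google.es", 80) should return "open" on a working connection. A "refused" or "timeout" result from the Scrapy host, when the same check succeeds elsewhere, points at a firewall or proxy rule rather than at Scrapy itself.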
