python - Re-logging in to a scraped website when resuming a Scrapy job

Question

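Is there a way to make a Scrapy spider log in to a website when resuming a previously paused scraping job?

Edit: To clarify, my question is about Scrapy spiders, not about cookies in general. Perhaps a better question is whether there is a method that gets called when a Scrapy spider is revived after being frozen in a job directory.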

score -1 · Accepted Answer

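Yes, you can.

You should describe your scraper's exact workflow more clearly.

Anyway, I assume you want to log in the first time you scrape and then use the same cookie when you resume scraping.

You can do something like this with the httplib2 library. Here is a code sample for an example page, with comments added for clarity.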

import httplib2
from urllib.parse import urlencode

http = httplib2.Http()

url = 'http://www.example.com/login'
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}

# Submit the form data to log in to the website
response, content = http.request(url, 'POST', headers=headers, body=urlencode(body))

# The 'response' object now contains the cookie the website sent,
# which can be used for visiting the website again

# Set the cookie in the new headers
headers_2 = {'Cookie': response['set-cookie']}

url = 'http://www.example.com/home'

# Use 'headers_2' to visit the website
response, content = http.request(url, 'GET', headers=headers_2)
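
One caveat with passing response['set-cookie'] straight back: a Set-Cookie value usually carries attributes such as Path or HttpOnly, which do not belong in a Cookie request header. A minimal sketch of stripping them with the standard library follows. The helper name and the sample header value are made up for illustration, and it assumes the header holds a single cookie. If the server sets several cookies, the combined value may need more careful splitting.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_value):
    """Build a Cookie request-header value from one Set-Cookie value.

    Hypothetical helper for illustration; it is not part of httplib2.
    """
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    # Keep only the name=value pairs; attributes such as Path, Expires
    # or HttpOnly are metadata for the client and are dropped here.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in parsed.items())


# Made-up header value, standing in for response['set-cookie'].
example = "sessionid=abc123; Path=/; HttpOnly"
headers_2 = {"Cookie": cookie_header_from_set_cookie(example)}
print(headers_2)
```
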

If you are not familiar with how cookies work, look it up. In short, a cookie is a client-side mechanism that helps the server maintain a session.
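
Because the cookie lives on the client, it can be written to disk when the job pauses and read back when it resumes. Here is a rough sketch using only the standard library (http.cookiejar and urllib.request) rather than anything Scrapy-specific. The file location and the function names open_session and save_session are assumptions for illustration. If the saved cookie has expired on the server side, you would still need to log in again.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

# Hypothetical location; a real job would keep this with its other saved state.
COOKIE_FILE = os.path.join(tempfile.gettempdir(), "example_session_cookies.txt")


def open_session(cookie_file=COOKIE_FILE):
    """Return (opener, jar), loading cookies saved by an earlier run if any."""
    jar = http.cookiejar.LWPCookieJar(cookie_file)
    if os.path.exists(cookie_file):
        # ignore_discard=True also restores session cookies, which carry
        # no expiry date and would otherwise be skipped on load.
        jar.load(ignore_discard=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def save_session(jar):
    """Write the jar to its file so that a resumed run can reuse the cookies."""
    jar.save(ignore_discard=True)
```

On the first run the jar starts empty, so you would log in through the returned opener and then call save_session(jar). On a resumed run, open_session() returns an opener that already sends the saved cookies.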
