python - Losing cookies on a 302 redirect with urllib2

Question

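I'm trying to simulate logging in to a page so that I can automate an upload, and I'm using urllib2 with CookieJar / HTTPCookieProcessor.

I've seen several questions and answers about this, but nothing that solves my problem. When I simulate the login, which ends in a 302 redirect, I lose my cookie. The 302 response is where the server sets the cookie, but urllib2's HTTPCookieProcessor does not seem to save the cookie during the redirect. I tried writing an HTTPRedirectHandler class that ignores the redirect, but that didn't seem to work. I also tried referencing the CookieJar globally so I could handle the cookies from within the HTTPRedirectHandler, but 1. this didn't work (I was working with the headers passed to the redirect handler, and the CookieJar function I was using, extract_cookies, needed a full request), and 2. it's an ugly way to handle it.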

I'm fairly new to Python, so I probably need some guidance on this. I think I'm mostly barking up the right tree here, but I may be focusing on the wrong branch.

import cookielib
import urllib2

cj = cookielib.CookieJar()
cookieprocessor = urllib2.HTTPCookieProcessor(cj)


class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
  def http_error_302(self, req, fp, code, msg, headers):
    global cj
    cookie = headers.get("set-cookie")
    if cookie:
      # Doesn't work, but you get the idea
      cj.extract_cookies(headers, req)

    return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

  http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor(cj)

# Oh yeah.  I'm using a proxy too, to follow traffic.
proxy = urllib2.ProxyHandler({'http': '127.0.0.1:8888'})
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor, proxy)
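
The comment in the handler above notes that extract_cookies did not work when it was handed the headers. A minimal sketch of a likely reason, written for Python 3 (where cookielib is http.cookiejar and urllib2 is urllib.request): extract_cookies(response, request) calls response.info() to obtain the headers, so it expects a response-like object rather than the header object itself. In the handler above, the fp argument is the response object, so it is the more plausible thing to pass. FakeResponse, the cookie value, and the example.com URL are hypothetical stand-ins used only for illustration.

```python
import http.cookiejar
import urllib.request
from email.message import Message


class FakeResponse:
    """Hypothetical stand-in for a response; only info() is needed here."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


# Headers as a server might send them on the 302 (values are made up).
headers = Message()
headers["Set-Cookie"] = "session=abc123; Path=/"

request = urllib.request.Request("http://example.com/login")
jar = http.cookiejar.CookieJar()

# Passing the response-like object works; passing `headers` directly would
# fail because the header object has no info() method.
jar.extract_cookies(FakeResponse(headers), request)

cookie_names = [cookie.name for cookie in jar]
print(cookie_names)
```

The sketch avoids any network access on purpose, so it only demonstrates the calling convention, not the behavior of a real login.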

Addendum: I also tried mechanize, without success. This is probably a new question, but I'll raise it here since the end goal is the same.

Below is some simple code using mechanize, run against a URL that issues a 302 (http://fxfeeds.mozilla.com/firefox/headlines.xml). Note that the same behavior occurs when I don't call set_handle_robots(False); I just wanted to make sure that wasn't the cause:

import urllib2, mechanize

browser = mechanize.Browser()
browser.set_handle_robots(False)
opener = mechanize.build_opener(*(browser.handlers))
r = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")

Output:

Traceback (most recent call last):
  File "redirecttester.py", line 6, in <module>
    r = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 204, in open
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 457, in http_response
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 221, in error
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 332, in _call_chain
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 571, in http_error_302
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 188, in open
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_mechanize.py", line 71, in http_request
AttributeError: OpenerDirector instance has no attribute '_add_referer_header'

Any ideas?

score 2 · Accepted Answer

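I ran into exactly the same problem recently, but in the interest of time I scrapped that approach and decided to go with mechanize. It can be used as a complete replacement for urllib2, and it behaves just as a browser would with regard to Referer headers, redirects, and cookies.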

import mechanize
cj = mechanize.CookieJar()
browser = mechanize.Browser()
browser.set_cookiejar(cj)
browser.set_proxies({'http': '127.0.0.1:8888'})

# Use browser's handlers to create a new opener
opener = mechanize.build_opener(*browser.handlers)

You can also use the Browser object itself as the opener (via its .open() method). It maintains state internally but also returns a response object for every call, so you get a lot of flexibility.

Also, if you don't need to inspect the cookiejar manually or pass it to another object, you can omit the explicit creation and assignment of that object.

I'm fully aware that this doesn't address what is really going on, or why urllib2 can't provide this out of the box (or at least without a lot of tweaking), but if you're short on time and just want it to work, just use mechanize.
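
For anyone who does want to look into what is really going on, a small local reproduction can help separate the library's behavior from the target site's. The sketch below is written for Python 3 (http.cookiejar and urllib.request in place of cookielib and urllib2) and starts a throwaway server on 127.0.0.1 that sets a cookie on a 302 and then echoes back the Cookie header it receives after the redirect. LoginHandler, the /login and /home paths, and the cookie value are all hypothetical. It exercises only Python 3's stock handlers, not the Python 2 setup or the site from the question.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class LoginHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: /login sets a cookie and redirects to /home."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.send_header("Location", "/home")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # Echo back whatever Cookie header the client sent.
            body = (self.headers.get("Cookie") or "").encode("ascii")
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 asks the OS for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), LoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore any proxy set in the environment
    urllib.request.HTTPCookieProcessor(jar),
)
port = server.server_address[1]
with opener.open("http://127.0.0.1:%d/login" % port) as response:
    echoed_cookie = response.read().decode("ascii")
server.shutdown()
server.server_close()

print(echoed_cookie)
print([cookie.name for cookie in jar])
```

If the echoed value contains the cookie, the stock cookie processor kept it across the redirect in this setup, which would suggest looking at other causes for a missing cookie on a real site, such as the cookie's domain or path attributes or a custom handler.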

score 1 · Accepted Answer

It depends on how the redirect is done. If it is done via an HTTP Refresh, mechanize has an HTTPRefreshProcessor you can use. Try creating an opener like this:

import mechanize

cj = mechanize.CookieJar()
opener = mechanize.build_opener(
    mechanize.HTTPCookieProcessor(cj),
    mechanize.HTTPRefererProcessor,
    mechanize.HTTPEquivProcessor,
    mechanize.HTTPRefreshProcessor)
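
To make the distinction concrete: a 302 names its target in a Location header, whereas an HTTP refresh names it in a Refresh response header (or an equivalent meta http-equiv="refresh" tag) of the form "delay; url=target". The helper below is a hypothetical, stdlib-only sketch of parsing such a value. It is not mechanize's implementation, and parse_refresh and the example URL are made up for illustration.

```python
def parse_refresh(value):
    """Split a Refresh header value into (delay_seconds, url_or_None).

    Hypothetical helper for illustration; it handles the common
    "5; url=http://..." form and a bare "5".
    """
    delay_part, _, rest = value.partition(";")
    delay = float(delay_part.strip())
    url = None
    rest = rest.strip()
    if rest.lower().startswith("url="):
        # Drop the "url=" prefix and any surrounding quotes.
        url = rest[4:].strip().strip("'\"")
    return delay, url or None


print(parse_refresh("0; url=http://example.com/next"))
print(parse_refresh("5"))
```

Checking a response for this header is one way to tell whether a refresh processor is relevant at all before adding it to the opener.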

score 0 · Accepted Answer

A variation of the following works, at least when trying to read the Atom feed at http://www.fudzilla.com/home?format=feed&type=atom.

I can't verify that the snippet below will run as-is, but it may serve as a starting point.

import cookielib
import urllib2

# open_feed is a hypothetical wrapper, added only so that `return` is valid.
def open_feed(request):
    # For how `request` is built, see
    # http://code.google.com/p/feedparser/source/browse/trunk/feedparser/feedparser.py#2954
    cookie_jar = cookielib.LWPCookieJar()
    cookie_handler = urllib2.HTTPCookieProcessor(cookie_jar)
    handlers = [cookie_handler]  # plus others; we also have proxy and progress handlers
    # For the implementation of _FeedURLHandler, see
    # http://code.google.com/p/feedparser/source/browse/trunk/feedparser/feedparser.py#2848
    opener = urllib2.build_opener(*(handlers + [_FeedURLHandler()]))
    opener.addheaders = []  # may not be needed, but see the comments around the #2954 link
    try:
        return opener.open(request)
    finally:
        opener.close()
