python - How do I prevent Python's urllib(2) from following a redirect?

Question

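I am currently trying to log into a site using Python, but the site seems to send a cookie and a redirect statement on the same page. Python appears to follow that redirect, which prevents me from reading the cookie sent by the login page. How do I prevent Python's urllib (or urllib2) urlopen from following the redirect?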

score 33 · Accepted Answer

You could do a couple of things:

Build your own HTTPRedirectHandler that intercepts each redirect.
Create an instance of HTTPCookieProcessor and install that opener so that you have access to the cookiejar.

Here is a quick little example that shows both:

import urllib2

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Inspect or modify cookies here, before the redirect is followed.
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    # Handle the other redirect codes the same way.
    http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("WHEREEVER")
print response.read()

# Cookies collected from every response, including the redirecting one.
print cookieprocessor.cookiejar
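
For readers on Python 3, where urllib2 was merged into urllib.request, the same idea could be sketched roughly as below. This is not part of the original answer: the class name RecordingRedirectHandler, its redirects list, and the helper build_recording_opener are made up for illustration. Instead of printing, it overrides redirect_request(), the hook that HTTPRedirectHandler calls before building the follow-up request, and records each hop.

```python
import http.cookiejar
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: remember every redirect, then follow it as usual."""

    def __init__(self):
        # (status code, target URL) for each redirect that was intercepted.
        self.redirects = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Called by the stock http_error_30x methods before the new request is built.
        self.redirects.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_recording_opener():
    """Return (opener, redirect_handler, cookie_jar) wired together."""
    jar = http.cookiejar.CookieJar()
    redirect_handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(
        redirect_handler,
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, redirect_handler, jar
```

With this, opener.open(url) still follows redirects, while redirect_handler.redirects lists each hop and jar holds the cookies seen along the way.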

score 29 · Accepted Answer

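If all you need is to stop redirection, there is an easy way to do it. For example, I only want to get the cookies, and for better performance I don't want to be redirected to any other page. I also want the status code to stay 3xx. Let's use 302 as an example.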

import cookielib
import urllib2

class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # Only add this check to stop 302 redirection.
        if code == 302:
            return response

        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response

    https_response = http_response

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)

This way, you don't even need to go into urllib2.HTTPRedirectHandler.http_error_302().
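
A rough Python 3 counterpart of this selective approach, assuming urllib.request in place of urllib2, might look like the sketch below. The class name PassThroughRedirects and its passthrough_codes set are hypothetical, chosen here only for illustration. Redirect responses with a listed status are handed back untouched, and everything else falls through to the stock HTTPErrorProcessor.

```python
import urllib.request


class PassThroughRedirects(urllib.request.HTTPErrorProcessor):
    """Hypothetical processor: return selected 3xx responses instead of following them."""

    # Status codes handed back to the caller as-is (an assumption; adjust as needed).
    passthrough_codes = frozenset({301, 302, 303, 307, 308})

    def http_response(self, request, response):
        if response.status in self.passthrough_codes:
            return response
        # Any other status keeps the default behaviour, so errors still raise HTTPError.
        return super().http_response(request, response)

    https_response = http_response
```

Passing the class to urllib.request.build_opener() replaces the default HTTPErrorProcessor, so the redirect handler is never consulted for the listed codes.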

A more common case is that we simply want to stop redirection (as required):

class NoRedirection(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        return response

    https_response = http_response

And it is normally used like this:

import cookielib
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
data = {}
response = opener.open('http://www.example.com', urllib.urlencode(data))
# The 302 response is returned as-is, so the redirect target can be read from it.
if response.code == 302:
    redirection_target = response.headers['Location']
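
On Python 3 the same pattern can be written against urllib.request and http.cookiejar. The following is a minimal sketch under that assumption; the helper names build_no_redirect_opener and open_without_redirect are invented for this example and do not come from the answer above.

```python
import http.cookiejar
import urllib.parse
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Return every response unchanged, so redirects are never followed."""

    def http_response(self, request, response):
        return response

    https_response = http_response


def build_no_redirect_opener():
    """Return (opener, cookie_jar); the jar fills up as responses arrive."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        NoRedirection,
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, jar


def open_without_redirect(url, form=None):
    """Send one request and return (status, Location header or None, cookie_jar)."""
    opener, jar = build_no_redirect_opener()
    # Python 3 expects the POST body as bytes; None means a plain GET.
    body = None if form is None else urllib.parse.urlencode(form).encode("ascii")
    with opener.open(url, body) as response:
        return response.status, response.headers.get("Location"), jar
```

One side effect worth keeping in mind: because the processor returns every response, 4xx and 5xx statuses no longer raise HTTPError either, so the caller has to check the status itself.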

score 12 · Accepted Answer

urllib2.urlopen calls build_opener(), which uses this list of handler classes:

handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler,
FTPHandler, FileHandler, HTTPErrorProcessor]

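You could try calling urllib2.build_opener(handlers) yourself with a list that omits HTTPRedirectHandler, then call the open() method on the result to open your URL. If you really dislike redirects, you could even call urllib2.install_opener(opener) with your own non-redirecting opener.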
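
One caveat when trying this: build_opener() takes handlers as separate arguments and still adds each default handler that is not overridden by a subclass or instance, so simply leaving HTTPRedirectHandler out of the arguments does not remove it. A possible workaround, sketched here for Python 3's urllib.request, is to assemble an OpenerDirector by hand. The function names build_opener_without_redirects and redirect_target are hypothetical, and the handler selection is an assumption trimmed to plain HTTP(S) use.

```python
import urllib.error
import urllib.request


def build_opener_without_redirects():
    """Assemble an opener by hand so HTTPRedirectHandler is never registered."""
    handler_classes = [
        urllib.request.ProxyHandler,
        urllib.request.UnknownHandler,
        urllib.request.HTTPHandler,
        urllib.request.HTTPDefaultErrorHandler,
        urllib.request.HTTPErrorProcessor,
    ]
    # HTTPSHandler only exists when Python was built with SSL support.
    https_handler = getattr(urllib.request, "HTTPSHandler", None)
    if https_handler is not None:
        handler_classes.append(https_handler)

    opener = urllib.request.OpenerDirector()
    for handler_class in handler_classes:
        opener.add_handler(handler_class())
    return opener


def redirect_target(opener, url):
    """Return the Location header if url answers with a 3xx status, else None."""
    try:
        with opener.open(url):
            return None
    except urllib.error.HTTPError as err:
        # Without a redirect handler, a 3xx response surfaces as HTTPError.
        if 300 <= err.code < 400:
            return err.headers.get("Location")
        raise
```

With no redirect handler registered, a 3xx response ends up in HTTPDefaultErrorHandler and is raised as HTTPError, which is why redirect_target reads the Location header from the exception. No cookie handling is included in this sketch.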

Your real problem seems to be that urllib2 isn't handling cookies the way you want. See also "How to use Python to log in to a web page and retrieve cookies for later use?"

score 3 · Accepted Answer

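This question was asked here before.

EDIT: If you have to deal with quirky web applications, you should probably try mechanize. It's a great library that simulates a web browser. You can control redirects, cookies, page refreshes... If the website doesn't rely [heavily] on JavaScript, you'll get along very nicely with mechanize.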
