python - python-requests: fetch the beginning of the response content without consuming all of it

Question

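I'd like to use python-requests and python-magic to test the MIME type of a web resource without fetching all of its content (in particular, in case the resource happens to be an Ogg file or a PDF). Depending on the result, I may then decide to fetch the whole thing. However, if I access r.text after testing the MIME type, it only returns the part that has not been consumed yet. How can I test the MIME type without consuming the response content?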

Here is my current code:

import requests
import magic


r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
mime = magic.from_buffer(r.iter_content(256).next(), mime=True)

if mime == "text/html":
    print(r.text)  # I'd like r.text to give me the entire response content

Thanks!

score 9 · Accepted Answer

If the Content-Type header is enough, issue an HTTP HEAD request instead of a GET, so that you only receive the HTTP headers:

import requests

url = 'http://www.december.com/html/demo/hello.html'
response = requests.head(url)
print(response.headers['content-type'])
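The header value usually carries parameters such as a charset (for example "text/html; charset=UTF-8"), so comparing it directly with "text/html" can fail. A minimal sketch, assuming you only need the bare media type; the helper name mime_from_content_type is hypothetical and not part of requests:

```python
def mime_from_content_type(header_value: str) -> str:
    """Return the bare media type from a Content-Type header value.

    Hypothetical helper: drops parameters such as "; charset=UTF-8"
    and normalizes case, because media types are case-insensitive.
    """
    return header_value.split(";", 1)[0].strip().lower()
```

Usage: mime_from_content_type(requests.head(url).headers.get('content-type', '')). Note that requests.head() does not follow redirects by default, so pass allow_redirects=True if the URL may redirect. Also keep in mind that the header only tells you what the server claims the type is; it does not inspect the bytes.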

score 4 · Accepted Answer

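Note: at the time this question was asked, prefetch=False was the correct way to fetch only the headers and leave the body to be streamed. That option has since been renamed to stream and its boolean value inverted, so you now need stream=True.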

The original answer follows:

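Once you have used iter_content(), you have to keep using it. .text uses the same interface indirectly under the hood (via .content).

In other words, as soon as you use iter_content() at all, you have to do by hand the work that .text would otherwise do for you: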

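import magic
import requests
from requests.compat import chardet

url = "http://www.december.com/html/demo/hello.html"
# stream=True (formerly prefetch=False): only the headers are read up front
r = requests.get(url, stream=True)
peek = next(r.iter_content(256))  # first chunk of the body
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    # Read the rest of the body and put the peeked chunk back in front
    contents = peek + b''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # No charset in the headers: guess it from the bytes
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = str(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        # Unknown or missing encoding: fall back to UTF-8
        textcontent = str(contents, errors='replace')
    print(textcontent)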

This assumes you are using Python 3.
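The essential step is putting the peeked bytes back in front of the rest of the stream. A minimal sketch of that idea using a single iterator, assuming it is fed r.iter_content(256) from a stream=True response; the helper peek_first_chunk is hypothetical and not part of requests:

```python
import itertools
from typing import Iterable, Iterator, Tuple


def peek_first_chunk(chunks: Iterable[bytes]) -> Tuple[bytes, Iterator[bytes]]:
    """Split off the first chunk without losing it (hypothetical helper).

    Returns (first, all_chunks): all_chunks yields `first` again,
    followed by every chunk not yet read from the underlying iterator.
    """
    it = iter(chunks)
    first = next(it, b"")  # b"" if the body is empty
    return first, itertools.chain([first], it)
```

Usage: peek, chunks = peek_first_chunk(r.iter_content(256)), then magic.from_buffer(peek, mime=True), and b''.join(chunks) if you decide to keep the body. Reusing one iterator avoids depending on two separate iter_content() generators sharing the same underlying stream; the trade-off is that the whole body is then read in 256-byte chunks.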

Alternatively, make two requests:

import magic
import requests

url = "http://www.december.com/html/demo/hello.html"
r = requests.get(url, stream=True)
mime = magic.from_buffer(next(r.iter_content(256)), mime=True)
r.close()  # only the first chunk was needed; release the connection
if mime == "text/html":
    print(requests.get(url).text)  # second, ordinary (non-streamed) request

Python 2 version:

import magic
import requests
from requests.compat import chardet

r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = r.iter_content(256).next()
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    contents = peek + ''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = unicode(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = unicode(contents, errors='replace')
    print(textcontent)
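In the iter_content() approaches in this answer, python-magic only ever sees the first 256 bytes. That is enough for the formats the question mentions because they identify themselves at the very start of the file: PDFs normally begin with %PDF- and Ogg streams with OggS. The toy sketch below illustrates the idea; it is not python-magic, and the signature table, the helper sniff_mime and the MIME strings it returns are illustrative assumptions:

```python
# Hypothetical, deliberately tiny signature table; libmagic knows far more.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"OggS", "application/ogg"),
]


def sniff_mime(peek: bytes) -> str:
    """Guess a MIME type from the first bytes of a body (toy sketch)."""
    for signature, mime in SIGNATURES:
        if peek.startswith(signature):
            return mime
    # HTML has no fixed magic number; look for a leading doctype or tag
    head = peek.lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return "application/octet-stream"  # unknown binary data
```

For real use, stick with magic.from_buffer(peek, mime=True); the sketch only shows why a small prefix of the response is sufficient.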
