python - Python response decode

Question

For the following lines, which use urllib:

# some request object exists
response = urllib.request.urlopen(request)
html = response.read().decode("utf8")

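What format of string does read() return? I have been trying to work it out from the Python documentation, but it does not mention it at all. Why is there a decode? Does decode convert the object to UTF-8 or from UTF-8? From which format to which format does it decode? The decode documentation says nothing about that either. Is the Python documentation that bad, or am I missing some standard convention?

I want to save that HTML to a UTF-8 file. Do I just do a regular write, or do I have to "encode" it back into something and write that?

Note: I know urllib is deprecated, but I can't switch to urllib2 right now.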

score 1 · Accepted Answer

Ask Python:

>>> import urllib  # Python 2 session
>>> r=urllib.urlopen("http://google.com")
>>> a=r.read()
>>> type(a)
0: <type 'str'>
>>> help(a.decode)
Help on built-in function decode:

decode(...)
    S.decode([encoding[,errors]]) -> object

    Decodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
    as well as any other name registered with codecs.register_error that is
    able to handle UnicodeDecodeErrors.

>>> b = a.decode('utf8')
>>> type(b)
1: <type 'unicode'>
>>>

So read() appears to return a str, and .decode() decodes from UTF-8 into Python's internal Unicode format.
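
The session above is Python 2. In Python 3, where urllib.request lives, read() returns bytes and decode() turns those bytes into a str. A minimal sketch of the round trip the question asks about, assuming the page really is UTF-8; the `raw` value is a made-up stand-in for response.read() so the example runs without a network connection, and the file name is arbitrary:

```python
import os
import tempfile

# Hypothetical stand-in for response.read(): in Python 3 this is bytes, not text.
raw = "<p>caf\u00e9</p>".encode("utf-8")

# decode() goes FROM UTF-8 bytes TO a str (Unicode text).
html = raw.decode("utf-8")

# To save as UTF-8, open the file in text mode with an explicit encoding;
# Python encodes the str back to UTF-8 bytes as it writes.
path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

# Reading the file back in binary mode gives the original bytes again.
with open(path, "rb") as f:
    saved = f.read()
```

If the text is never inspected or modified, writing the raw bytes directly to a file opened with "wb" should give the same file without the decode step.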
