python - How to make Python use utf-8.py instead of cp1252.py for encoding

Question

I wrote a very small program that copies every line of one file to another file if the line contains a specific string. Here is the complete source:

f_in = open("all.txt", "r")
f_out = open("all.out", "w")

for line in f_in:
    if "<title>" in line:
        f_out.write(line)

f_out.close()
f_in.close()

This works very well until it reaches a UTF-8 character in all.txt. Then it fails with:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7102: character maps to <undefined>
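The error above can be reproduced without any files. A minimal sketch, assuming the offending character was a right double quotation mark (U+201D), whose UTF-8 encoding ends in byte 0x9d; the question does not say which character actually sits at position 7102. Byte 0x9d has no mapping in cp1252, so decoding UTF-8 bytes with that codec fails:

```python
# U+201D RIGHT DOUBLE QUOTATION MARK is three bytes in UTF-8; the last one is 0x9d.
data = "\u201d".encode("utf-8")
print(data)  # b'\xe2\x80\x9d'

# Decoding these bytes as cp1252 raises UnicodeDecodeError,
# because 0x9d is one of the unmapped bytes in that code page.
try:
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    print("cp1252 failed:", exc)

# Decoding with the codec the bytes were actually written in recovers the character.
assert data.decode("utf-8") == "\u201d"
```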

Then I applied a bad workaround: in the directory \Python\Lib\encodings, I copied utf-8.py and renamed the copy to cp1252.py.

Since then, the small program above runs without problems. But I would like a more elegant solution. Can you tell me what is needed to make Python use utf-8.py instead of cp1252.py?

I am sure this is possible without heavy conversion or decoding; I just need to tell Python to use a different codec instead of cp1252.py.
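For context, and not part of the original thread: in Python 3, open() without an encoding argument falls back to the locale's preferred encoding, which is commonly cp1252 on Western-European Windows installs. That is why the renamed codec file "works". A small sketch to check what your interpreter will use. UTF-8 mode (Python 3.7+, enabled with `python -X utf8` or the environment variable `PYTHONUTF8=1`) is a supported way to make UTF-8 the default, though passing an explicit encoding per file remains the more portable fix:

```python
import locale
import sys

# The encoding open() uses when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Nonzero when UTF-8 mode is enabled (-X utf8 or PYTHONUTF8=1, Python 3.7+).
print("UTF-8 mode:", sys.flags.utf8_mode)
```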

score 4 · Accepted Answer

Use io.open() instead to read and write Unicode values:

import io

with io.open('all.txt', 'r', encoding='utf8') as f_in:
    with io.open('all.out', 'w', encoding='utf8') as f_out:
        for line in f_in:
            if u"<title>" in line:
                f_out.write(line)
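In Python 3, io.open() is an alias for the built-in open(), so the same fix can be written with open() and an explicit encoding argument. A self-contained sketch; the temporary directory and the sample lines written into all.txt are hypothetical, added only so the example runs anywhere:

```python
import os
import tempfile


def copy_title_lines(src, dst):
    """Copy every line containing "<title>" from src to dst, decoding and encoding as UTF-8."""
    with open(src, "r", encoding="utf-8") as f_in, \
         open(dst, "w", encoding="utf-8") as f_out:
        for line in f_in:
            if "<title>" in line:
                f_out.write(line)


# Hypothetical sample input containing non-ASCII characters.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "all.txt")
dst = os.path.join(tmpdir, "all.out")
with open(src, "w", encoding="utf-8") as f:
    f.write("<title>\u201cQuoted\u201d</title>\n")
    f.write("<p>body</p>\n")
    f.write("<title>Caf\u00e9</title>\n")

copy_title_lines(src, dst)

# Read the output back with the same encoding it was written in.
with open(dst, encoding="utf-8") as f:
    result = f.read()
```

After running, result holds only the two `<title>` lines, with the curly quotes and "é" intact. If some input might not be valid UTF-8, open() also accepts an errors argument (for example errors="replace") instead of raising.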

Renaming codec files should be your very last resort.
