python - Is there a bug in Python's UTF-16 output with Windows line endings?

Question

With this code:

test.py

import sys
import codecs

sys.stdout = codecs.getwriter('utf-16')(sys.stdout)

print "test1"
print "test2"

and then running it like this:

test.py > test.txt

On Python 2.6 on Windows 2000, I found that the newline is written as the byte sequence \x0D\x0A\x00, which is of course wrong for UTF-16.

Am I missing something, or is this a bug?

score 3 · Accepted Answer

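The newline translation is happening inside the stdout file. You are writing "test1\n" to sys.stdout (a StreamWriter). The StreamWriter encodes this to "t\x00e\x00s\x00t\x001\x00\n\x00" and sends it to the real file, the original sys.stdout.

That file does not know that you have already converted the data to UTF-16. All it knows is that any \n byte in the output stream needs to be converted to \x0D\x0A, which produces the output you are seeing.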
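The effect can be reproduced without Windows by doing the same replacement on the encoded bytes. This is only a sketch: `simulate_text_mode` is a hypothetical helper that imitates the text-mode translation, and `utf-16-le` is used so that the byte-order mark does not clutter the output.

```python
# Illustration only: imitate Windows text-mode newline translation on bytes.
# simulate_text_mode is a hypothetical helper, not part of any library.

def simulate_text_mode(data):
    # A text-mode file replaces every 0x0A byte with 0x0D 0x0A,
    # with no knowledge of the encoding of the surrounding bytes.
    return data.replace(b"\n", b"\r\n")

# What the StreamWriter hands to the underlying file (BOM omitted for clarity).
encoded = u"test1\n".encode("utf-16-le")

# What ends up in the file after the byte-level translation.
on_disk = simulate_text_mode(encoded)

# What a correctly encoded UTF-16 line with CRLF would look like.
expected = u"test1\r\n".encode("utf-16-le")

print(repr(on_disk[-3:]))   # broken tail: 0x0D 0x0A 0x00
print(repr(expected[-4:]))  # correct tail: 0x0D 0x00 0x0A 0x00
print(on_disk == expected)
```

The inserted 0x0D is a single byte, so every 16-bit code unit after it is shifted by one byte and the rest of the file no longer decodes as intended.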

score 3 · Accepted Answer

Try this:

import sys
import codecs

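if sys.platform == "win32":
    import os, msvcrt
    # Put stdout into binary mode so the C runtime stops translating \n bytes.
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

class CRLFWrapper(object):
    def __init__(self, output):
        self.output = output

    def write(self, s):
        # Expand newlines at the text level, before the UTF-16 encoding happens.
        self.output.write(s.replace("\n", "\r\n"))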

    def __getattr__(self, key):
        return getattr(self.output, key)

sys.stdout = CRLFWrapper(codecs.getwriter('utf-16')(sys.stdout))
print "test1"
print "test2"
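One way to check what such a wrapper emits, without redirecting a real stdout, is to point it at an in-memory buffer. The sketch below makes two assumptions: `io.BytesIO` stands in for a binary-mode stdout, and `utf-16-le` replaces `utf-16` so the result has no byte-order mark and a fixed byte order.

```python
import codecs
import io

class CRLFWrapper(object):
    # Same idea as the wrapper above: expand LF to CRLF before encoding.
    def __init__(self, output):
        self.output = output

    def write(self, s):
        self.output.write(s.replace(u"\n", u"\r\n"))

    def __getattr__(self, key):
        return getattr(self.output, key)

# io.BytesIO is an assumption here: it plays the role of a binary-mode stdout.
buf = io.BytesIO()
out = CRLFWrapper(codecs.getwriter("utf-16-le")(buf))
out.write(u"test1\n")
out.write(u"test2\n")

result = buf.getvalue()
# Each line should end with CR and LF as two full 16-bit code units.
print(result == u"test1\r\ntest2\r\n".encode("utf-16-le"))
```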

score 0 · Accepted Answer

I have found two solutions so far, but neither of them produces UTF-16 output with Windows-style line endings.

First, to redirect Python print statements to a file with UTF-16 encoding (this outputs Unix-style line endings):

import sys
import codecs

sys.stdout = codecs.open("outputfile.txt", "w", encoding="utf16")

print "test1"
print "test2"

Second, to redirect stdout with UTF-16 encoding, without the line-ending translation corrupting the output (this also outputs Unix-style line endings; thanks to this ActiveState recipe):

import sys
import codecs

sys.stdout = codecs.getwriter('utf-16')(sys.stdout)

if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

print "test1"
print "test2"
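For the file case, one possible route to Windows-style line endings is `io.open` (available since Python 2.6), whose `newline` argument controls what each "\n" is written as. This is a sketch rather than a drop-in replacement for the code above: `write_utf16_crlf` and `read_raw` are hypothetical helper names, the output goes to a temporary directory, and `io.open` expects unicode strings, so it does not combine directly with a Python 2 `print` of byte strings.

```python
import io
import os
import tempfile

def write_utf16_crlf(path, lines):
    # newline="\r\n" makes the text layer expand each "\n" to "\r\n"
    # before encoding, so the CR and LF are each encoded as UTF-16 code units.
    with io.open(path, "w", encoding="utf-16", newline="\r\n") as f:
        for line in lines:
            f.write(line + u"\n")

def read_raw(path):
    # Read back in binary mode so no newline translation hides the result.
    with io.open(path, "rb") as f:
        return f.read()

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "outputfile.txt")
write_utf16_crlf(path, [u"test1", u"test2"])

# Decoding with "utf-16" consumes the byte-order mark written at the start.
decoded = read_raw(path).decode("utf-16")
print(repr(decoded))
```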
