python - Reading multibyte characters from subprocess output one at a time

Question

I am running a process using subprocess:

    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)

What I want to do is read the output one character at a time in a loop:

    while something:
        char = p.stdout.read(1)

In Python 3, subprocess.Popen().stdout.read() returns bytes(), not str(). I want to use the result as a str, so I have to do:

    char = char.decode("utf-8")

This works fine for ASCII characters.

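However, with non-ASCII characters (e.g. Greek letters) I get a UnicodeDecodeError. That is because a Greek letter is encoded as more than one byte in UTF-8, so read(1) returns only part of the character. Here is the problem: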

    >>> b'\xce\xb5'.decode('utf-8')
    'ε'
    >>> b'\xce'.decode('utf-8') # b'\xce' is what subprocess...read(1) returns - one byte
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: unexpected end of data
    >>>
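
To make the byte counts concrete, here is a minimal sketch. The helper name `utf8_lengths` is made up for illustration; it only shows how many bytes each character occupies in UTF-8 and that the first byte of 'ε' cannot be decoded on its own.

```python
def utf8_lengths(text):
    """Return (character, byte_count) pairs for the UTF-8 encoding of text."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# 'a' is ASCII (1 byte); the Greek letter takes 2 bytes in UTF-8.
lengths = utf8_lengths("aε")
print(lengths)

# Taking only the first byte of 'ε' mimics what p.stdout.read(1) returns.
first_byte = "ε".encode("utf-8")[:1]
try:
    first_byte.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    # The lone lead byte is an incomplete UTF-8 sequence.
    decoded_ok = False
print(decoded_ok)
```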

How can I deal with this? The output of subprocess.Popen().stdout.read() (as a string) can be something like "lorem ipsum εφδφδσloremipsum".

I want to read one character at a time, but a character may consist of multiple bytes.

score 4 · Accepted Answer

Wrap the file object in io.TextIOWrapper() to decode the pipe on the fly:

    import io

    reader = io.TextIOWrapper(p.stdout, encoding='utf8')
    while something:
        char = reader.read(1)
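
For a version that can be run end to end, here is a minimal sketch under some assumptions: the child process is a stand-in (a second Python interpreter that writes the sample string from the question as UTF-8 bytes), and the loop stops when read(1) returns an empty string, which replaces the `something` placeholder above. The names `SAMPLE`, `child_code`, and `chars` are illustrative only.

```python
import io
import subprocess
import sys

SAMPLE = "lorem ipsum εφδφδσloremipsum"

# Stand-in child process (an assumption for this sketch): writes SAMPLE to
# stdout as raw UTF-8 bytes. ascii() keeps the command line pure ASCII.
child_code = (
    "import sys; sys.stdout.buffer.write("
    + ascii(SAMPLE)
    + ".encode('utf-8'))"
)

p = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)

# Wrap the binary pipe so that read(1) returns one decoded character.
reader = io.TextIOWrapper(p.stdout, encoding="utf-8")

chars = []
while True:
    char = reader.read(1)  # one character, possibly several bytes
    if char == "":         # empty string signals end of stream
        break
    chars.append(char)

reader.close()  # also closes the underlying pipe
p.wait()

result = "".join(chars)
print(result == SAMPLE)
```

Note that read(1) on the wrapper returns a single character, but the wrapper may pull more bytes than that from the pipe internally and keep them buffered.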
