python - Why does Python string concatenation work with Russian text when string.format() does not?

Question

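I'm trying to parse (and escape) rows of a CSV file that is stored in the Windows-1251 character encoding. Using this excellent answer to deal with the encoding, I ended up with the following one-liner to test the output, because for some reason it works: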

print(row[0]+','+row[1])

Output:

Тяжелый Уборщик Обязанности,1 литр

This line, however, does not work:

print("{0},{1}".format(*row))

It prints the header row and then fails with this error:

Name,Variant

Traceback (most recent call last):
  File "Russian.py", line 26, in <module>
    print("{0},{1}".format(*row))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128)

Here are the first two rows of the CSV:

Name,Variant
Тяжелый Уборщик Обязанности,1 литр

For reference, here is the full source of Russian.py:

import csv
import cgi
from chardet.universaldetector import UniversalDetector
chardet_detector = UniversalDetector()

def charset_detect(f, chunk_size=4096):
    global chardet_detector
    chardet_detector.reset()
    while 1:
        chunk = f.read(chunk_size)
        if not chunk: break
        chardet_detector.feed(chunk)
        if chardet_detector.done: break
    chardet_detector.close()
    return chardet_detector.result

with open('Russian.csv') as csv_file:
    cd_result = charset_detect(csv_file)
    encoding = cd_result['encoding']
    csv_file.seek(0)
    csv_reader = csv.reader(csv_file)
    for bytes_row in csv_reader:
        row = [x.decode(encoding) for x in bytes_row]
        if len(row) >= 6:
            #print(row[0]+','+row[1])
            print("{0},{1}".format(*row))

score 3 · Accepted Answer

You are using str.format(), which implicitly converts unicode() values to str(). It has to do so in order to interpolate the values into the supplied template.
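The implicit conversion amounts to encoding each unicode argument with the ASCII codec. Here is a minimal sketch that spells that step out explicitly, so it behaves the same with or without the implicit conversion. The helper name ascii_encode_or_error is made up for illustration and is not part of the question's script.

```python
# -*- coding: utf-8 -*-
# Sketch only: makes explicit the ASCII encode that str.format() applies
# implicitly to unicode arguments in the question's environment.

def ascii_encode_or_error(text):
    """Hypothetical helper: return ASCII bytes, or the error message on failure."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)

header = u"Name"  # ASCII-only text encodes without trouble
russian = u"\u0422\u044f\u0436\u0435\u043b\u044b\u0439"  # "Тяжелый"

print(ascii_encode_or_error(header))
print(ascii_encode_or_error(russian))  # reports the 'ascii' codec failure
```

This is consistent with the question's output, where the ASCII-only header row Name,Variant is printed before the traceback appears on the first Russian row.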

Use unicode.format() instead:

print(u"{0},{1}".format(*row))

Note the u before the format literal. unicode.format() goes the other way: it has to decode any str inputs so that they fit into the resulting Unicode output.

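Concatenation, on the other hand, can implicitly decode the str value to produce the final unicode() object. If the ',' value contained non-ASCII bytes, that implicit decoding would fail too.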
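A small sketch of that decoding step, written out explicitly. The helper concat_like_implicit and the byte b"\xd2" are illustrative assumptions chosen to show a non-ASCII separator. They do not come from the question.

```python
# Sketch only: makes explicit the ASCII decode that concatenating a byte
# string with a unicode object performs implicitly.

def concat_like_implicit(text, raw):
    """Hypothetical helper: decode raw bytes as ASCII, then concatenate."""
    return text + raw.decode("ascii")

name = u"\u0422\u044f\u0436\u0435\u043b\u044b\u0439"  # "Тяжелый"

# An ASCII-only separator decodes cleanly, so concatenation succeeds.
joined = concat_like_implicit(name, b",")

# A separator containing a non-ASCII byte cannot be decoded as ASCII.
try:
    concat_like_implicit(name, b"\xd2")
    failed = False
except UnicodeDecodeError:
    failed = True

print(joined.endswith(u","))
print(failed)
```

The comma is pure ASCII, which is why row[0]+','+row[1] succeeds even though the implicit conversion is just as strict as the one in str.format().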

Moral of the story: use Unicode string literals throughout your code when handling text.

score 0 · Accepted Answer

The + operator works fine between a unicode string and a str string. str.format, on the other hand, cannot take unicode strings containing non-ASCII characters as arguments, because it has to encode them to str first.

Thus, you can simply replace the problematic line with the following:

print(u"{0},{1}".format(*row))

That should do the trick.
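As a rough check of the fix, the sketch below builds a row the way the question's loop does and formats it with a unicode template. The encoding is hard-coded to cp1251 here as an assumption, standing in for the value chardet reports in the original script, and the byte strings are constructed inline instead of being read from Russian.csv.

```python
# Sketch only: decode Windows-1251 bytes, then format with a unicode template.
# "cp1251" is assumed here; the original script takes the encoding from chardet.
encoding = "cp1251"

# Stand-ins for the raw fields csv.reader would return for the second CSV row.
bytes_row = [
    u"\u0422\u044f\u0436\u0435\u043b\u044b\u0439".encode(encoding),  # "Тяжелый"
    u"1 \u043b\u0438\u0442\u0440".encode(encoding),                   # "1 литр"
]

# Same decoding step as in Russian.py.
row = [x.decode(encoding) for x in bytes_row]

# The unicode template keeps the whole operation in Unicode.
line = u"{0},{1}".format(*row)

# Formatting and concatenation now agree.
print(line == row[0] + u"," + row[1])
```

Printing line itself still depends on the console being able to display Cyrillic, which is a separate concern from the formatting step.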
