python-2.7 - raw_inputting Unicode strings

Question

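I have read the Python 2.7 "Unicode HOWTO" many times and browsed this forum thoroughly, but nothing I found and tried made my program work.

It is supposed to convert dictionary.com entries into a set of example sentences and word-pronunciation pairs. However, it fails right at the start: the IPA (i.e. Unicode) characters turn into gibberish immediately after being input.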

# -*- coding: utf-8 -*-

""" HERE'S HOW A TYPICAL DICTIONARY.COM ENTRY LOOKS LIKE
white·wash
/ˈʰwaɪtˌwɒʃ, -ˌwɔʃ, ˈwaɪt-/ Show Spelled
noun
1.
a composition, as of lime and water or of whiting, size, and water, used for whitening walls, woodwork, etc.
2.
anything, as deceptive words or actions, used to cover up or gloss over faults, errors, or wrongdoings, or absolve a wrongdoer from blame.
3.
Sports Informal. a defeat in which the loser fails to score.
verb (used with object)
4.
to whiten with whitewash.
5.
to cover up or gloss over the faults or errors of; absolve from blame.
6.
Sports Informal. to defeat by keeping the opponent from scoring: The home team whitewashed the visitors eight to nothing.
"""

def wdefinp():   #word definition input
    wdef=u''
    emptylines=0 
    print '\nREADY\n\n'
    while True:
        cinp=raw_input()   #current input line
        if cinp=='':
            emptylines += 1
            if emptylines >= 3:   #breaking out by 3xEnter
                wdef=wdef[:-2]
                return wdef
        else:
            emptylines = 0
        wdef=wdef + '\n' + cinp
    return wdef

wdef=wdefinp()
print wdef.decode('utf-8')

The result: whiteÂ·wash /Ë�Ę°waÉŞtËŚwÉ'Ę�, -ËŚwÉ"Ę�, Ë�waÉŞt-/ Show Spelled ...

Any help is greatly appreciated.

score 0 · Accepted Answer

OK, I was able to reproduce several failures with your program.

First, when I run it in a terminal and paste in the example text, I get an error on this line (sorry, my line numbers don't match yours):

  File "unicod.py", line 22, in wdefinp
    wdef=wdef + '\n' + cinp
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 5: ordinal not in range(128)
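
The traceback above is what Python 2 produces when a unicode object (`wdef`, initialised as `u''`) is concatenated with a byte string (`cinp` from `raw_input()`): the byte string is implicitly decoded as ASCII first. The sketch below reproduces that hidden step explicitly so that it behaves the same on Python 2 and 3. It assumes the terminal sends UTF-8, and `raw_line` is an illustrative stand-in for one pasted line, not a name from the original program.

```python
# -*- coding: utf-8 -*-
# Stand-in for what raw_input() returns for the pasted line "white·wash"
# in a UTF-8 terminal (assumption): plain bytes, not unicode.
raw_line = b'white\xc2\xb7wash'

try:
    # Roughly what u'' + '\n' + cinp does behind the scenes on Python 2:
    # the byte string is decoded with the default ASCII codec.
    raw_line.decode('ascii')
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# The offset of the first non-ASCII byte, matching "position 5" in the traceback.
print(failure.start)

# Decoding explicitly with the real input encoding succeeds.
decoded = raw_line.decode('utf-8')
```

The byte `0xc2` is the first half of the two-byte UTF-8 sequence for the middle dot in "white·wash", which is why the failure is reported at position 5.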

To fix this, I used the answer from this Stack Overflow question: How to read Unicode input and compare Unicode strings in Python?

The fixed line is (with import sys at the top of the file):

cinp = raw_input().decode(sys.stdin.encoding)

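Basically, you need to know the input encoding; once you do, the raw bytes can be decoded into a unicode string.

With that fixed, the next problem is a similar one: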
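
One caveat with decoding via `sys.stdin.encoding` is that on Python 2 it can be `None` when input is piped in rather than typed into a terminal. A minimal sketch of a more defensive decode step follows; `decode_input` is a hypothetical helper that is not part of the original program, and the UTF-8 fallback is an assumption that may not suit every environment.

```python
import sys


def decode_input(raw, encoding=None):
    """Return one line of input as text (hypothetical helper)."""
    if not isinstance(raw, bytes):
        # Already text (unicode on Python 2, str on Python 3): nothing to do.
        return raw
    # Prefer an explicit encoding, then the terminal's encoding,
    # then fall back to UTF-8 (assumption) if neither is available.
    encoding = encoding or getattr(sys.stdin, 'encoding', None) or 'utf-8'
    return raw.decode(encoding)
```

In the original loop this would be used as `cinp = decode_input(raw_input())`, after which `wdef + '\n' + cinp` only ever joins unicode objects.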

  File "unicod.py", line 28, in <module>
    print wdef.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 6: ordinal not in range(128)

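This fails because the data returned from the function is already a unicode object, so calling .decode('utf-8') on it amounts to a "double decode". Just remove the .decode('utf-8') and it works fine.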
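
To see why the second error is an *encode* error even though the code calls `.decode`, it helps to know that on Python 2 calling `.decode('utf-8')` on a unicode object first encodes it to bytes with the default ASCII codec. The sketch below makes that hidden step explicit in a form that runs on Python 2 and 3; `text` is an illustrative stand-in for the start of the value returned by `wdefinp()` after the first fix.

```python
# -*- coding: utf-8 -*-
# Stand-in for the start of wdefinp()'s return value: already unicode,
# beginning with the newline that the loop prepends to each line.
text = u'\n' + u'white\xb7wash'

try:
    # The hidden first step of unicode.decode('utf-8') on Python 2.
    text.encode('ascii')
    failure = None
except UnicodeEncodeError as exc:
    failure = exc

# Offset of the middle dot, matching "position 6" in the traceback.
print(failure.start)

# If bytes are needed for output, encode once and explicitly instead.
output_bytes = text.encode('utf-8')
```

In a terminal, `print wdef` is usually enough once `wdef` is unicode. If the output is redirected to a file or pipe, encoding explicitly as in the last line may be needed.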
