python - UnicodeEncodeError: 'ascii' codec can't encode characters

Question

I have a dict that is populated from URL responses, like this:

>>> d
{
0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'}
1: {'data': u'<p>some other data</p>'}
...
}

When I use an xml.etree.ElementTree function on one of these data values (d[0]['data']), I get the most famous error message:

UnicodeEncodeError: 'ascii' codec can't encode characters...

What should I do to this Unicode string to make it suitable for the ElementTree parser?

PS. Please don't send me links explaining Unicode and Python. I've already read all of them and still can't put them to use; hopefully others can.

score 25 · Accepted Answer

You need to encode it to UTF-8 manually:

ElementTree.fromstring(d[0]['data'].encode('utf-8'))

This is needed because the API only takes encoded bytes as input. UTF-8 is a good default for this kind of data.

The parser can then decode it back to unicode from there:

>>> from xml.etree import ElementTree
>>> p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
>>> p.text
u'found "\u62c9\u67cf \u591a\u516c \u56ed"'
>>> print p.text
found "拉柏 多公 园"
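
To apply the same fix to every entry in the dict from the question, something like the following sketch might help. The helper name parse_fragment and the texts dict are hypothetical and only for illustration; the sample data is copied from the question. The helper encodes text to UTF-8 before parsing and passes already-encoded bytes through unchanged. On Python 3, ElementTree.fromstring also accepts str directly, so the explicit encode should be harmless there rather than required.

```python
from xml.etree import ElementTree


def parse_fragment(markup):
    """Parse an XML fragment given as text or bytes (hypothetical helper)."""
    # Text is encoded to UTF-8 first, as the answer suggests;
    # bytes are assumed to be UTF-8 already and are passed through.
    if not isinstance(markup, bytes):
        markup = markup.encode('utf-8')
    return ElementTree.fromstring(markup)


# Sample data mirroring the dict shown in the question.
d = {
    0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'},
    1: {'data': u'<p>some other data</p>'},
}

# Map each key to the text of its parsed root element.
texts = {key: parse_fragment(value['data']).text for key, value in d.items()}
```

The isinstance check is there so that values which were already encoded upstream are not encoded a second time.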
