python - How to encode an object such as a list of dicts containing unicode keys or values to utf8?

Question

Is there a simple way to convert an object containing unicode strings to utf8?

For example:

before = [
    {u'labelset': {u'labelset_id': 80L, u'labelset_name': u'\u6d17\u8863\u6a5f'}},
    {u'labelset': {u'labelset_id': 81L, u'labelset_name': u'\u6d17\u8863\u6a5f'}},
    {u'labelset': {u'labelset_id': 82L, u'labelset_name': u'\u6d17\u8863\u6a5f'}},
]

after = [
    {'labelset': {'labelset_id': 80L, 'labelset_name': 'test'}},
    {'labelset': {'labelset_id': 81L, 'labelset_name': 'test'}},
    {'labelset': {'labelset_id': 82L, 'labelset_name': 'test'}},
]

score 1 · Accepted Answer

Python 2.* has two kinds of strings:

str (sequence of bytes)
unicode (sequence of unicode code points)

To convert unicode to str, you have to specify a rule (which bytes represent a given Unicode code point). This rule is called an encoding. So, to convert unicode to str using the utf8 encoding, use the encode method:

>>> u'\u6d17\u8863\u6a5f'.encode('utf8')
'\xe6\xb4\x97\xe8\xa1\xa3\xe6\xa9\x9f'

The result is a sequence of bytes that can, for example, be saved to a text file.
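The question is about a nested structure rather than a single string. Below is a minimal sketch of applying encode recursively, assuming the object contains only dicts, lists, tuples, and scalar values. The names encode_utf8 and text_type are made up for this illustration and are not part of any library.

```python
import sys

# Assumption: the text type is `unicode` on Python 2 and `str` on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str


def encode_utf8(obj):
    """Return a copy of obj with every text string encoded to UTF-8 bytes."""
    if isinstance(obj, text_type):
        return obj.encode('utf8')
    if isinstance(obj, dict):
        # Encode both keys and values.
        return dict((encode_utf8(k), encode_utf8(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type from the encoded items.
        return type(obj)(encode_utf8(item) for item in obj)
    # Numbers, None, and already-encoded byte strings pass through unchanged.
    return obj


before = [
    {u'labelset': {u'labelset_id': 80, u'labelset_name': u'\u6d17\u8863\u6a5f'}},
]
after = encode_utf8(before)
```

On Python 2 the encoded keys and values are plain str objects. On Python 3 they are bytes objects, so the keys of the resulting dicts are bytes as well.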

To convert str back to unicode, you need to know which rule was applied during the unicode-to-str conversion. In this case, the rule was the utf8 encoding. Use the decode method for this:

>>> '\xe6\xb4\x97\xe8\xa1\xa3\xe6\xa9\x9f'.decode('utf8')
u'\u6d17\u8863\u6a5f'
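To illustrate why the rule has to be known, here is a small sketch that decodes the same bytes with the right codec and then with a wrong one. The variable names are chosen for this illustration only.

```python
text = u'\u6d17\u8863\u6a5f'
data = text.encode('utf8')

# Decoding with the encoding that produced the bytes restores the text.
round_trip = data.decode('utf8')

# Decoding with a different rule (here ascii) cannot interpret these bytes.
try:
    data.decode('ascii')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
```

Some mismatched codecs do not raise an error and silently produce different characters instead, so a successful decode alone does not prove the right encoding was used.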

Here is a great presentation about Python strings and encodings.
