python - Working with Unicode in Python

Question

score 5 · Accepted Answer

You want to decode (not encode) to get a unicode string from a byte string.

>>> s = '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
>>> us = s.decode('utf-8')
>>> print us
марка

Note that you may not be able to print it, because it contains characters outside ASCII and your console's encoding may not be able to represent them. You should still be able to see its value in a Unicode-aware debugger. I ran the above in IDLE.

Update

It seems what you actually have is this:

>>> s = u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

This is trickier because you first have to get those bytes into a byte string before you call decode. I'm not sure what the "best" way to do that is, but this works:

>>> us = ''.join(chr(ord(c)) for c in s).decode('utf-8')
>>> print us
марка
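
One alternative to the `chr(ord(c))` join, offered as a sketch rather than as the "best" way: the latin-1 codec maps code points 0-255 one-to-one onto byte values 0-255, so encoding with it should recover the original bytes. This assumes every character in `s` is at or below U+00FF, as in the example above; the name `raw` is introduced here only for illustration.

```python
# Assumption: s holds UTF-8 bytes that were mistakenly stored as
# Unicode code points in the range U+0000..U+00FF.
s = u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

# latin-1 maps each code point 0-255 to the byte with the same value,
# so this step turns the mis-typed unicode string back into raw bytes.
raw = s.encode('latin-1')

# Now the bytes can be decoded as the UTF-8 they really are.
us = raw.decode('utf-8')

# Compare against the expected word written as code point escapes.
assert us == u'\u043c\u0430\u0440\u043a\u0430'
```

If `s` contains any character above U+00FF, `encode('latin-1')` raises `UnicodeEncodeError`, which is a useful signal that the string was not mangled in the way assumed here.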

Note that you should of course be decoding it before you store it in the database as a string.

score 4 · Accepted Answer

Mark is correct: you need to decode the string. Decoding a byte string gives you a Unicode string, and encoding goes the other way. This and many other details are covered in Pragmatic Unicode, or, How Do I Stop The Pain?
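
A minimal round-trip sketch of that relationship, using the same word as the first answer; the variable names are illustrative only.

```python
# The Unicode string, written with code point escapes so the source stays ASCII.
text = u'\u043c\u0430\u0440\u043a\u0430'

# Encoding: Unicode string -> byte string.
raw = text.encode('utf-8')
assert raw == b'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

# Decoding: byte string -> Unicode string, undoing the step above.
assert raw.decode('utf-8') == text

# Five characters become ten bytes, since each of these Cyrillic
# letters takes two bytes in UTF-8.
n_chars, n_bytes = len(text), len(raw)
assert (n_chars, n_bytes) == (5, 10)
```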
