2 Answers
You want to decode (not encode) to get a Unicode string from a byte string.
>>> s = '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
>>> us = s.decode('utf-8')
>>> print us
марка
Note that you may not be able to print it because it contains characters outside ASCII. But you should be able to see its value in a Unicode-aware debugger. I ran the above in IDLE.
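The session above is Python 2. If you are on Python 3, I believe the equivalent looks roughly like the sketch below; the names raw, text and escaped are just mine for illustration. The byte string needs a b prefix, and ascii() gives an escaped form you can print even on a console that cannot display Cyrillic.

```python
# Python 3 sketch: bytes literals need the b prefix; str is already Unicode.
raw = b'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'  # the UTF-8 bytes from the example above
text = raw.decode('utf-8')                          # bytes -> str

# ascii() escapes every non-ASCII character, so the result is safe to print
# on a console that cannot show the characters themselves.
escaped = ascii(text)
print(escaped)
```

That prints the five characters as \uXXXX escapes, which is enough to confirm the decode worked even when the glyphs cannot be shown.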
Update
It seems what you actually have is this:
>>> s = u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
This is trickier because you first have to get those bytes into a bytestring before you call decode. I'm not sure what the "best" way to do that is, but this works:
>>> us = ''.join(chr(ord(c)) for c in s).decode('utf-8')
>>> print us
марка
Note that you should of course be decoding it before you store it in the database as a string.
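Another way to get the bytes back, sketched here for Python 3 (an assumption on my part, since the session above is Python 2), is to encode with Latin-1. That codec maps code points 0 to 255 one-to-one onto byte values 0 to 255, so it does the same job as the chr(ord(c)) loop. The names mangled, raw and fixed are only for illustration.

```python
# Python 3 sketch: a str whose code points are really UTF-8 byte values.
mangled = '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

# Latin-1 maps code points 0-255 straight onto bytes 0-255,
# so this recovers the original byte string unchanged.
raw = mangled.encode('latin-1')

# Now decode those bytes with the codec they were actually written in.
fixed = raw.decode('utf-8')
print(ascii(fixed))  # escaped form, printable on any console
```

This only works while every character in the string is below U+0100; anything higher makes the Latin-1 encode raise UnicodeEncodeError, which is a useful sign that the string was not mangled in this particular way.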
Mark is correct: you need to decode the string. Decoding a byte string gives you a Unicode string, and encoding goes the other way. This and many other details are covered in Pragmatic Unicode, or How Do I Stop The Pain?
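To make the two directions concrete, here is a small round trip, written for Python 3 as an assumption (the question itself uses Python 2). The word is spelled with \u escapes so the snippet does not depend on the source file's encoding; text and raw are illustrative names.

```python
# Python 3 sketch: encode and decode are inverses for a given codec.
text = '\u043c\u0430\u0440\u043a\u0430'  # the five-character word from the example

raw = text.encode('utf-8')               # str -> bytes
print(raw)                               # b'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

# Decoding with the same codec gives back the original string.
print(raw.decode('utf-8') == text)       # True
```

Five characters become ten bytes here because each of these Cyrillic letters takes two bytes in UTF-8.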