python - Unicode and `decode()` in Python

Question

>>> a = "我"  # chinese  
>>> b = unicode(a,"gb2312")  
>>> a.__class__   
<type 'str'>   
>>> b.__class__   
<type 'unicode'>  # b is unicode
>>> a
'\xce\xd2'
>>> b
u'\u6211' 

>>> c = u"我"
>>> c.__class__
<type 'unicode'>  # c is unicode
>>> c
u'\xce\xd2'

b and c are both unicode, but >>> b outputs u'\u6211' while >>> c outputs u'\xce\xd2'. Why?

score 12 · Accepted Answer


answered 2012-04-23T09:05:06.420

score 0 · Accepted Answer

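When you just type the name of an object, interactive Python shows its representation. The print command, on the other hand, tries to render the characters. The variable named a is of type str. In Python 2.x a str is really a sequence of bytes, so what it contains depends on your working environment. You tell the unicode() function that those bytes use the gb2312 encoding. If that is true, b holds the correct character, decoded from those bytes using the given encoding.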
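As a rough illustration of the point about bytes and encodings, here is a minimal sketch. It is written for Python 3, which is an assumption on my part: the question uses Python 2, where unicode(a, "gb2312") plays the role of bytes.decode("gb2312"). The Latin-1 decoding of the same bytes is also an assumption. It is one plausible way a shell that does not know the terminal encoding could end up with u'\xce\xd2', not something the question confirms.

```python
# Python 3 sketch (assumption: the original session is Python 2).
# The two bytes that the question shows for the variable a.
raw = b"\xce\xd2"

# Interpret the bytes as GB2312, like unicode(a, "gb2312") in Python 2.
decoded = raw.decode("gb2312")

# Assumption: interpret the same bytes as Latin-1, one byte per character.
misread = raw.decode("latin-1")

# ascii() gives the escaped form, similar to what Python 2 shows for unicode objects.
print(ascii(decoded))
print(ascii(misread))
print(len(decoded), len(misread))
```

The bytes are identical in both cases; only the codec used to interpret them differs, which gives one character in the first case and two in the second.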

Try

>>> print b

in your case. You may see the result you want. Also try:

>>> print repr(a)
...
>>> print repr(b)

The representation is (when possible) a text string that, when copied and pasted into source code, creates an object with the same value.
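That copy-and-paste property can be checked with a small sketch, again assuming Python 3. Here ascii() stands in for the escaped form that Python 2's repr() shows for unicode objects, and ast.literal_eval stands in for pasting the text into source code; both substitutions are my own choices for illustration.

```python
import ast

# The character from the question, written as an escape so the file encoding does not matter.
text = "\u6211"

# ASCII-only representation: a quoted literal with a backslash-u escape in it.
literal = ascii(text)

# Evaluating the literal rebuilds an object with the same value.
rebuilt = ast.literal_eval(literal)

print(literal)
print(rebuilt == text)
```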

See Chapter 4. Strings of Mark Pilgrim's "Dive Into Python 3" ( http://getpython3.com/diveintopython3/strings.html ).
