python - Python print isn't using __repr__, __unicode__, or __str__ for a unicode subclass?

Question

Python print isn't using __repr__, __unicode__, or __str__ for my unicode subclass when printing. Any clues as to what I am doing wrong?

Here is my code:

Using Python 2.5.2 (r252:60911, Oct 13 2009, 14:11:59)

>>> class MyUni(unicode):
...     def __repr__(self):
...         return "__repr__"
...     def __unicode__(self):
...         return unicode("__unicode__")
...     def __str__(self):
...         return str("__str__")
...      
>>> s = MyUni("HI")
>>> s
'__repr__'
>>> print s
'HI'

I'm not sure whether this is an accurate approximation of the above, but just for comparison:

>>> class MyUni(object):
...     def __new__(cls, s):
...         return super(MyUni, cls).__new__(cls)
...     def __repr__(self):
...         return "__repr__"
...     def __unicode__(self):
...         return unicode("__unicode__")
...     def __str__(self):
...         return str("__str__")
...
>>> s = MyUni("HI")
>>> s
'__repr__'
>>> print s
'__str__'

[EDITED...] It sounds like the best way to get a string object that satisfies isinstance(instance, basestring), offers control over the unicode return values, and has a unicode repr is...

>>> class UserUnicode(str):
...     def __repr__(self):
...         return "u'%s'" % super(UserUnicode, self).__str__()
...     def __str__(self):
...         return super(UserUnicode, self).__str__()
...     def __unicode__(self):
...         return unicode(super(UserUnicode, self).__str__())
...
>>> s = UserUnicode("HI")
>>> s
u'HI'
>>> print s
'HI'
>>> len(s)
2

The __str__ and __repr__ above add nothing to this example, but the idea is to show the pattern explicitly so that it can be extended as needed.

Just to prove that this pattern allows control:

>>> class UserUnicode(str):
...     def __repr__(self):
...         return "u'%s'" % "__repr__"
...     def __str__(self):
...         return "__str__"
...     def __unicode__(self):
...         return unicode("__unicode__")
... 
>>> s = UserUnicode("HI")
>>> s
u'__repr__'
>>> print s
'__str__'

Thoughts?

score 10 · Accepted Answer

The problem is that print does not respect __str__ on unicode subclasses.

From PyFile_WriteObject, which is used by print:

int
PyFile_WriteObject(PyObject *v, PyObject *f, int flags)
{
...
    if ((flags & Py_PRINT_RAW) &&
        PyUnicode_Check(v) && enc != Py_None) {
        char *cenc = PyString_AS_STRING(enc);
        char *errors = fobj->f_errors == Py_None ?
          "strict" : PyString_AS_STRING(fobj->f_errors);
        value = PyUnicode_AsEncodedString(v, cenc, errors);
        if (value == NULL)
            return -1;

PyUnicode_Check(v) returns true if v's type is unicode or a subclass of it. This code therefore writes unicode objects directly, without consulting __str__.
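The branch above can be modelled in a few lines of Python. This is only a rough sketch of the dispatch, not CPython's actual code: the function name model_write_raw is hypothetical, and text_type is an alias chosen so that the snippet runs on both Python 2 and Python 3.

```python
# text_type is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")


class MyUni(text_type):
    def __str__(self):
        return "__str__"


def model_write_raw(v, encoding="utf-8"):
    """Hypothetical model of the raw-print branch; not CPython's real code."""
    if isinstance(v, text_type):
        # Like PyUnicode_Check: true for unicode AND for its subclasses.
        # The text is encoded directly; __str__ is never looked up.
        return text_type.encode(v, encoding)
    # Any other object is converted with str(), which does honour __str__.
    return str(v).encode(encoding)


s = MyUni(u"HI")
print(isinstance(s, text_type))  # True: the subclass passes the check
print(type(s) is text_type)      # False: an exact-type check would reject it
print(model_write_raw(s))        # the encoded form of u"HI", not "__str__"
```

In this model, an exact-type test (type(v) is text_type) in place of isinstance would send subclasses down the str() path, which is the behaviour the question expected.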

Note that subclassing str and overriding __str__ works as expected:

>>> class mystr(str):
...     def __str__(self): return "str"
...     def __repr__(self): return "repr"
... 
>>> print mystr()
str

As does calling str or unicode explicitly:

>>> class myuni(unicode):
...     def __str__(self): return "str"
...     def __repr__(self): return "repr"
...     def __unicode__(self): return "unicode"
... 
>>> print myuni()

>>> str(myuni())
'str'
>>> unicode(myuni())
u'unicode'

I believe this could be construed as a bug in Python as currently implemented.
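Since the explicit calls behave as expected, one possible workaround is to convert before printing. The sketch below assumes that approach; display is a hypothetical helper name, and text_type is an alias so the snippet runs on both Python 2 and Python 3.

```python
# text_type is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")


class MyUni(text_type):
    def __str__(self):
        return "str"


def display(obj):
    """Hypothetical helper: convert explicitly so __str__ is consulted."""
    return str(obj)


s = MyUni(u"HI")
print(display(s))  # prints: str
```

The cost is that every call site has to remember to go through the helper; print on the bare object is unaffected.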

score 6 · Accepted Answer

You are subclassing unicode.

It will never call __unicode__ because the object already is unicode. What happens here instead is that the object is encoded to the stdout encoding:

>>> s.encode('utf8')
'HI'

except that it uses direct C calls instead of the .encode() method. This is the default behaviour of print for unicode objects.

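The print statement calls PyFile_WriteObject, which in turn calls PyUnicode_AsEncodedString when handling a unicode object. The latter defers to the encoding function for the current encoding, and these functions use the Unicode C macros to access the data structures directly. You cannot intercept this from Python.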
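A rough Python analogy of why an override cannot intercept the encoding step: calling the base type's encode directly skips any encode defined on the subclass, much as the C call skips the Python-level method. The class name Loud is made up for illustration, and the unbound call only approximates what PyUnicode_AsEncodedString does.

```python
# text_type is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")


class Loud(text_type):
    def encode(self, *args, **kwargs):
        # Python-level override; only reached via normal attribute lookup.
        return b"intercepted"


s = Loud(u"HI")

via_method = s.encode("utf-8")           # uses the override
via_type = text_type.encode(s, "utf-8")  # bypasses it, like the direct C call

print(via_method == b"intercepted")  # True
print(via_type == b"HI")             # True
```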

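What you are looking for is an __encode__ hook, I guess. Since this is already a unicode subclass, print only needs to encode it; there is no need to convert it to unicode again, nor can it be turned into a string without encoding it explicitly. You would have to take this up with the Python core developers to see whether an __encode__ hook makes sense.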
