python - jsonpickle/json function input utf-8, output unicode?

Question

I wrote the following two functions for storing and retrieving any Python (built-in or user-defined) object using a combination of json and jsonpickle (in Python 2.7):

import json
import os

import jsonpickle

def save(kind, obj):
    pickled = jsonpickle.encode(obj)  # pickled is already a JSON string
    filename = DATA_DESTINATION[kind]  # returns file destination to store json
    # mode 'w' truncates an existing file, so no separate clearing step is needed
    with open(filename, 'w') as f:
        json.dump(pickled, f)  # wraps the JSON string in a second json layer

def retrieve(kind):
    filename = DATA_DESTINATION[kind]  # returns file destination to store json
    if os.path.isfile(filename):
        with open(filename, 'r') as f:
            pickled = json.load(f)  # undo the outer json layer
            unpickled = jsonpickle.decode(pickled)  # undo the jsonpickle layer
            print unpickled

I haven't tested these two functions with user-defined objects yet, but when I save() a built-in dictionary of strings (i.e. {'Adam': 'Age 19', 'Bill': 'Age 32'}) and then retrieve the same file, I get the same dictionary back in unicode: {u'Adam': u'Age 19', u'Bill': u'Age 32'}. I thought json/jsonpickle encoded by default to utf-8; what's the deal here?

[UPDATE]: Removing all jsonpickle encoding/decoding does not affect the output; it is still unicode. Seems like an issue with json? Perhaps I'm doing something wrong.

score 1 · Accepted Answer

You can encode the unicode string after calling json.loads().

json.loads('"\\u79c1"').encode('utf-8')

Now you have a normal (UTF-8 encoded) byte string again.
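A minimal sketch extending the same idea to a whole decoded structure, such as the dictionary in the question. The helper name to_utf8 is hypothetical (json has no built-in option for this); on Python 2 the result holds plain str values, on Python 3 it holds bytes.

```python
import json


def to_utf8(data):
    """Recursively encode every text string in a decoded JSON value as UTF-8.

    Hypothetical helper, not part of the json module.
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    if isinstance(data, dict):
        return {to_utf8(key): to_utf8(value) for key, value in data.items()}
    if isinstance(data, list):
        return [to_utf8(item) for item in data]
    if isinstance(data, text_type):
        return data.encode('utf-8')
    return data  # numbers, booleans and None pass through unchanged


loaded = json.loads('{"Adam": "Age 19", "Bill": "Age 32"}')
print(to_utf8(loaded))
```

Keys are converted as well as values, so the result compares equal to the original byte-string dictionary.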

score 0 · Accepted Answer

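The problem is that json, as a serialization format, is not expressive enough to carry information about the original type of the strings. In other words, if you have the json string "a", you have no way of knowing whether it originated from the Python string "a" or from the Python unicode string u"a".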

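In fact, you can read about the ensure_ascii option in the json module documentation. Basically, depending on where you are going to write the generated json, you can either accept a unicode string or require an ASCII string in which every incoming unicode character is properly escaped.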

For example:

>>> import json
>>> json.dumps({'a':'b'})
'{"a": "b"}'
>>> json.dumps({'a':u'b'}, ensure_ascii=False)
u'{"a": "b"}'
>>> json.dumps({'a':u'b'})
'{"a": "b"}'
>>> json.dumps({u'a':'b'})
'{"a": "b"}'
>>> json.dumps({'a':u'\xe0'})
'{"a": "\\u00e0"}'
>>> json.dumps({'a':u'\xe0'}, ensure_ascii=False)
u'{"a": "\xe0"}'

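As you can see, depending on the value of ensure_ascii you get either an ASCII string or a unicode string, but all the components of the original object are flattened into the same common encoding. Look at the {"a": "b"} cases in particular.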

jsonpickle simply uses json as its underlying serialization engine and adds no extra metadata to track the original string type, so that information really is lost along the way:

>>> jsonpickle.encode({'a': 'b'})
'{"a": "b"}'
>>> jsonpickle.encode({'a': u'b'})
'{"a": "b"}'
>>> jsonpickle.encode({u'a': 'b'})
'{"a": "b"}'
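For illustration, this is roughly what such extra metadata could look like. It is a minimal Python 3 sketch, not something json or jsonpickle does for you: the hypothetical TypedEncoder tags bytes values with a made-up "__bytes__" marker, and the hypothetical typed_hook turns them back into bytes on load.

```python
import base64
import json


class TypedEncoder(json.JSONEncoder):
    """Hypothetical encoder that tags byte strings so their type survives."""

    def default(self, obj):
        if isinstance(obj, bytes):
            # JSON has no byte-string type, so record the type explicitly.
            return {'__bytes__': base64.b64encode(obj).decode('ascii')}
        return super().default(obj)  # raises TypeError for other types


def typed_hook(d):
    """Undo the tagging added by TypedEncoder (hypothetical helper)."""
    if set(d) == {'__bytes__'}:
        return base64.b64decode(d['__bytes__'])
    return d


original = {'a': b'b', 'c': u'd'}
encoded = json.dumps(original, cls=TypedEncoder, sort_keys=True)
decoded = json.loads(encoded, object_hook=typed_hook)
print(encoded)
print(decoded == original)  # True
```

This only covers values: json refuses bytes dictionary keys before default() is ever consulted, and a genuine dictionary whose only key is "__bytes__" would be misread, which is the usual trade-off of in-band type tags.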

score 0 · Accepted Answer

"I thought json ... encoded by default to utf-8, what's the deal here?"

No, it encodes to ASCII. And it decodes to unicode.

>>> json.dumps(u'私')
'"\\u79c1"'
>>> json.loads('"\\u79c1"')
u'\u79c1'
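To actually get UTF-8 bytes, one option is to turn off the escaping and encode the result yourself. A minimal sketch that runs on Python 2.7 and 3; the variable names are illustrative only:

```python
import json

text = u'\u79c1'  # the character 私

# Default: non-ASCII characters are escaped, so the output is pure ASCII.
ascii_json = json.dumps(text)
# ensure_ascii=False keeps the character itself; encode it to UTF-8 explicitly.
utf8_json = json.dumps(text, ensure_ascii=False).encode('utf-8')

print(ascii_json)  # "\u79c1"
print(utf8_json)
# Decoding either form gives back the same text (unicode) string, never bytes.
print(json.loads(ascii_json) == json.loads(utf8_json.decode('utf-8')) == text)
```

Either way json.loads hands back unicode; only the writing side decides between escaped ASCII and raw UTF-8.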
