I can save a serialized corpus to foobar.mm, but when I try to load it I get an UnpicklingError. Loading the dictionary, however, seems to work fine. Does anyone know how to fix this? And why does it happen?
>>> from gensim import corpora
>>> docs = ["this is a foo bar", "you are a foo"]
>>> texts = [[i for i in doc.lower().split()] for doc in docs]
>>> print texts
[['this', 'is', 'a', 'foo', 'bar'], ['you', 'are', 'a', 'foo']]
>>> dictionary = corpora.Dictionary(texts)
>>> dictionary.save('foobar.dic')
>>> print dictionary
Dictionary(7 unique tokens)
>>> corpora.Dictionary.load('foobar.dic')
<gensim.corpora.dictionary.Dictionary object at 0x329f910>
>>> corpus = [dictionary.doc2bow(text) for text in texts]
>>> corpora.MmCorpus.serialize('foobar.mm', corpus)
>>> corpus = corpora.MmCorpus.load('foobar.mm')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.8.6-py2.7.egg/gensim/utils.py", line 166, in load
    return unpickle(fname)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.8.6-py2.7.egg/gensim/utils.py", line 492, in unpickle
    return cPickle.load(open(fname, 'rb'))
cPickle.UnpicklingError: invalid load key, '%'.
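
A possible explanation, sketched without gensim: the traceback shows that load() goes through pickle, while serialize() writes the Matrix Market format, a plain-text file whose first line begins with '%%MatrixMarket'. If that is what is happening here, pickle would choke on the very first byte, '%'. The snippet below only reproduces that mismatch with the standard library. The file contents and the helper name try_unpickle are stand-ins made up for illustration, not gensim's actual output or API, and it is written for Python 3 rather than the Python 2.7 of the session above.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the start of a file written by MmCorpus.serialize:
# Matrix Market files are plain text and open with a '%%MatrixMarket' header.
mm_header = "%%MatrixMarket matrix coordinate real general\n"

path = os.path.join(tempfile.mkdtemp(), "foobar.mm")
with open(path, "w") as f:
    f.write(mm_header)


def try_unpickle(fname):
    """Try to unpickle fname; return the error message, or None on success."""
    try:
        with open(fname, "rb") as f:
            pickle.load(f)
    except pickle.UnpicklingError as exc:
        return str(exc)
    return None


# The file is text, not a pickle, so unpickling fails on the leading '%'.
error = try_unpickle(path)
print(error)
```

If this is indeed the cause, the corpus would be read back by passing the file name to the constructor, corpora.MmCorpus('foobar.mm'), instead of calling load(). The dictionary loads fine because Dictionary.save() does write a pickle.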