python - Working with Google word2vec .bin files in gensim (Python)

Question

I am trying to get started by loading the pretrained .bin file from the Google word2vec site (freebase-vectors-skipgram1000.bin.gz) into the gensim implementation of word2vec. The model loads fine using:

model = word2vec.Word2Vec.load_word2vec_format('...../free....-en.bin', binary=True)

and creates a model object:

>>> print model
<gensim.models.word2vec.Word2Vec object at 0x105d87f50>

But when I run the most_similar function, it cannot find the words in the vocabulary. My error output is below.

Any ideas where I am going wrong?

>>> model.most_similar(['girl', 'father'], ['boy'], topn=3)
2013-10-11 10:22:00,562 : WARNING : word 'girl' not in vocabulary; ignoring it
2013-10-11 10:22:00,562 : WARNING : word 'father' not in vocabulary; ignoring it
2013-10-11 10:22:00,563 : WARNING : word 'boy' not in vocabulary; ignoring it
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/....../anaconda/python.app/Contents/lib/python2.7/site-packages/gensim-0.8.7/py2.7.egg/gensim/models/word2vec.py", line 312, in most_similar
    raise ValueError("cannot compute similarity with no input")
ValueError: cannot compute similarity with no input

score 7 · Accepted Answer

The words in '...../free....-en.bin' have the following form:

en/boardwalk_chapel en/mutsu_munemitsu en/goffstown en/yaw_axis en/john_e_fogarty_international_center en/francielle_manoel_alberto en/shinji_harada

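So when you search for 'girl', it is not there: the vocabulary keys are prefixed entity names such as en/goffstown, not plain words.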
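As a rough illustration of the point above, here is a minimal sketch that checks which query terms exist in a vocabulary before calling most_similar. It deliberately does not load the real model: `vocab` is a small stand-in set built from the keys listed above, and the helper names `to_freebase_key` and `split_by_vocab` are hypothetical. The lowercase-plus-underscore normalization is only a guess based on how those sample keys look.

```python
def to_freebase_key(word, prefix="en/"):
    """Map a plain word or phrase to the 'en/some_name' key style.

    Assumption: keys are lowercase with underscores instead of spaces,
    as suggested by the sample keys in the answer.
    """
    return prefix + word.strip().lower().replace(" ", "_")


def split_by_vocab(words, vocab):
    """Return (present, missing) lists according to membership in vocab."""
    present = [w for w in words if w in vocab]
    missing = [w for w in words if w not in vocab]
    return present, missing


# Stand-in for the model vocabulary (a few keys copied from the answer).
vocab = {"en/boardwalk_chapel", "en/goffstown", "en/yaw_axis"}

plain_words = ["girl", "Yaw axis", "goffstown"]

# Plain words are not keys in this vocabulary, so nothing is found.
present, missing = split_by_vocab(plain_words, vocab)
print(present, missing)

# After mapping to the prefixed key style, matching entries are found.
keys = [to_freebase_key(w) for w in plain_words]
present_keys, missing_keys = split_by_vocab(keys, vocab)
print(present_keys, missing_keys)
```

With a loaded gensim model, the same membership test would run against the model's own vocabulary instead of the stand-in set. I believe that is exposed as `model.vocab` in gensim versions of that era and as `key_to_index` on `KeyedVectors` in gensim 4, but treat that as an assumption and check your installed version. Filtering first avoids passing an empty list to most_similar, which is what triggers the ValueError in the question.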
