Questions tagged [latent-semantic-analysis]
gensim - Which formula of tf-idf does the LSA model of gensim use?
There are many different ways in which tf and idf can be calculated. I want to know which formula gensim uses in its LSA model. I have been going through its source code, lsimodel.py, but it is not obvious to me where the document-term matrix is created (probably because of memory optimizations).
In one LSA paper, I read that each cell of the document-term matrix is the log-frequency of that word in that document, divided by the entropy of that word.
However, this seems to be a very unusual formulation of tf-idf compared with the more familiar form, in which the term frequency is multiplied by an inverse document frequency.
I also notice that there is a question on how the TfIdfModel itself is implemented in gensim. However, I didn't see lsimodel.py importing TfIdfModel, and therefore can only assume that lsimodel.py has its own implementation of tf-idf.
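To make the two weightings mentioned in the question concrete, here is a minimal plain-Python sketch. It is not gensim's code. It assumes that "log-frequency divided by entropy" means log(tf + 1) divided by the entropy of the term's distribution over documents, and that the "familiar" form is tf * log(N / df). The function names and the choice to skip zero-entropy terms (terms that occur in only one document, where the division is undefined) are assumptions made for illustration.

```python
import math
from collections import Counter


def log_entropy_weights(docs):
    """Weight each term as log(tf + 1) / entropy(term).

    docs is a list of token lists. Returns one {term: weight} dict per
    document. Terms with zero entropy (seen in a single document) are
    skipped, which is an assumption of this sketch.
    """
    counts = [Counter(doc) for doc in docs]

    # Total occurrences of each term across the whole collection.
    totals = Counter()
    for c in counts:
        totals.update(c)

    # Entropy of each term's distribution over documents.
    entropy = {}
    for term, total in totals.items():
        h = 0.0
        for c in counts:
            if term in c:
                p = c[term] / total
                h -= p * math.log(p)
        entropy[term] = h

    return [
        {t: math.log(f + 1) / entropy[t] for t, f in c.items() if entropy[t] > 0}
        for c in counts
    ]


def tfidf_weights(docs):
    """Weight each term as tf * log(N / df), a common textbook tf-idf."""
    counts = [Counter(doc) for doc in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    return [
        {t: f * math.log(n_docs / df[t]) for t, f in c.items()}
        for c in counts
    ]


example_docs = [["a", "b", "b"], ["a", "c"]]
log_entropy = log_entropy_weights(example_docs)
tfidf = tfidf_weights(example_docs)
```

The two schemes behave quite differently on the same input: in `example_docs` the term "a" appears in every document, so its tf-idf weight is zero, while its log-entropy weight under the assumed formula is non-zero.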
python - Unsupervised classification of commands
How can I cluster commands such as /bin/busybox chmod 777 /dvrHelper without using a Bag-of-Words representation? Would models such as LDA or Word2vec help with my goal?