Linux で LDA を実行したところ、トピック 2 の「ø」などの文字は表示されませんでしたが、Windows で実行すると表示されます。誰もこれに対処する方法を知っていますか? パッケージquanteda
とtopicmodels
.
> terms(LDAModel1,5)
Topic 1 Topic 2
[1,] "car" "ø"
[2,] "build" "ù"
[3,] "work" "network"
[4,] "drive" "ces"
[5,] "musk" "new"
編集:
データ: https://www.dropbox.com/s/tdr9yok7tp0pylz/technology201501.csv
コードは次のようなものです。
library(quanteda)
library(topicmodels)
myCorpus <- corpus(textfile("technology201501.csv", textField = "title"))
myDfm <- dfm(myCorpus,ignoredFeatures=stopwords("english"), stem = TRUE, removeNumbers = TRUE, removePunct = TRUE, removeSeparators = TRUE)
myDfm <-removeFeatures(myDfm, c("reddit", "redditors","redditor","nsfw", "hey", "vs", "versus", "ur", "they'r", "u'll", "u.","u","r","can","anyone","will","amp","http","just"))
sparsityThreshold <- round(ndoc(myDfm) * (1 - 0.9999))
myDfm2 <- trim(myDfm, minDoc = sparsityThreshold)
LDAModel1 <- LDA(quantedaformat2dtm(myDfm2), 25, 'Gibbs', list(iter=4000,seed = 123))