I am using the R libraries tm and qdap to count words in a given text. When the vector (words) contains only a few words, everything looks fine.
library(tm)
library(qdap)
text <- "activat affect affected affecting affects aggravat allow attribut based basis
bc because bosses caus change changed changes changing compel compliance"
text <- Corpus(VectorSource(text))
words <- c("activat", "affect", "affected")
# Using termco to search for the words in the text
apply_as_df(text, termco, match.list=words)
# Results:
# docs word.count activat affect affected
# 1 doc 1 20 1(5.00%) 4(20.00%) 1(5.00%)
However, when the vector (words) contains too many words, the output wraps across several lines and becomes jumbled and hard to read.
words <- c("activat", "affect", "affected", "affecting", "affects", "aggravat", "allow",
"attribut", "based", "basis", "bc", "because", "bosses", "caus", "change",
"changed", "changes", "changing", "compel", "compliance")
# Using termco to search for the words in the text
apply_as_df(text, termco, match.list=words)
# Results:
# docs word.count activat affect affected affecting affects aggravat allow
# attribut based basis bc because bosses caus change changed
# changes changing compel compliance
# 1 doc 1 20 1(5.00%) 4(20.00%) 1(5.00%) 1(5.00%) 1(5.00%) 1(5.00%) 1(5.00%)
# 1(5.00%) 1(5.00%) 1(5.00%) 1(5.00%) 1(5.00%) 1(5.00%) 2(10.00%) 3(15.00%) 1(5.00%)
# 1(5.00%) 1(5.00%) 1(5.00%) 1(5.00%)
How can I get the results into a data frame/matrix so that they are easier to read?
I tried using termco2mat (from the qdap library), which supposedly "returns a matrix of term counts" (https://trinker.github.io/qdap/termco.html), as shown below, but I get an error:
apply_as_df(text, termco2mat, match.list=words)
# Results:
# Error in qdapfun(text.var = text, ...) :
# unused arguments (text.var = text, match.list = c("activat", "affect", "affected",
# "affecting", "affects", "aggravat", "allow", "attribut", "based", "basis", "bc",
# "because", "bosses", "caus", "change", "changed", "changes", "changing", "compel",
# "compliance"))
Or:
termco2mat(apply_as_df(text, termco, match.list=words))
# Results:
# Error in `rownames<-`(`*tmp*`, value = "doc 1") :
# attempt to set 'rownames' on an object with no dimensions
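For reference, the layout being asked for (one row per term instead of one very wide row) can be sketched in base R without going through termco2mat. This is only an illustration under assumptions: it works on the raw string rather than the tm Corpus, it counts a term once for every whitespace-separated token that contains it as a substring (which reproduces the counts shown above, e.g. 4 for "affect"), and the names text_raw, tokens, counts and result are made up for this example. It is not qdap's own method and may differ from termco on other inputs.

```r
# Raw text as a plain character string (not wrapped in a tm Corpus)
text_raw <- paste(
  "activat affect affected affecting affects aggravat allow attribut based basis",
  "bc because bosses caus change changed changes changing compel compliance"
)

words <- c("activat", "affect", "affected", "affecting", "affects", "aggravat", "allow",
           "attribut", "based", "basis", "bc", "because", "bosses", "caus", "change",
           "changed", "changes", "changing", "compel", "compliance")

# Split the text into whitespace-separated tokens
tokens <- strsplit(text_raw, "\\s+")[[1]]

# For each term, count the tokens that contain it as a literal substring
counts <- vapply(words, function(w) sum(grepl(w, tokens, fixed = TRUE)), integer(1))

# Long format: one row per term, so the table never wraps horizontally
result <- data.frame(
  term    = words,
  count   = counts,
  percent = round(100 * counts / length(tokens), 2),
  row.names = NULL
)

print(result)
```

Because the table is long rather than wide, adding more terms to words only adds rows, so the console output stays readable.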