r - Using lapply on term-document matrices to calculate word frequency

Question

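Given three TermDocumentMatrix objects, built from text1, text2, and text3, I'd like to calculate the word frequency for each one into a data frame and then rbind all the data frames. Three is just a sample; in reality I have hundreds, so I need to turn this into a function.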

Calculating the word frequency for a single TDM is easy:

apply(x, 1, sum)

or

rowSums(as.matrix(x))

I want to build a list of the TDMs:

tdm_list <- Filter(function(x) is(x, "TermDocumentMatrix"), mget(ls()))

calculate the word frequency for each one and put it in a data frame:

data.frame(lapply(tdm_list, sum)) # this is wrong. it simply sums frequency of all words instead of frequency by each word.

and then rbind them all:

do.call(rbind, df_list)

I can't figure out how to use lapply on the TDMs to calculate the per-word frequency.

Here is some sample data to play with:

require(tm)
text1 <- c("apple" , "love", "crazy", "peaches", "cool", "coke", "batman", "joker")
text2 <- c("omg", "#rstats" , "crazy", "cool", "bananas", "functions", "apple")
text3 <- c("Playing", "rstats", "football", "data", "coke", "caffeine", "peaches", "cool")

tdm1 <- TermDocumentMatrix(Corpus(VectorSource(text1)))
tdm2 <- TermDocumentMatrix(Corpus(VectorSource(text2)))
tdm3 <- TermDocumentMatrix(Corpus(VectorSource(text3)))

score 2 · Accepted Answer

OK, I think I've got it. This may help someone else who is trying to do the same thing. It turned out to be simple.

combineddf <- do.call(rbind, lapply(tdm_list, function (x) {
 data.frame(apply(x, 1, sum))
}))

The above takes a list of TermDocumentMatrix objects, computes the per-word counts of each one into a data frame, and rbinds them all together.
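One limitation of the snippet above is that the terms end up only as row names of the combined data frame, and nothing records which matrix a count came from in its own column. Below is a minimal sketch of a variant that keeps both as ordinary columns. The helper name term_freq_df and the stand-in list m_list are hypothetical and not part of the original answer. Plain base R matrices stand in for the TDMs so the example runs without tm; with tm loaded, the tdm_list from the question should be usable in its place, since the question already relies on as.matrix() working on a TDM.

```r
# Hypothetical helper (not from the original answer): returns one row per
# term, with the source name and the term stored as regular columns.
term_freq_df <- function(m, source) {
  freq <- rowSums(as.matrix(m))  # per-term total across all documents
  data.frame(source = source,
             term = names(freq),
             freq = unname(freq),
             stringsAsFactors = FALSE)
}

# Stand-in data (an assumption for illustration): terms as row names,
# documents as columns, mimicking the layout of a term-document matrix.
m_list <- list(
  a = matrix(c(1, 0, 2, 1), nrow = 2,
             dimnames = list(c("apple", "cool"), c("d1", "d2"))),
  b = matrix(c(0, 3, 1, 1), nrow = 2,
             dimnames = list(c("coke", "cool"), c("d1", "d2")))
)

# Map() passes each matrix together with its list name to the helper;
# do.call(rbind, ...) then stacks the resulting data frames.
combined <- do.call(rbind, Map(term_freq_df, m_list, names(m_list)))
rownames(combined) <- NULL
print(combined)
```

Keeping term and source as columns makes it straightforward to aggregate later, for example with aggregate(freq ~ term, data = combined, FUN = sum) to get totals across all matrices.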
