
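What is the best way to use the tm library to compare a text against a reference list of positive words and return the number of positive-word occurrences? I want to be able to return the total number of positive words found in the reference text.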

Question: What is the best way to do this?

For example:

positiveword_list <- c("happy", "great", "fabulous", "great")

Reference text:

exampleText <- c("ON A BRIGHT SPRING DAY in the year 1677, “the good ship 
Kent,” Captain Gregory Marlowe, Master, set sail from the great docks of London. She carried 230 English Quakers, outward bound for a new home in British North America. As the ship dropped down the Thames she was hailed by King Charles II, who happened to be sailing on the river. The two vessels made a striking contrast. The King’s yacht was sleek and proud in gleaming paintwork, with small cannons peeping through wreaths of gold leaf, a wooden unicorn prancing high above her prow, and the royal arms emblazoned upon her stern. She seemed to dance upon the water— new sails shining white in the sun, flags streaming bravely from her mastheads, officers in brilliant uniform, ladies in court costume, servants in livery, musicians playing, and spaniels yapping. At the center of attention was the saturnine figure of the King himself in all his regal splendor. On the other side of the river came the emigrant ship. She would have been bluff-bowed and round-sided, with dirty sails and a salt-stained hull, and a single ensign drooping from its halyard. Her bulwarks were lined with apprehensive passengers— some dressed in the rough gray homespun of the northern Pen-nines, others in the brown drab of London tradesmen, several in the blue suits of servant-apprentices, and a few in the tattered motley of the country poor.")

Here is some background:

What I am trying to do is count the number of positive words and store that count as a new column in a data frame.

count <- length(which(lapply(positiveword_list, grepl, x = exampleText) == TRUE))

So:

dataFrameIn %>% mutate(posCount = length(which(lapply(positiveword_list, grepl, x = text) == TRUE)))

text is a column of dataFrameIn (that is, dataFrameIn$text).


2 Answers


You can do this without using the tm package.

Try this:

contained <- lapply(positiveword_list, grepl, x = exampleText)

lapply returns a list.

Words that are present:
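If a plain logical vector is more convenient than a list, the same check can be sketched with vapply. This is only an illustration, not part of the original answer: the name contained_vec and the shortened exampleText are assumptions made for the example, and fixed = TRUE is added so each word is matched as a literal string rather than a regular expression.

```r
# Word list from the question; exampleText is shortened here for illustration.
positiveword_list <- c("happy", "great", "fabulous", "great")
exampleText <- "set sail from the great docks of London"

# vapply() returns a logical vector with one element per list entry.
# x = exampleText is passed through to grepl(), as in the lapply() call.
contained_vec <- vapply(positiveword_list, grepl, logical(1),
                        x = exampleText, fixed = TRUE, USE.NAMES = FALSE)

# TRUE counts as 1, so sum() gives the number of list entries found.
matched_count <- sum(contained_vec)
```

Note that this counts list entries that appear in the text, so a word listed twice (such as "great") is counted twice, and a word that appears several times in the text still counts once per list entry.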

> positiveword_list[contained == TRUE]
[1] "great" "great"
> length(contained[contained == TRUE])
[1] 2

Words that are not present:

> positiveword_list[contained == FALSE]
[1] "happy"    "fabulous"
> length(contained[contained == FALSE])
[1] 2
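The question also asks for a per-row count stored as a new data frame column. One possible base R sketch follows. It rests on several assumptions: the sample rows in dataFrameIn are invented for illustration, count_matches is a hypothetical helper, and matching is done on whole words, case-insensitively, against the unique list entries. Unlike the grepl approach, it counts every occurrence in the text, not just whether a word is present.

```r
positiveword_list <- c("happy", "great", "fabulous", "great")

# Hypothetical data frame standing in for dataFrameIn from the question.
dataFrameIn <- data.frame(
  text = c("A great day, a GREAT ship",
           "nothing to see here",
           "happy and fabulous"),
  stringsAsFactors = FALSE
)

# One whole-word alternation pattern built from the unique words.
pattern <- paste0("\\b(", paste(unique(positiveword_list), collapse = "|"), ")\\b")

# Hypothetical helper: number of pattern matches in a single string.
count_matches <- function(txt) {
  hits <- gregexpr(pattern, txt, ignore.case = TRUE, perl = TRUE)[[1]]
  sum(hits > 0)  # gregexpr() returns -1 when nothing matches
}

# One count per row, stored as a new column.
dataFrameIn$posCount <- vapply(dataFrameIn$text, count_matches,
                               integer(1), USE.NAMES = FALSE)
```

With dplyr loaded, the same helper could be used inside mutate, for example mutate(posCount = vapply(text, count_matches, integer(1), USE.NAMES = FALSE)).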
answered 2015-11-21T06:08:33.567