I have this data frame:

> str(final)
'data.frame':   112 obs. of  3 variables:
 $ FAO_CountryName: chr  Algeria  Egypt  Libya  Morocco ...
 $ FAO_CountryURL : chr  "http://www.fao.org/giews/countrybrief/country.jsp?code=DZA" "http://www.fao.org/giews/countrybrief/country.jsp?code=EGY" "http://www.fao.org/giews/countrybrief/country.jsp?code=LBY" "http://www.fao.org/giews/countrybrief/country.jsp?code=MAR" ...
 $ Text           : chr  "\r\n   Reference Date: 24-November-2016\r\n   \r\n   \r\n               FOOD SECURITY SNAPSHOT\r\n               \r\n          "| __truncated__ "\r\n   Reference Date: 28-November-2016\r\n   \r\n   \r\n               FOOD SECURITY SNAPSHOT\r\n               \r\n          "| __truncated__ "\r\n   Reference Date: 15-November-2016\r\n   \r\n   \r\n               FOOD SECURITY SNAPSHOT\r\n               \r\n          "| __truncated__ "\r\n   Reference Date: 21-September-2016\r\n   \r\n   \r\n               FOOD SECURITY SNAPSHOT\r\n               \r\n         "| __truncated__ ...

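I would like to work on the Text variable so that I can count, for example, how many times each word occurs in each row. In other words, I would like to end up with a data frame like this: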

> head(final, n=2)
  FAO_CountryName   FAO_CountryURL        Text                  WordCount
  Algeria           http://www.fao.org…   Algeria is nice…      Algeria  1
                                                                is       1
                                                                ...
  Egypt             http://www.fao.org…   Egypt is nice too…    Egypt    1
                                                                is       5
                                                                ...

So far, I have tried this:

## Counting the words included in the textual dataset.
library(dplyr)
library(tidytext)

keywords <- text_df %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE) %>%
  ungroup()

## Summing the frequencies per word (i.e. how many times each word is present overall)
total_words <- keywords %>%
  group_by(word) %>%
  summarize(total = sum(n))

However, this only gives me the word counts over the whole column, not per row. Any suggestions?
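To make the goal concrete, here is a minimal sketch of the per-row count I am after, assuming each row is identified by `FAO_CountryName` and that keeping that column in `count()` is enough to stop the counts from being pooled. The two-row `final` below is a made-up stand-in for the real 112-row data frame, and `word_counts` is just an illustrative name.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy stand-in for `final`; the real Text values are much longer.
final <- data.frame(
  FAO_CountryName = c("Algeria", "Egypt"),
  Text = c("Algeria is nice", "Egypt is nice too and is warm"),
  stringsAsFactors = FALSE
)

word_counts <- final %>%
  # One row per (country, word); unnest_tokens lowercases words by default.
  unnest_tokens(word, Text) %>%
  # Counting by country AND word keeps the counts separate for each row.
  count(FAO_CountryName, word, sort = TRUE)

print(word_counts)
```

This gives a long table with one line per country-word pair rather than the nested layout shown above, so the counts for a single country can be pulled out with something like `filter(word_counts, FAO_CountryName == "Egypt")`.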
