r - R tm stemCompletion generates NA values

Question

When I try to apply stemCompletion to a corpus, the function generates NA values.

This is my code:

my.corpus <- tm_map(my.corpus, removePunctuation) 
my.corpus <- tm_map(my.corpus, removeWords, stopwords("english"))

(one of the results: [[2584]] zoning plan)

The next step is stemming the corpus:

my.corpus <- tm_map(my.corpus, stemDocument, language="english")
my.corpus <- tm_map(my.corpus, stemCompletion, dictionary=my.corpus_copy, type="first")

But this is the result:

[[2584]] NA plant

The next step is to build an occurrence matrix using transactions and apriori rules, but when I carry on and try to get the rules, the inspect(rules) function fails with this error:

> inspect(rules)
Errore in UseMethod("inspect", x) : 
no applicable method for 'inspect' applied to an object of class "c('rules','associations')"

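What is going wrong? I think the NA values prevent the occurrence matrix from being built correctly, so no proper rules are generated. Is that the problem? If so, how can I fix it?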

Here is a minimal example of the problem:

my.words = c("β cell","zoning policy regional index brazil","zoning plan","zolpidem  adult","zizyphus spinosa hu")
my.corpus = Corpus(VectorSource(my.words))
my.corpus_copy = my.corpus
my.corpus = tm_map(my.corpus, removePunctuation)
my.corpus = tm_map(my.corpus, removeWords, c("the", stopwords("english"))) 
my.corpus = tm_map(my.corpus, stemDocument, language="english")
my.corpus <- tm_map(my.corpus, stemCompletion, dictionary=my.corpus_copy, type="first")
inspect(my.corpus)

score 2 · Accepted Answer

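As it stands, stemCompletion() is only an approximate reverse of the stemming process when the original corpus is used as the dictionary parameter. It uses grep() to search the dictionary for all words that begin with the current stemmed word, and completes the stem with one of those words, chosen according to the type argument.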
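
A minimal base R sketch of that lookup may help. The dictionary is made up, and the helper name complete_stem_first is hypothetical; it only mimics the behaviour described here for type = "first" and is not tm's actual code.

```r
# Hypothetical dictionary of unstemmed words (not taken from the question).
dictionary <- c("plan", "planning", "policy", "regional")

# Hypothetical helper: anchor the stem at the start of each dictionary
# word with grep(), then keep the first candidate.
complete_stem_first <- function(stem, dictionary) {
  candidates <- grep(sprintf("^%s", stem), dictionary, value = TRUE)
  candidates[1]  # NA when there are no candidates
}

region_completion <- complete_stem_first("region", dictionary)
plan_completion <- complete_stem_first("plan", dictionary)
print(region_completion)  # "regional"
print(plan_completion)    # "plan"
```

Both "plan" and "planning" match the stem "plan"; taking the first candidate is what decides between them in this sketch.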

So it fails whenever stemming returns a word that is not a substring of the unstemmed word. For example, the stems of c('delivery', 'zoning') are c('deliveri', 'zone'), as returned by wordStem(), which stemDocument() uses. In both cases the stemmed word is not a proper substring of the unstemmed word, so stemCompletion() finds no replacement and returns NA.
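
The failure can be reproduced in base R alone. This sketch hard-codes the two stems quoted in this answer rather than calling wordStem(), and uses the original words themselves as the dictionary; the variable names are illustrative.

```r
# Words and their stems as reported in this answer.
words <- c("delivery", "zoning")
stems <- c("deliveri", "zone")

# Look for dictionary words that start with each stem ("^stem" pattern).
matches <- lapply(stems, function(s) grep(sprintf("^%s", s), words, value = TRUE))

# Taking the first element of an empty match gives NA.
first_match <- sapply(matches, "[", 1)
print(first_match)  # NA NA
```

Neither "delivery" nor "zoning" starts with its own stem, so both lookups come back empty.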

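There are several ways around this, such as replacing the NAs with the stemmed words after stemCompletion() returns or, better, modifying stemCompletion() itself. A simple way to keep the stemmed word instead of NA is to write your own version, stemCompletion_modified(), replacing each ... with the original code of the stemCompletion() function from the tm package: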

stemCompletion_modified <- function (x, dictionary, type = ...) 
{
  ...
  # Original line in tm:
  # possibleCompletions <- lapply(x, function(w) grep(sprintf("^%s", w), dictionary, value = TRUE))
  # Modified: fall back to the stem itself when the dictionary has no match.
  # if/else is used instead of ifelse(), which would keep only the first match.
  possibleCompletions <- lapply(x, function(w) {
    matches <- grep(sprintf("^%s", w), dictionary, value = TRUE)
    if (length(matches) == 0) w else matches
  })
  ...
}
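
The other alternative mentioned above, patching the NAs after completion, does not require touching the package code. A rough sketch, assuming you still have the vector of stems that went into completion; here completed is a hand-written stand-in for what stemCompletion() would return, and all values are illustrative.

```r
# Stems that went into completion (illustrative values).
stems <- c("zone", "plan", "deliveri")

# Stand-in for the completion result: NA where no dictionary word matched.
completed <- c(NA, "plan", NA)

# Keep the completed word where there is one, otherwise fall back to the stem.
patched <- ifelse(is.na(completed), stems, completed)
print(patched)  # "zone" "plan" "deliveri"
```

This leaves some tokens in stemmed form, but it guarantees that no NA reaches the later steps.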
