r - R DocumentTermMatrix loses results less than 100

Source: https://stackoverflow.com/questions/24388384 2014-06-24T13:49:11.327

462 views

I'm trying to feed a corpus into a DocumentTermMatrix (DTM for short) to get term frequencies, but I noticed that the DTM doesn't keep all the terms, and I don't know why! Check it out:

library(tm)

# Two identical documents containing two- and three-digit tokens
A <- c(" 95 94 89 91 90 102 103 100 101 98 99 97 110 108 109 106 107")
B <- c(" 95 94 89 91 90 102 103 100 101 98 99 97 110 108 109 106 107")
C <- Corpus(VectorSource(c(A, B)))
inspect(C)

>A corpus with 2 text documents
>
>The metadata consists of 2 tag-value pairs and a data frame
>Available tags are:
>  create_date creator 
>Available variables in the data frame are:
>  MetaID 
>
>[[1]]
> 95 94 89 91 90 102 103 100 101 98 99 97 110 108 109 106 107
>
>[[2]]
> 95 94 89 91 90 102 103 100 101 98 99 97 110 108 109 106 107

So far, so good.

But now, when I feed C into a DTM, not everything comes out the other side! See:

> dtm<-DocumentTermMatrix(C)
> colnames(dtm)
>[1] "100" "101" "102" "103" "106" "107" "108" "109" "110"

Where are all the results below 100? Or does it somehow have to do with the two-character terms? I also tried:

dtm<-DocumentTermMatrix(C,control=list(c(1,Inf)))

and

dtm<-TermDocumentMatrix(C,control=list(c(1,Inf)))

to no avail. What gives?
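
A likely explanation, going by tm's documented defaults rather than anything stated in this thread: the `wordLengths` control option defaults to `c(3, Inf)`, so terms shorter than three characters are dropped, which matches the missing two-digit numbers. The `control=list(c(1,Inf))` attempts above pass an unnamed list element, so it is probably not matched to any option. A minimal sketch, assuming a reasonably recent version of tm, that names the option explicitly:

```r
library(tm)

# Same documents as in the question
A <- " 95 94 89 91 90 102 103 100 101 98 99 97 110 108 109 106 107"
B <- A
C <- Corpus(VectorSource(c(A, B)))

# Name the option: wordLengths = c(min, max) term length in characters.
# The default lower bound is 3, which excludes two-character terms like "95".
dtm <- DocumentTermMatrix(C, control = list(wordLengths = c(1, Inf)))

# Two-digit terms should now appear alongside the three-digit ones
colnames(dtm)
```

The same `control` list should work for `TermDocumentMatrix`, since both accept the same options.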
