r - Concatenating dfm matrices in the 'quanteda' package

Question

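Is there a way to concatenate two dfm matrices whose numbers of columns and numbers of rows both differ? I know it can be done with some additional coding, so I am not interested in ad hoc code; I am looking for a general and elegant solution, if one exists.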

An example:

dfm1 <- dfm(c(doc1 = "This is one sample text sample."), verbose = FALSE)
dfm2 <- dfm(c(doc2 = "Surprise! This is one sample text sample."), verbose = FALSE)
rbind(dfm1, dfm2)

gives an error.

The 'tm' package can concatenate its dfm matrices out of the box, but it is too slow for my purposes.

Also recall that 'dfm' from 'quanteda' is an S4 class.

score 4 · Accepted Answer

This should work "out of the box" if you are using the latest version:

packageVersion("quanteda")
## [1] ‘0.9.6.9’

dfm1 <- dfm(c(doc1 = "This is one sample text sample."), verbose = FALSE)
dfm2 <- dfm(c(doc2 = "Surprise! This is one sample text sample."), verbose = FALSE)

rbind(dfm1, dfm2)
## Document-feature matrix of: 2 documents, 6 features.
## 2 x 6 sparse Matrix of class "dfmSparse"
##      is one sample surprise text this
## doc1  1   1      2        0    1    1
## doc2  1   1      2        1    1    1

See also ?selectFeatures where features is a dfm object (there are examples in the help file).
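The calls above reflect quanteda 0.9.6.9. As far as I know, more recent releases (3.x and later) no longer build a dfm directly from a character vector and expect a tokens object instead. A minimal sketch of the same row-bind under that assumption follows; the `remove_punct = TRUE` setting is my choice so that the features resemble the output above, not something stated in the answer.

```r
library(quanteda)

# Assumption: quanteda >= 3.0, where dfm() is built from a tokens object.
toks1 <- tokens(c(doc1 = "This is one sample text sample."), remove_punct = TRUE)
toks2 <- tokens(c(doc2 = "Surprise! This is one sample text sample."), remove_punct = TRUE)

dfm1 <- dfm(toks1)
dfm2 <- dfm(toks2)

# rbind() on dfm objects aligns the rows on the union of their features.
combined <- rbind(dfm1, dfm2)
print(combined)
```

If your installed version still accepts the original `dfm(c(...), verbose = FALSE)` calls, there is no need to change them.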

Added:

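Note that this correctly aligns the two texts on a common feature set, unlike the usual rbind methods for matrices, which require the columns to match. For the same reason, rbind() does not actually work in the tm package for DocumentTermMatrix objects that have different terms: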

require(tm)
dtm1 <- DocumentTermMatrix(Corpus(VectorSource(c(doc1 = "This is one sample text sample."))))
dtm2 <- DocumentTermMatrix(Corpus(VectorSource(c(doc2 = "Surprise! This is one sample text sample."))))
rbind(dtm1, dtm2)
## Error in f(init, x[[i]]) : Numbers of columns of matrices must match.

This almost gets it, but it seems to duplicate the repeated feature:

as.matrix(rbind(c(dtm1, dtm2)))
##     Terms
## Docs one sample sample. text this surprise!
##    1   1      1       1    1    1         0
##    1   1      1       1    1    1         1
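To make the alignment idea concrete, here is a minimal base R sketch of what "row-binding on a common feature set" means for ordinary dense matrices. The helper `rbind_aligned` is hypothetical and written only for illustration; it is not how quanteda implements rbind() for dfm objects, and it ignores sparsity entirely.

```r
# Hypothetical helper: pad two count matrices to the union of their
# column (feature) names, then stack them row-wise.
rbind_aligned <- function(a, b) {
  feats <- union(colnames(a), colnames(b))
  pad <- function(m) {
    # Start from an all-zero matrix over every feature ...
    out <- matrix(0, nrow = nrow(m), ncol = length(feats),
                  dimnames = list(rownames(m), feats))
    # ... and copy the existing counts into the matching columns.
    out[, colnames(m)] <- m
    out
  }
  rbind(pad(a), pad(b))
}

# Counts taken from the dfm output shown in the answer.
m1 <- matrix(c(1, 1, 2, 1, 1), nrow = 1,
             dimnames = list("doc1", c("is", "one", "sample", "text", "this")))
m2 <- matrix(c(1, 1, 2, 1, 1, 1), nrow = 1,
             dimnames = list("doc2", c("is", "one", "sample", "surprise", "text", "this")))

combined <- rbind_aligned(m1, m2)
print(combined)
```

Features missing from one document are filled with zeros, so doc1 gets a 0 in the "surprise" column instead of causing a column-count mismatch.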
