r - Create a vector of PlainTextDocuments

Question

This is meant to parse one PlainTextDocument into multiple child PlainTextDocuments, carrying over and extending the metadata.

segment_doc <- function(doc) {
    txt = paste0(doc, collapse=' ')
    au <- meta(doc, tag='Author');
    desc <- meta(doc, tag='description');
    ori <- meta(doc, tag='origin');
    locmeta <- attr(doc,'LocalMetaData');

    function(df){
        dfrows <- nrow(df);
        v<-rep(NA,dfrows);
        for(i in 1:dfrows) {
            a <- df[i,'after'];
            b <- df[i, 'before'];
            m <- df[i, 'meta'];

            sec <-PlainTextDocument(mkmeta(b, a, txt), author= au, description=desc, origin=ori, heading = m, localmetadata= locmeta) 
            #verified using debug that sec is a 'PlainTextDocument' with the expected text and metadata 
            v[i]=sec;

        }
        v #should be a vector of PlainTextDocuments, BUT it is vector of character vectors. WHY??

    }
}

It is used like this:

# mycorpus is a Corpus object containing PlainTextDocuments
# sections is a data.frame with 3 columns of type character named 'before', 'after' and 'meta' and 6 rows

sectioner <- segment_doc(mycorpus[[1]]); 
ptv <- sectioner(sections); #expect a vector of 6 PlainTextDocuments

class(ptv);
[1] "character" 
length(ptv);
[1] 6

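Questions

Why is sec converted from a PlainTextDocument object to a character vector when it is placed in the vector?
How can I get sectioner to return a Corpus object? (A vector of PlainTextDocuments would also be fine.)

I have read the tm documentation. Yes, all of it. This should not be that hard. Is there a different approach I should be using?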

score 2 · Accepted Answer

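sectioner returns a character vector because v is initialized as an atomic vector, which cannot hold complex objects. Each object assigned into it is instead coerced to a common atomic type (here, character), and its class and metadata are lost. You can initialize v as a list instead: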

v <- vector(mode = 'list', length = dfrows)
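
To make the difference concrete, here is a minimal sketch in base R. It does not use tm: make_doc and the toy_doc class are hypothetical stand-ins for a PlainTextDocument (a character vector carrying a class and a metadata attribute), so the snippet runs without the package. It also assumes elements are stored with `[[<-`, which the loop in the question would need in place of `v[i] = sec`.

```r
# Hypothetical stand-in for a PlainTextDocument: text plus a class and
# one metadata attribute. Not part of tm.
make_doc <- function(text, heading) {
  structure(text, heading = heading, class = c("toy_doc", "character"))
}

docs_in <- list(make_doc("first section", "A"),
                make_doc("second section", "B"))
n <- length(docs_in)

# Atomic vector, as in the question: every assignment coerces the object
# to a plain atomic value, so class and attributes are dropped.
v_atomic <- rep(NA, n)
for (i in seq_len(n)) v_atomic[i] <- docs_in[[i]]
class(v_atomic)

# List: each slot holds the whole object. Note the double brackets.
v_list <- vector(mode = "list", length = n)
for (i in seq_len(n)) v_list[[i]] <- docs_in[[i]]
class(v_list[[1]])
attr(v_list[[2]], "heading")
```

The same result can be had without preallocating by building the list with lapply over the row indices. If a Corpus is wanted rather than a plain list, recent versions of tm document an as.VCorpus function that, as far as I know, accepts a list of documents; check this against your installed version.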
