java - How to get XML output for text from Stanford CoreNLP

Question

I read through the API and the documentation looking for an answer, but could not solve my problem.

I want to take a set of sentences and get the output for every sentence as XML, with each token looking like this:

      <token id="1"> 
        <word>That</word> 
        <lemma>that</lemma> 
        <CharacterOffsetBegin>0</CharacterOffsetBegin> 
        <CharacterOffsetEnd>4</CharacterOffsetEnd> 
        <POS>DT</POS> 
        <NER>O</NER> 
      </token>

All I could figure out was how to get the parse tree, which does not help with what I want to build. Anyway, here is the code I am using right now:

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

// read some text in the text variable
String text = "We won the game."; // Add your text here!

// create an empty Annotation just with the given text
Annotation document = new Annotation(text);

// run all Annotators on this text
pipeline.annotate(document);

// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);

for(CoreMap sentence: sentences) {

  // this is the parse tree of the current sentence
  Tree tree = sentence.get(TreeAnnotation.class);

  // this is the Stanford dependency graph of the current sentence
  SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
}

I am using the code from the documentation.
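
One possible approach, sketched under assumptions: the StanfordCoreNLP class has an xmlPrint method that writes an annotated document as XML, with token elements shaped like the sample above. The sketch assumes the CoreNLP jar, its models, and the XOM library (which the XML output relies on) are on the classpath. The class name CoreNlpXmlExample and the helper annotateToXml are hypothetical names chosen for illustration, and the annotator list is trimmed to the ones the token fields need.

```java
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical example class; not part of CoreNLP.
public class CoreNlpXmlExample {

    // Annotates the text with the given pipeline and returns CoreNLP's XML rendering.
    static String annotateToXml(StanfordCoreNLP pipeline, String text) throws IOException {
        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        // xmlPrint writes the whole annotated document to the stream as XML.
        ByteArrayOutputStream xmlOut = new ByteArrayOutputStream();
        pipeline.xmlPrint(document, xmlOut);

        // Assumes the pipeline uses its default UTF-8 encoding.
        return xmlOut.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Assumption: these annotators are enough for word, lemma, POS and NER fields.
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        System.out.println(annotateToXml(pipeline, "We won the game."));
    }
}
```

To write straight to a file instead, a FileOutputStream could be passed to xmlPrint in place of the ByteArrayOutputStream.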

