nlp - How to identify coreference sets and representative mentions with Stanford CoreNLP coreference?

Question

I am using Stanford CoreNLP. I need to detect and identify the "coreference set" and the "representative mention" of each CorefChain in my input text.

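For example, input: Obama was elected to the Illinois state senate in 1996 and served there for eight years. In 2004, he was elected by a record majority to the U.S. Senate from Illinois and, in February 2007, announced his candidacy for President.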

Output: with "Pretty Print" I get the following:

**Coreference set:
(2,4,[4,5]) -> (1,1,[1,2]), that is: "he" -> "Obama"

(2,24,[24,25]) -> (1,1,[1,2]), that is: "his" -> "Obama"

(3,22,[22,23]) -> (1,1,[1,2]), that is: "Obama" -> "Obama"**

However, I need to detect and identify this output, the "coreference set", programmatically. (That is, I need to find every pair such as "he" -> "Obama".)

Note: my base code is the following (taken from http://stanfordnlp.github.io/CoreNLP/coref.html):

import edu.stanford.nlp.hcoref.CorefCoreAnnotations;
import edu.stanford.nlp.hcoref.data.CorefChain;
import edu.stanford.nlp.hcoref.data.Mention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class CorefExample {

  public static void main(String[] args) throws Exception {
    Annotation document = new Annotation("Obama was elected to the Illinois state senate in 1996 and served there for eight years. In 2004, he was elected by a record majority to the U.S. Senate from Illinois and, in February 2007, announced his candidacy for President.");
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);
    System.out.println("---");
    System.out.println("coref chains");
    for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
      System.out.println("\t" + cc);
    }
    for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
      System.out.println("---");
      System.out.println("mentions");
      for (Mention m : sentence.get(CorefCoreAnnotations.CorefMentionsAnnotation.class)) {
        System.out.println("\t" + m);
      }
    }
  }
}

Any ideas? Thank you in advance.

score 2 · Accepted Answer

The CorefChain contains that information.

For example, you can get a

List<CorefChain.CorefMention>

by calling this method:

cc.getMentionsInTextualOrder();

This gives you every CorefChain.CorefMention in the document for that particular cluster.

You can get the representative mention with this method:

cc.getRepresentativeMention();

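A CorefChain.CorefMention represents one particular mention in a coref cluster. From a CorefChain.CorefMention you can get information such as the full string and the position (sentence number, mention number within the sentence).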

for (CorefChain.CorefMention cm : cc.getMentionsInTextualOrder()) {
    String textOfMention = cm.mentionSpan;
    IntTuple positionOfMention = cm.position;
}
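
To turn a chain into the "he" -> "Obama" pairs asked for in the question, one option is to pair every mention in the chain with its representative mention. The following is only a minimal sketch of that pairing logic and does not call CoreNLP itself: `CorefPairs` and `MentionInfo` are hypothetical names, and `MentionInfo` is a stand-in that mirrors a few public fields of CorefChain.CorefMention (`sentNum`, `startIndex`, `endIndex`, `mentionSpan`). With the real library, the list would presumably come from `cc.getMentionsInTextualOrder()` and the representative from `cc.getRepresentativeMention()`. The sample values are taken from the pretty-print output in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CorefPairs {

    // Hypothetical stand-in for CorefChain.CorefMention; not a CoreNLP class.
    static final class MentionInfo {
        final int sentNum;        // 1-based sentence number
        final int startIndex;     // 1-based index of the first token
        final int endIndex;       // index one past the last token
        final String mentionSpan; // text of the mention

        MentionInfo(int sentNum, int startIndex, int endIndex, String mentionSpan) {
            this.sentNum = sentNum;
            this.startIndex = startIndex;
            this.endIndex = endIndex;
            this.mentionSpan = mentionSpan;
        }
    }

    // Builds one "mention" -> "representative" string per non-representative mention.
    static List<String> pairs(List<MentionInfo> mentionsInTextualOrder,
                              MentionInfo representative) {
        List<String> result = new ArrayList<>();
        for (MentionInfo m : mentionsInTextualOrder) {
            if (m == representative) {
                continue; // the representative is not paired with itself
            }
            result.add("\"" + m.mentionSpan + "\" -> \"" + representative.mentionSpan + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        // Values copied from the question's output: (1,1,[1,2]), (2,4,[4,5]), (2,24,[24,25]).
        MentionInfo obama = new MentionInfo(1, 1, 2, "Obama");
        MentionInfo he = new MentionInfo(2, 4, 5, "he");
        MentionInfo his = new MentionInfo(2, 24, 25, "his");
        for (String pair : pairs(Arrays.asList(obama, he, his), obama)) {
            System.out.println(pair);
        }
    }
}
```

Run as written, this prints `"he" -> "Obama"` and `"his" -> "Obama"`. The identity check (`==`) assumes the representative is the same object that appears in the mention list; if that does not hold in your CoreNLP version, comparing sentence number and token indices instead would be a safer choice.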

Here is the link to the javadoc for CorefChain:

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefChain.html

Here is the link to the javadoc for CorefChain.CorefMention:

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefChain.CorefMention.html
