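I am very new to Groovy and am trying to use DKPro Core for some NLP. At this point I am trying to recognize noun phrases in a text. I can recognize tokens, sentences, and named entities correctly, but for some reason the same approach does not work for the NP class. My code is shown below. Please point out my mistake.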

#!/usr/bin/env groovy
@Grab(group='de.tudarmstadt.ukp.dkpro.core', version='1.5.0',
      module='de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
    module='de.tudarmstadt.ukp.dkpro.core.io.text-asl',
    version='1.5.0')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
    module='de.tudarmstadt.ukp.dkpro.core.opennlp-asl',
    version='1.5.0')

import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasConsumer_ImplBase;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity;
import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence;
import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.constituent.NP;    
import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.*;

import static org.apache.uima.fit.pipeline.SimplePipeline.*;
import static org.apache.uima.fit.factory.JCasFactory.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;
import static org.apache.uima.fit.util.JCasUtil.*;

import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.*;
import de.tudarmstadt.ukp.dkpro.core.api.ner.type.*;


def doc = createJCas();
doc.documentText = """It is unfortunate that many Nigerians, especially the younger ones, 
express surprise at the mention of elephants and lions being found within the borders of the country. 
Admittedly, the number of these animals has diminished greatly over the years due to the activities of poachers thus pushing 
some of these animals to the verge of extinction. For example, it was discovered last year 
that there are not more than 34 lions in the wild. However there should be cause for 
optimism as a rundown of just a few animals across these parks show. The Yankari 
Game Reserve in Bauchi is Nigeria's most famous and arguably the best park for observing 
wildlife. Buffaloes, waterbucks, bushbucks, hyenas, leopards, baboons, elephants and lions 
are some of the animals that can be found here. 
"The animals are best seen during the dry season, 
especially from January to April," a 
tour guide told this reporter during a safari at Yankari. """
doc.documentLanguage = "en";

runPipeline(doc,
  createEngineDescription(StanfordSegmenter),
  createEngineDescription(StanfordPosTagger),
  createEngineDescription(StanfordNamedEntityRecognizer));

// for (Token token : select(doc, Token)) {  
    // println token.coveredText + "\n\n\n"
    // }
// for (Sentence sentence : select(doc, Sentence)) {  
    // println sentence.coveredText + "\n\n\n"
    // }
for (Sentence sentence : JCasUtil.select(doc, Sentence.class)) {
    println sentence.getCoveredText() + "\n\n"
    for (NP nounphrase : JCasUtil.selectCovered(doc, NP.class, sentence)) {
        println "||" + nounphrase.getCoveredText() + "||\n\n"
    }
}
// for (Token token : select(doc, Token)) { 
    // def entity=selectCovering(NamedEntity,token).value
    // if(entity.toString().length()>2)
    // println token.coveredText +"\n\n" + entity.toString() + "\n\n\n"
    // }

In my output, the sentences are recognized correctly, but nothing is printed for the noun phrases.

1 Answer

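NP is part of the constituency structure, and your script does not include a constituency parser, so no NP annotations are ever created. If you add a parser such as the Stanford parser to the pipeline, you can access the NPs as well.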

runPipeline(doc,
  createEngineDescription(StanfordSegmenter),
  createEngineDescription(StanfordPosTagger),
  createEngineDescription(StanfordParser),
  createEngineDescription(StanfordNamedEntityRecognizer));
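
For completeness, here is a minimal sketch of a whole script that prints the noun phrases of each sentence. It reuses the module, version, and calls from the question and only adds StanfordParser. The shortened sample text is my own, and I am assuming the English parser model gets resolved the same way the segmenter and NER models were in your run.

```groovy
#!/usr/bin/env groovy
@Grab(group='de.tudarmstadt.ukp.dkpro.core', version='1.5.0',
      module='de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl')

import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence
import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.constituent.NP
import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordParser
import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordPosTagger
import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordSegmenter

import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription
import static org.apache.uima.fit.factory.JCasFactory.createJCas
import static org.apache.uima.fit.pipeline.SimplePipeline.runPipeline
import static org.apache.uima.fit.util.JCasUtil.select
import static org.apache.uima.fit.util.JCasUtil.selectCovered

// Shortened sample text (assumption: any English text works here).
def doc = createJCas()
doc.documentText = "The Yankari Game Reserve in Bauchi is Nigeria's most famous park. " +
                   "Elephants and lions are some of the animals that can be found here."
doc.documentLanguage = "en"

// The parser is the component that creates constituent annotations such as NP.
runPipeline(doc,
  createEngineDescription(StanfordSegmenter),
  createEngineDescription(StanfordPosTagger),
  createEngineDescription(StanfordParser))

// Print each sentence followed by the noun phrases it covers.
for (Sentence sentence : select(doc, Sentence)) {
    println sentence.coveredText
    for (NP np : selectCovered(doc, NP, sentence)) {
        println "  NP: " + np.coveredText
    }
}
```

Note that nested noun phrases are separate NP annotations, so a phrase may show up both on its own and as part of a longer one.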

Disclosure: I am a developer on the DKPro Core project.

Answered 2014-05-27T20:22:56.417