java - How to extract noun phrases using the OpenNLP chunking parser

Question

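I am a beginner in natural language processing. I need to extract noun phrases from text. So far I have used the OpenNLP chunking parser to parse the text and get a tree structure, but I am not able to extract the noun phrases from that tree structure. Is there a regular-expression pattern in OpenNLP that I can use to extract noun phrases?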

Below is the code I am using:

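    import java.io.FileInputStream;
    import java.io.InputStream;
    import opennlp.tools.cmdline.parser.ParserTool;
    import opennlp.tools.parser.*;

    try (InputStream is = new FileInputStream("en-parser-chunking.bin")) {
        ParserModel model = new ParserModel(is);
        Parser parser = ParserFactory.create(model);
        Parse[] topParses = ParserTool.parseLine(line, parser, 1);
        for (Parse p : topParses) {
            p.show();
        }
    }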

With this I get the following output:

    (TOP (S (S (ADJP (JJ Welcome)) (PP (TO to) (NP (NNP Big) (NNP Data.))))) (S (NP (PRP We)) (VP (VP (VBP are) (VP (VBG working) (PP (IN on) (NP (NNP Natural) (NNP Language) (NNP Processing.can))))) (NP (DT some) (CD one) (NN help)) (NP (PRP us)) (PP (IN in) (S (VP (VBG extracting) (NP (DT the) (NN noun) (NNS phrases))) (PP (IN from) (NP (DT the) (NN tree)) (WP structure.))))))))))

Can you help me get the noun phrases, such as NP, NNP, NN, etc.?

Please help me with this.

Thanks in advance,

ガセ。

score 6 · Accepted Answer

The Parse object is a tree. You can navigate it with getParent(), getChildren() and getType():

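    import java.util.ArrayList;
    import java.util.List;
    import opennlp.tools.parser.Parse;

    // Every constituent labelled "NP" is collected here
    List<Parse> nounPhrases = new ArrayList<>();

    // Depth-first walk: record NP nodes, then recurse into all children.
    // Usage: getNounPhrases(topParses[0]), then np.getCoveredText() for each NP
    public void getNounPhrases(Parse p) {
        if (p.getType().equals("NP")) {
            nounPhrases.add(p);
        }
        for (Parse child : p.getChildren()) {
            getNounPhrases(child);
        }
    }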
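To try the same recursion without loading a model, the sketch below applies it to the bracketed string that `p.show()` prints. It assumes a hypothetical `Node` class standing in for `Parse` (only the label, the children and the leaf word are modeled) and a small hand-written reader for the bracket format; neither is part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;

public class NounPhraseDemo {
    // Hypothetical stand-in for opennlp.tools.parser.Parse: a label plus
    // children, or a label plus a word for leaf (POS) nodes such as (NN tree)
    static final class Node {
        final String type;
        final String word; // non-null only for leaves
        final List<Node> children = new ArrayList<>();

        Node(String type, String word) { this.type = type; this.word = word; }

        // Joins the leaf words under this node, similar in spirit to Parse.getCoveredText()
        String text() {
            if (word != null) return word;
            List<String> parts = new ArrayList<>();
            for (Node c : children) parts.add(c.text());
            return String.join(" ", parts);
        }
    }

    private final String s;
    private int pos = 0;

    private NounPhraseDemo(String s) { this.s = s; }

    // Reads one "(LABEL child ...)" or "(LABEL word)" group starting at pos
    private Node readNode() {
        skipSpaces();
        pos++; // consume '('
        String label = readToken();
        skipSpaces();
        Node node;
        if (s.charAt(pos) == '(') {
            node = new Node(label, null);
            while (s.charAt(pos) == '(') {
                node.children.add(readNode());
                skipSpaces();
            }
        } else {
            node = new Node(label, readToken());
            skipSpaces();
        }
        pos++; // consume ')'
        return node;
    }

    // Reads characters up to the next space or parenthesis
    private String readToken() {
        int start = pos;
        while (pos < s.length() && !Character.isWhitespace(s.charAt(pos))
                && s.charAt(pos) != '(' && s.charAt(pos) != ')') {
            pos++;
        }
        return s.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    static Node parse(String bracketed) { return new NounPhraseDemo(bracketed).readNode(); }

    // Same recursion as getNounPhrases(Parse): keep every NP, then visit children
    static void collectNounPhrases(Node n, List<String> out) {
        if (n.type.equals("NP")) out.add(n.text());
        for (Node child : n.children) collectNounPhrases(child, out);
    }

    public static void main(String[] args) {
        Node root = parse("(TOP (S (NP (PRP We)) (VP (VBP are) (VP (VBG extracting) "
                + "(NP (DT the) (NN noun) (NNS phrases))))))");
        List<String> nps = new ArrayList<>();
        collectNounPhrases(root, nps);
        System.out.println(nps); // [We, the noun phrases]
    }
}
```

Because the walk continues into the children of an NP, nested noun phrases are reported as well as their enclosing phrase; return early after `add` if you only want the outermost ones.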

score 4 · Accepted Answer

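If you only need noun phrases, use the sentence chunker instead of the tree parser. The code looks something like this (you need to get the chunker model, en-chunker.bin, from the same place you got the parser model):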

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import opennlp.tools.chunker.ChunkerME;
    import opennlp.tools.chunker.ChunkerModel;

    public void chunk() throws IOException {
        ChunkerModel model;
        // Load the chunker model; try-with-resources closes the stream, and a
        // loading failure is passed to the caller instead of leaving model null
        try (InputStream modelIn = new FileInputStream("en-chunker.bin")) {
            model = new ChunkerModel(modelIn);
        }

        // After the model is loaded a Chunker can be instantiated.
        ChunkerME chunker = new ChunkerME(model);

        // The tokens of one sentence...
        String[] sent = new String[]{"Rockwell", "International", "Corp.", "'s",
            "Tulsa", "unit", "said", "it", "signed", "a", "tentative", "agreement",
            "extending", "its", "contract", "with", "Boeing", "Co.", "to",
            "provide", "structural", "parts", "for", "Boeing", "'s", "747",
            "jetliners", "."};

        // ...and their POS tags, one per token
        String[] pos = new String[]{"NNP", "NNP", "NNP", "POS", "NNP", "NN",
            "VBD", "PRP", "VBD", "DT", "JJ", "NN", "VBG", "PRP$", "NN", "IN",
            "NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP", "POS", "CD", "NNS",
            "."};

        // One chunk tag per token, e.g. "B-NP" (begins a noun phrase),
        // "I-NP" (continues it), "B-VP", or "O" (outside any chunk)
        String[] tag = chunker.chunk(sent, pos);
    }

Then look in the tag array for the chunk types you need. The chunking API is described here:

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.chunking.api
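The chunker returns one tag per token in the BIO scheme: `B-NP` starts a noun phrase, `I-NP` continues it, and any other tag ends it. A minimal, standard-library-only sketch of turning the `sent` and `tag` arrays into noun-phrase strings; the class and method names are hypothetical, and the tags in `main` are made-up sample values, not actual en-chunker.bin output.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkGrouper {
    // Joins each B-NP token with the I-NP tokens that directly follow it
    static List<String> nounPhrases(String[] tokens, String[] chunkTags) {
        List<String> phrases = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            String t = chunkTags[i];
            if (t.equals("B-NP")) {
                // A new NP starts; flush the previous one if it is still open
                if (current != null) phrases.add(current.toString());
                current = new StringBuilder(tokens[i]);
            } else if (t.equals("I-NP") && current != null) {
                current.append(' ').append(tokens[i]);
            } else {
                // Any other tag (or an I-NP with no open phrase) closes the NP
                if (current != null) phrases.add(current.toString());
                current = null;
            }
        }
        if (current != null) phrases.add(current.toString());
        return phrases;
    }

    public static void main(String[] args) {
        // Illustrative tags chosen by hand, not real model output
        String[] sent = {"it", "signed", "a", "tentative", "agreement", "."};
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "O"};
        System.out.println(nounPhrases(sent, tags)); // [it, a tentative agreement]
    }
}
```

An `I-NP` without a preceding `B-NP` is dropped here; treating it as the start of a new phrase is an equally reasonable choice. Depending on your OpenNLP version, `ChunkerME` may also offer `chunkAsSpans(...)`, which returns the chunks as `Span` objects; check the API docs before relying on it.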

score 2 · Accepted Answer

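Continuing from your own code, this block prints all the nouns in the sentence (individual words tagged NN, NNP or NNS). It uses the getTagNodes() method to get the tokens together with their types: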

    Parse[] topParses = ParserTool.parseLine(line, parser, 1);
    for (Parse top : topParses) {
        // getTagNodes() returns the POS-tag nodes, one per token
        for (Parse word : top.getTagNodes()) {
            // Change the types according to your desired types
            String type = word.getType();
            if (type.equals("NN") || type.equals("NNP") || type.equals("NNS")) {
                System.out.println(word.getCoveredText());
            }
        }
    }
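The same filter can be written against a `Set` of tag names, which makes it easy to include tags such as `NNPS` (plural proper noun) that the `if` above leaves out. A standard-library-only sketch on plain token and POS arrays; the class and method names are hypothetical, and the sample arrays reuse the beginning of the sentence from the chunker answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NounFilter {
    // Penn Treebank noun tags; adjust to the types you need
    static final Set<String> NOUN_TAGS =
            new HashSet<>(Arrays.asList("NN", "NNS", "NNP", "NNPS"));

    // Returns the tokens whose POS tag is a noun tag, in sentence order
    static List<String> nouns(String[] tokens, String[] posTags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (NOUN_TAGS.contains(posTags[i])) result.add(tokens[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sent = {"Rockwell", "International", "Corp.", "'s", "Tulsa", "unit", "said"};
        String[] pos = {"NNP", "NNP", "NNP", "POS", "NNP", "NN", "VBD"};
        System.out.println(nouns(sent, pos)); // [Rockwell, International, Corp., Tulsa, unit]
    }
}
```

This returns single words: "Rockwell International Corp." comes back as three separate tokens, so to get whole phrases you still need the NP constituents from the parser or the NP chunks from the chunker.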
