java - Getting the K best parses of a sentence with the Stanford Parser

Question

I need the K best parses of a sentence. I figured this could be done with the ExhaustivePCFGParser class. The problem is that I don't know how to use this class; more precisely, how can I instantiate it? (The constructor is: ExhaustivePCFGParser(BinaryGrammar bg, UnaryGrammar ug, Lexicon lex, Options op, Index stateIndex, Index wordIndex, Index tagIndex).) I don't know how to supply all of these parameters.

Is there an easier way to get the K best parses?

score 2 · Accepted Answer

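In general, you do things via a LexicalizedParser object, which is a "grammar" that provides all of these things (the grammars, lexicon, indices, etc.).

From the command line, the following will work: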

java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 20 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt

At the API level, you need to get a LexicalizedParserQuery object. Once you have a LexicalizedParser lp (as in ParserDemo.java), you can do the following:

LexicalizedParser lp = ... // Load / train a model
LexicalizedParserQuery lpq = lp.parserQuery();
lpq.parse(sentence);
List<ScoredObject<Tree>> kBest = lpq.getKBestPCFGParses(20);

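A LexicalizedParserQuery is the equivalent of a Java regex Matcher: it holds the state for parsing one particular sentence, while the LexicalizedParser plays the role of the reusable Pattern.

Note: at present, kBest parsing works well only for PCFG grammars, not for factored ones.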

score 0 · Accepted Answer

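Here is a workaround I implemented based on Christopher Manning's answer above, assuming you want to use Python. The Python wrapper for CoreNLP does not implement "K-best parse trees", so the alternative is to use the terminal command: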

java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 20 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt
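
The same command can also be assembled as an argument list, which avoids shell quoting problems when a path contains spaces or apostrophes. This is only a minimal sketch: `build_kbest_command` and its parameter names are hypothetical helpers, not part of CoreNLP, and the flags are simply the ones from the command above.

```python
def build_kbest_command(k,
                        model="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
                        input_path="data/testsent.txt",
                        heap="500m"):
    """Return the LexicalizedParser k-best command as an argument list.

    Hypothetical helper: it only mirrors the command line shown above.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [
        "java", f"-mx{heap}",
        "-cp", "*",  # passed literally; the java launcher expands the classpath wildcard
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-printPCFGkBest", str(k),
        model,
        input_path,
    ]


cmd = build_kbest_command(5)
print(" ".join(cmd))
```

A list like this can be passed to `subprocess.check_output(cmd, cwd=corenlp_dir)`, where `corenlp_dir` is an assumed variable holding the directory with the JAR files, so neither `shell=True` nor `os.chdir` is needed.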

Note that you need Stanford CoreNLP with all of its JAR files downloaded into a directory, as well as the prerequisite Python libraries installed (see the import statements in the script below).

import os
import re
import subprocess
import nltk
from nltk.tree import ParentedTree

ip_sent = "a quick brown fox jumps over the lazy dog."

data_path = "<Your path>/stanford-corenlp-full-2018-10-05/data/testsent.txt" # Input file for the parser; replace <Your path> with the location of your CoreNLP directory
with open(data_path, "w") as file:
    file.write(ip_sent) # Write the sentence to the file; its text is fed into the LexicalizedParser

os.chdir("/home/user/Sidney/Vignesh's VQA/SpElementEx/extLib/stanford-corenlp-full-2018-10-05") # Change the working directory to the CoreNLP directory where the JAR files are stored (it must be the directory that data_path points into)
terminal_op = subprocess.check_output('java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 5 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt', shell = True) # Run the command via the terminal and capture the output as bytes
op_string = terminal_op.decode('utf-8') # Convert to a string object
parse_set = re.split(r"# Parse [0-9]+ with score -?[0-9]+\.[0-9]+\n", op_string) # Split the output on the header line that precedes each parse
parse_set = [p for p in parse_set if p.strip()] # Drop empty chunks (e.g. the text before the first header), which ParentedTree.fromstring cannot parse
print(parse_set)

# Print the parse trees in a pretty_print format
for i in parse_set:
    parsetree = ParentedTree.fromstring(i)
    print(type(parsetree))
    parsetree.pretty_print()
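
If the scores are needed as well as the trees, the header lines can be captured instead of discarded. The following is a sketch under the assumption that each parse is preceded by a line of the form `# Parse N with score S`, which is the format implied by the pattern in the script above; `split_kbest` is a hypothetical helper and the sample text is made up for illustration.

```python
import re

# Header format assumed from the split pattern used in the script above.
HEADER = re.compile(r"^# Parse (\d+) with score (-?\d+(?:\.\d+)?)[ \t]*$",
                    re.MULTILINE)


def split_kbest(output):
    """Return a list of (rank, score, tree_text) tuples from k-best output."""
    headers = list(HEADER.finditer(output))
    parses = []
    for i, match in enumerate(headers):
        # A parse runs from the end of its header to the start of the next one.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(output)
        tree_text = output[match.end():end].strip()
        parses.append((int(match.group(1)), float(match.group(2)), tree_text))
    return parses


# Made-up sample in the assumed format; scores and trees are illustrative only.
sample = (
    "# Parse 1 with score -12.5\n"
    "(ROOT (NP (DT a) (NN dog)))\n"
    "\n"
    "# Parse 2 with score -14.25\n"
    "(ROOT (FRAG (DT a) (NN dog)))\n"
)

for rank, score, tree_text in split_kbest(sample):
    print(rank, score, tree_text)
```

Each `tree_text` can then be handed to `ParentedTree.fromstring` exactly as in the loop above.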

Hope this helps.
