Is there a way to get NLTK to return text fully marked with all Treebank clause and Treebank phrase demarcations (or equivalent; it need not be Treebank)? I need to be able to return both clauses and phrases (separately). The only thing on this that I have found is in the NLTK Bird/Klein/Loper book in chapter 7 where it says you can not process for noun phrases and verb phrases at the same time, but I want to do much more than that! I think the Stanford POS parser does this but the client wants to use only the NLTK. Thanks.
質問する
1161 次
1 に答える
1
チャプター8はもう見ましたか?次のようなものが必要なようです。
>>> from nltk.corpus import treebank
>>> t = treebank.parsed_sents('wsj_0001.mrg')[0]
>>> print t
(S
(NP-SBJ
(NP (NNP Pierre) (NNP Vinken))
(, ,)
(ADJP (NP (CD 61) (NNS years)) (JJ old))
(, ,))
(VP
(MD will)
(VP
(VB join)
(NP (DT the) (NN board))
(PP-CLR
(IN as)
(NP (DT a) (JJ nonexecutive) (NN director)))
(NP-TMP (NNP Nov.) (CD 29))))
(. .))
既に見つけたチャンキング リソースに加えて。ただし、指定したテキストを解析したい場合は、次のようなオプションもあります。
>>> sr_parse = nltk.ShiftReduceParser(grammar1)
>>> sent = 'Mary saw a dog'.split()
>>> print sr_parse.parse(sent)
(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))
ただし、これは grammar1 が事前に手動で入力されていることに依存しています。チャンクは解析よりも簡単です。
于 2012-08-15T02:05:14.473 に答える