Is there a way to get NLTK to return text fully marked up with all Treebank clause and phrase demarcations (or an equivalent; it need not be Treebank)? I need to be able to return both clauses and phrases, separately. The only thing I have found on this is in chapter 7 of the NLTK book (Bird/Klein/Loper), which says you cannot process noun phrases and verb phrases at the same time, but I want to do much more than that! I think the Stanford POS parser does this, but the client wants to use only NLTK. Thanks.
>>> from nltk.corpus import treebank
>>> t = treebank.parsed_sents('wsj_0001.mrg')[0]
>>> print(t)
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
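To get clauses and phrases back separately, one option is to walk a parsed tree and sort its subtrees by label. The following is only a minimal sketch under some assumptions: it uses Penn Treebank labels, rebuilds the sentence with `Tree.fromstring` so that it runs without downloading the corpus, and relies on two hypothetical helpers (`base_label`, `split_constituents`) and an assumed `CLAUSE_LABELS` set that are not part of NLTK.

```python
from nltk import Tree

# The first WSJ sentence as a bracketed string, so the sketch runs
# without the treebank corpus download.
BRACKETED = """
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
"""

# Assumption: these Penn Treebank labels count as clause level;
# every other non-POS label is treated as a phrase.
CLAUSE_LABELS = {"S", "SBAR", "SBARQ", "SINV", "SQ"}


def base_label(subtree):
    """Hypothetical helper: drop function tags, e.g. 'NP-SBJ' -> 'NP'."""
    return subtree.label().split("-")[0]


def split_constituents(tree):
    """Hypothetical helper: return (clauses, phrases) as lists of subtrees."""
    clauses, phrases = [], []
    for sub in tree.subtrees():
        if sub.height() == 2:
            continue  # a POS tag directly over a word, not a phrase
        if base_label(sub) in CLAUSE_LABELS:
            clauses.append(sub)
        else:
            phrases.append(sub)
    return clauses, phrases


tree = Tree.fromstring(BRACKETED)
clauses, phrases = split_constituents(tree)

for sub in clauses:
    print("CLAUSE", sub.label(), " ".join(sub.leaves()))
for sub in phrases:
    print("PHRASE", sub.label(), " ".join(sub.leaves()))
```

The same function should also accept the trees returned by `treebank.parsed_sents`, since those are `Tree` objects too; adjust `CLAUSE_LABELS` if your tag set differs.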
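The parsed Treebank corpus above is available in addition to the chunking resources you have already found. If you want to parse text that you supply yourself, however, there are also options like this: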
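>>> import nltk
>>> sr_parse = nltk.ShiftReduceParser(grammar1)  # grammar1: a hand-written CFG
>>> sent = 'Mary saw a dog'.split()
>>> for tree in sr_parse.parse(sent):  # parse() yields trees in NLTK 3
...     print(tree)
(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))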
However, this relies on grammar1 having been written by hand beforehand. Chunking is easier than parsing.
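For illustration, a hand-written grammar could look something like the sketch below. The rules here are a hypothetical toy grammar covering only this one sentence, not the grammar1 from the NLTK book; `nltk.CFG.fromstring` is the NLTK 3 way to build a grammar from a string.

```python
import nltk

# Hypothetical toy grammar, written by hand for the single example sentence.
grammar1 = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP
    NP -> 'Mary' | Det N
    V -> 'saw'
    Det -> 'a'
    N -> 'dog'
""")

sr_parse = nltk.ShiftReduceParser(grammar1)
sent = "Mary saw a dog".split()

# parse() returns an iterator in NLTK 3, so collect the trees first.
trees = list(sr_parse.parse(sent))
for tree in trees:
    print(tree)
```

Any word that is not covered by the grammar makes the parser reject the sentence, which is why this approach does not scale to arbitrary text without a much larger grammar.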
Answered on 2012-08-15T02:05:14.473