Is there a way to get NLTK to return text fully marked up with all Treebank clause and phrase demarcations (or equivalent; it need not be Treebank)? I need to be able to return both clauses and phrases, separately. The only thing I have found on this is in chapter 7 of the NLTK book by Bird, Klein and Loper, where it says you cannot process noun phrases and verb phrases at the same time, but I want to do much more than that! I think the Stanford parser does this, but the client wants to use only NLTK. Thanks.

Have you looked at chapter 8 of the book yet? It sounds like you need something like this:

>>> from nltk.corpus import treebank
>>> t = treebank.parsed_sents('wsj_0001.mrg')[0]
>>> print(t)
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR
        (IN as)
        (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
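
Once you have a full parse tree like this, clauses and phrases can be returned separately by filtering its subtrees on their labels. The sketch below assumes that S, SBAR, SBARQ, SINV and SQ count as clauses and that every other non-preterminal node is a phrase; that split, and the helper names `base_label` and `clauses_and_phrases`, are illustrative choices, not something NLTK defines. It inlines the tree printed above, so it needs NLTK installed but not the treebank corpus download.

```python
from nltk import Tree

# Assumption: these Penn Treebank labels count as "clauses"; every other
# non-preterminal node is treated as a "phrase".
CLAUSE_LABELS = {"S", "SBAR", "SBARQ", "SINV", "SQ"}


def base_label(label):
    """Strip function tags and indices, e.g. 'NP-SBJ' -> 'NP', 'NP=2' -> 'NP'."""
    return label.split("-")[0].split("=")[0]


def clauses_and_phrases(tree):
    """Return (clauses, phrases) as lists of (label, text) pairs, in preorder."""
    clauses, phrases = [], []
    # height() > 2 skips preterminals such as (NNP Pierre) and (, ,)
    for sub in tree.subtrees(lambda st: st.height() > 2):
        text = " ".join(sub.leaves())
        if base_label(sub.label()) in CLAUSE_LABELS:
            clauses.append((sub.label(), text))
        else:
            phrases.append((sub.label(), text))
    return clauses, phrases


# The parse of wsj_0001.mrg shown above, inlined so no corpus download is needed.
t = Tree.fromstring("""
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR
        (IN as)
        (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
""")

clauses, phrases = clauses_and_phrases(t)
for label, text in clauses:
    print("CLAUSE", label, ":", text)
for label, text in phrases:
    print("PHRASE", label, ":", text)
```

Keeping the original label (for example `PP-CLR`) in the output preserves the Treebank function tags, while `base_label` is used only to decide which bucket a node goes into.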

That is in addition to the chunking resources you have already found. If you want to parse text of your own, however, there are also options like this:

>>> import nltk
>>> sr_parse = nltk.ShiftReduceParser(grammar1)
>>> sent = 'Mary saw a dog'.split()
>>> for tree in sr_parse.parse(sent):
...     print(tree)
(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))
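
The snippet above needs grammar1 to exist already. A self-contained version for NLTK 3 might look like the following. The rule set is modeled on the toy grammar1 from chapter 8 of the NLTK book, so treat its exact contents as an assumption; also note that in NLTK 3, parse() returns an iterator of trees rather than a single tree.

```python
import nltk

# Hypothetical stand-in for the book's grammar1: a small hand-written CFG.
# Every word in the input sentence must be covered by some rule.
grammar1 = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
""")

sr_parse = nltk.ShiftReduceParser(grammar1)
sent = "Mary saw a dog".split()

# parse() is a generator; NLTK's shift-reduce parser does not backtrack,
# so it yields at most one tree.
trees = list(sr_parse.parse(sent))
for tree in trees:
    print(tree)
```

Because the shift-reduce parser never backtracks, it can fail to find a parse that the grammar allows; chapter 8 of the book also covers chart parsers for that case.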

However, this relies on grammar1 having been entered by hand beforehand. Chunking is simpler than parsing.
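
Staying with chunking, NLTK's RegexpParser accepts a multi-stage (cascaded) grammar, so NPs, PPs, VPs and clause-like chunks can all be marked in one pass. The sketch below is adapted from the cascaded-chunker example in chapter 7 of the NLTK book; the tag patterns, the `loop=2` setting, the hand-tagged sentence and the `words` helper are illustrative assumptions, and a regexp grammar like this will not match the coverage of a real parser.

```python
import nltk

# Cascaded chunk grammar: later stages can refer to chunks built by earlier
# ones (an NP inside a PP, a PP inside a VP, a VP inside a CLAUSE).
grammar = r"""
  NP: {<DT|JJ|NN.*>+}          # determiners, adjectives, nouns
  PP: {<IN><NP>}               # preposition followed by an NP
  VP: {<VB.*><NP|PP|CLAUSE>+$} # verb followed by its arguments
  CLAUSE: {<NP><VP>}           # NP followed by a VP
"""
# loop=2 runs the whole cascade twice so a clause can nest inside a VP.
cp = nltk.RegexpParser(grammar, loop=2)

# Hand-tagged example sentence; real input would come from nltk.pos_tag().
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
            ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
tree = cp.parse(sentence)


def words(subtree):
    """Join the words of a chunk; leaves of a chunk tree are (word, tag) pairs."""
    return " ".join(word for word, tag in subtree.leaves())


# Clauses and phrases returned separately, in preorder.
clauses = [words(st) for st in tree.subtrees(lambda st: st.label() == "CLAUSE")]
phrases = [(st.label(), words(st))
           for st in tree.subtrees(lambda st: st.label() in {"NP", "PP", "VP"})]
print(tree)
print(clauses)
print(phrases)
```

Each stage only groups the top-level pieces produced by the previous stage, which is why the second loop is needed before the outer VP and CLAUSE can wrap the inner clause.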

answered 2012-08-15T02:05:14.473