python - Spacy NLP - Chunking with regular expressions

Translated from: https://stackoverflow.com/questions/40716419 2016-11-21T09:13:01.057


Spacy includes a noun_chunks feature for retrieving the set of noun phrases in a document. The english_noun_chunks function (attached below) relies on word.pos == NOUN:

from spacy.symbols import NOUN

def english_noun_chunks(doc):
    labels = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj',
              'attr', 'root']
    np_deps = [doc.vocab.strings[label] for label in labels]
    conj = doc.vocab.strings['conj']
    np_label = doc.vocab.strings['NP']
    for i in range(len(doc)):
        word = doc[i]
        if word.pos == NOUN and word.dep in np_deps:
            yield word.left_edge.i, word.i+1, np_label
        elif word.pos == NOUN and word.dep == conj:
            head = word.head
            while head.dep == conj and head.head.i < head.i:
                head = head.head
            # If the head is an NP, and we're coordinated to it, we're an NP
            if head.dep in np_deps:
                yield word.left_edge.i, word.i+1, np_label

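I would like to get chunks from a sentence that match a regular expression over part-of-speech tags. For example, a phrase made of zero or more adjectives followed by one or more nouns: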

{(<JJ>)*(<NN | NNS | NNP>)+}

Is this possible without overriding the english_noun_chunks function?
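To make the requested behaviour concrete, here is a minimal sketch of tag-level regex chunking in plain Python. It is not part of Spacy: the helper name regex_chunks and the NP_PATTERN constant are hypothetical, and the input is assumed to be a list of (text, tag) pairs with Penn Treebank tags. With Spacy, such pairs could presumably be built with [(t.text, t.tag_) for t in doc], and each resulting (start, end) pair used as doc[start:end].

```python
import re

# The pattern from the question, written over "<TAG>" units (hypothetical constant).
NP_PATTERN = re.compile(r"(?:<JJ>)*(?:<NN>|<NNS>|<NNP>)+")


def regex_chunks(tagged, pattern=NP_PATTERN):
    """Yield (start, end) token-index spans whose tag sequence matches `pattern`.

    `tagged` is a list of (text, tag) pairs; `end` is exclusive.
    """
    offsets = {}   # character offset in the tag string -> token index
    parts = []
    pos = 0
    for i, (_, tag) in enumerate(tagged):
        offsets[pos] = i
        unit = "<%s>" % tag
        parts.append(unit)
        pos += len(unit)
    offsets[pos] = len(tagged)  # offset just past the last token
    tag_string = "".join(parts)

    # Matches start at "<" and end at ">", so they always fall on token boundaries.
    for match in pattern.finditer(tag_string):
        yield offsets[match.start()], offsets[match.end()]


if __name__ == "__main__":
    sample = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
              ("jumps", "VBZ"), ("over", "IN"), ("lazy", "JJ"), ("dogs", "NNS")]
    for start, end in regex_chunks(sample):
        print(" ".join(text for text, _ in sample[start:end]))
```

Because the sketch only reads the tags, it works alongside english_noun_chunks rather than replacing it; the trade-off is that it ignores the dependency labels the built-in function uses.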
