nlp - How to extract relations from text with NLTK

Question

Hello, I'm trying to extract relations from a string of text, following the second-to-last example here: https://web.archive.org/web/20120907184244/http://nltk.googlecode.com/svn/trunk/doc/howto/relextract.html

From a string such as "Michael James editor of Publishers Weekly", I would like to get output like this:

[PER: 'Michael James'] ', editor of' [ORG: 'Publishers Weekly']

What is the best way to do this? What format does extract_rels expect, and how do I format my input to meet its requirements?

I tried it myself, but it didn't work. Here is the code I adapted from the book. No results are printed. What am I doing wrong?

class doc():
 pass

doc.headline = ['this is expected by nltk.sem.extract_rels but not used in this script']

def findrelations(text):
    roles = """
    (.*(
    analyst|
    editor|
    librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*  # "X, of (the) Y"
    """
    ROLES = re.compile(roles, re.VERBOSE)
    tokenizedsentences = nltk.sent_tokenize(text)
    for sentence in tokenizedsentences:
        taggedwords  = nltk.pos_tag(nltk.word_tokenize(sentence))
        doc.text = nltk.batch_ne_chunk(taggedwords)
        print doc.text
        for rel in relextract.extract_rels('PER', 'ORG', doc, corpus='ieer', pattern=ROLES):
            print relextract.show_raw_rtuple(rel) # doctest: +ELLIPSIS

text = """Michael James editor of Publishers Weekly"""

findrelations(text)

score 4 · Accepted Answer

Here is code based on yours (with only a few small adjustments) that works fine ;)

import re

import nltk
from nltk.sem import relextract


def findrelations(text):
    # Verbose regex that is matched against the words between two entities
    roles = r"""
    (.*(
    analyst|
    editor|
    librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*  # "X, of (the) Y"
    """
    ROLES = re.compile(roles, re.VERBOSE)

    # Split into sentences, tokenize, POS-tag, then chunk named entities
    sentences = nltk.sent_tokenize(text)
    tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
    chunked_sentences = nltk.ne_chunk_sents(tagged_sentences)

    for doc in chunked_sentences:
        print(doc)
        # corpus='ace' makes extract_rels read the chunked tree directly
        for rel in relextract.extract_rels('PER', 'ORG', doc, corpus='ace', pattern=ROLES):
            # it is a tree, so you need to work on it to output what you want
            # (rtuple is the NLTK 3 name for show_raw_rtuple)
            print(relextract.rtuple(rel))


findrelations('Michael James editor of Publishers Weekly')

(S (PERSON Michael/NNP) (PERSON James/NNP) editor/NN of/IN (ORGANIZATION Publishers/NNS Weekly/NNP))
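
For what it's worth, here is a small sketch of how the ROLES pattern behaves on the text that sits between two entities. It uses only the standard `re` module. The helper `filler_matches` and the sample filler strings are my own illustrative assumptions: I am assuming the filler looks like space-separated word/TAG tokens and that the pattern is applied with `match`, so it is anchored at the start of the filler. Please check both against your NLTK version.

```python
import re

# Same alternatives as the ROLES pattern above. In verbose mode, whitespace
# and "#" comments inside the pattern are ignored.
ROLES = re.compile(r"""
    (.*(
    analyst|
    editor|
    librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*  # "X, of (the) Y"
    """, re.VERBOSE)


def filler_matches(filler):
    """Hypothetical helper: True if ROLES matches at the start of the filler."""
    return ROLES.match(filler) is not None


# Assumed filler strings: the tokens between two entities, written as word/TAG.
print(filler_matches("editor/NN of/IN"))   # True: "editor" may appear anywhere
print(filler_matches("writer/NN for/IN"))  # True: "writer" is at the start
print(filler_matches("a/DT writer/NN"))    # False: "writer" is not at the start
print(filler_matches("visited/VBD"))       # False: no role word at all
```

Because `match` anchors at the start of the string, the bare alternatives such as `writer` only fire when the filler begins with them, while the `(.*(...).*)` group can find its role word anywhere in the filler.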
