python - Can't find the context of words within a corpus category

Question

I wrote this small script to find the context of the 10 most frequent words in a corpus. But it doesn't work, and I can't see what I'm doing wrong. The definition tien_frequentste(mijn_corpus) works fine on its own.

import nltk

tienfrequentste = tien_frequentste(mijn_corpus)

def context(corpus, most_freq):
    for category in corpus.categories():
        print "Context voor", category, ":"
        for word in most_freq:
            print nltk.Text(corpus.words(categories=category)).concordance(word)

context(mijn_corpus, tienfrequentste)

Update: I now get the following error messages in the traceback, and I don't understand what they mean:
Traceback (most recent call last):
  File "/Users/...document.py", line 92, in <module>
    context (mijn_corpus, tienfrequentste)
  File "/Users/...document.py", line 87, in context
    for category in corpus.categories():
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/corpus/reader/api.py", line 317, in categories
    self._init()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/corpus/reader/api.py", line 289, in _init
    category = re.match(self._pattern, file_id).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

score 0 · Accepted Answer

Does your corpus actually have categories, and is most_freq a list of strings? The following example works:

import nltk
from nltk.corpus import reuters

for category in reuters.categories():
    print "context voor", category, ":"
    for word in ["get", "have", "do"]:
        # concordance() prints its results itself and returns None,
        # so it is called without print
        nltk.Text(reuters.words(categories=category)).concordance(word)

score 0 · Accepted Answer

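The error comes from the regular expression that assigns the corpus files to categories. The reader encounters a file name that does not match the regex pattern, so re.match() returns None, and calling .group(1) on None raises the AttributeError. If you are using a standard NLTK corpus with categories, you must have put an extra file into the corpus directory. If you are using your own corpus, its category pattern is configured incorrectly.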
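To make the failure concrete, here is a minimal stdlib sketch of that regex step. The pattern `(\w+)/.*` (category = first directory name) and the file names, including the stray macOS `.DS_Store`, are hypothetical examples, not taken from the asker's corpus. `unmatched_fileids()` is a hypothetical helper that lists the files a pattern cannot classify; assuming a reader such as `CategorizedPlaintextCorpusReader` built with a `cat_pattern`, you could pass it the reader's `fileids()` list together with that same pattern.

```python
import re

# Hypothetical category pattern: the category is the first directory
# of each file id, e.g. "sport/b.txt" -> "sport".
CAT_PATTERN = r"(\w+)/.*"


def unmatched_fileids(fileids, pattern=CAT_PATTERN):
    """Return the file ids that the category pattern cannot classify."""
    return [f for f in fileids if re.match(pattern, f) is None]


def assign_categories(fileids, pattern=CAT_PATTERN):
    """Map each file id to its category using the same regex step that
    fails in the traceback: re.match(pattern, file_id).group(1)."""
    mapping = {}
    for fileid in fileids:
        match = re.match(pattern, fileid)
        if match is None:
            # Calling match.group(1) here would raise
            # AttributeError: 'NoneType' object has no attribute 'group';
            # raise a clearer error that names the offending file instead.
            raise ValueError("file id %r does not match %r" % (fileid, pattern))
        mapping[fileid] = match.group(1)
    return mapping


# Hypothetical file ids: two categorized files plus a stray .DS_Store.
fileids = ["news/a.txt", "sport/b.txt", ".DS_Store"]
print(unmatched_fileids(fileids))      # ['.DS_Store']
print(assign_categories(fileids[:2]))  # maps the two files to 'news' and 'sport'
```

Removing any file reported by such a check from the corpus directory, or adjusting the pattern so every remaining file matches it, should make the AttributeError go away.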

By the way, concordance() prints its output itself and returns None. If you wrap it in print, you will see a lot of None values in the output.
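The print-and-return-None behavior can be reproduced without NLTK. `toy_concordance()` below is a hypothetical, much simplified stand-in for `nltk.Text.concordance()` (it is not NLTK's implementation): it prints each hit with a little context and returns nothing, so wrapping the call in print adds an extra `None` line.

```python
def toy_concordance(tokens, word, width=2):
    """Hypothetical, simplified stand-in for nltk.Text.concordance().

    Prints every occurrence of `word` with up to `width` tokens of context
    on each side and, like concordance(), returns None.
    """
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            print(" ".join(left + ["[%s]" % tok] + right))
    # No return statement: the function implicitly returns None.


tokens = "de kat zit op de mat".split()

toy_concordance(tokens, "de")         # prints only the two matching lines
print(toy_concordance(tokens, "de"))  # prints the same lines, then "None"
```

In the loop from the question, the fix is therefore to call `.concordance(word)` on its own, without `print` in front of it.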
