python - Failed loading english.pickle with nltk.data.load

Question

Trying to load the punkt tokenizer...

import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

...raised a LookupError:

> LookupError: 
>     *********************************************************************   
> Resource 'tokenizers/punkt/english.pickle' not found.  Please use the NLTK Downloader to obtain the resource: nltk.download().   Searched in:
>         - 'C:\\Users\\Martinos/nltk_data'
>         - 'C:\\nltk_data'
>         - 'D:\\nltk_data'
>         - 'E:\\nltk_data'
>         - 'E:\\Python26\\nltk_data'
>         - 'E:\\Python26\\lib\\nltk_data'
>         - 'C:\\Users\\Martinos\\AppData\\Roaming\\nltk_data'
>     **********************************************************************

score 296 · Accepted Answer

I had this same problem. Go into a Python shell and type:

>>> import nltk
>>> nltk.download()

An installation window then appears. Go to the 'Models' tab and select 'punkt' from under the 'Identifier' column. Then click 'Download' and it will install the necessary files. After that, it should work!

score 113 · Accepted Answer

The main reason you see this error is that nltk couldn't find the punkt package. Because the full data suite is large, not every available package is downloaded by default when nltk is installed.

You can download the punkt package like this:

import nltk
nltk.download('punkt')

from nltk import word_tokenize,sent_tokenize
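If the download step lives inside a script, it can be made idempotent by checking for the resource first and downloading only when it is missing. This sketch assumes that nltk.data.find raises LookupError for a missing resource (that is the error shown above); the helper name ensure_nltk_resource and its finder/downloader parameters are hypothetical, added only so the logic can be exercised without a network connection.

```python
def ensure_nltk_resource(resource_path, package, finder=None, downloader=None):
    """Download `package` only if `resource_path` cannot be found.

    Returns True if a download was triggered, False if the resource was present.
    `finder` and `downloader` default to nltk.data.find and nltk.download;
    they are parameters only so the logic can be tested with stand-ins.
    """
    if finder is None or downloader is None:
        import nltk  # imported lazily so stand-ins work without NLTK installed

        finder = finder or nltk.data.find
        downloader = downloader or nltk.download
    try:
        finder(resource_path)  # raises LookupError when the resource is missing
        return False
    except LookupError:
        downloader(package)  # e.g. nltk.download("punkt")
        return True
```

A typical call would be ensure_nltk_resource("tokenizers/punkt", "punkt"). Recent NLTK releases may ask for a differently named resource (such as punkt_tab) instead, so pass whichever name your own error message reports.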

The error message in recent versions recommends the same thing:

LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
    - ''
**********************************************************************
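The "Searched in" list is NLTK's data search path, which is exposed as the list nltk.data.path (directories in the NLTK_DATA environment variable are included in it). The stdlib-only sketch below imitates that lookup to show which candidate directory, if any, actually contains a resource. locate_resource is a hypothetical helper and a simplification: unlike NLTK, it does not look inside zipped packages.

```python
import os


def locate_resource(resource, search_paths):
    """Return the first directory in `search_paths` containing `resource`, else None.

    Simplified stand-in for NLTK's lookup: plain files/directories only,
    no zip archives.
    """
    parts = resource.split("/")  # NLTK resource names use '/' as the separator
    for base in search_paths:
        # Skip empty entries such as the '' at the end of the list above.
        if base and os.path.exists(os.path.join(base, *parts)):
            return base
    return None
```

In practice you would pass nltk.data.path as search_paths; a None result suggests the package was downloaded to a directory outside the search path.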

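If you call the download function without an argument, it opens the interactive NLTK Downloader, from which all packages can be downloaded: chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, and tokenizers. To fetch everything without the interactive step, use nltk.download('all').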

nltk.download()

The function above saves the packages to a specific directory. The comment in the NLTK source linked here explains where that directory is: https://github.com/nltk/nltk/blob/67ad86524d42a3a86b1f5983868fd2990b59f1ba/nltk/downloader.py#L1051
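To keep the data somewhere other than the default location (for example next to a project), nltk.download accepts a download_dir argument, and NLTK adds the directories listed in the NLTK_DATA environment variable to its search path when it is imported. A minimal sketch, assuming a hypothetical helper use_project_nltk_data and a directory name of your choosing:

```python
import os


def use_project_nltk_data(data_dir):
    """Create `data_dir` and add it to the front of NLTK_DATA if not already listed.

    NLTK reads NLTK_DATA when it is imported, so call this *before* `import nltk`
    (or append the directory to nltk.data.path afterwards instead).
    Returns the absolute path of the directory.
    """
    data_dir = os.path.abspath(data_dir)
    os.makedirs(data_dir, exist_ok=True)
    existing = [p for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    if data_dir not in existing:
        existing.insert(0, data_dir)
    os.environ["NLTK_DATA"] = os.pathsep.join(existing)
    return data_dir
```

After import nltk, calling nltk.download("punkt", download_dir=data_dir) with the returned path puts the package where the lookup will find it.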

score 28 · Accepted Answer

This is what worked for me just now:

# Do this in a separate python interpreter session, since you only have to do it once
import nltk
nltk.download('punkt')

# Do this in your ipython notebook or analysis script
from nltk.tokenize import word_tokenize

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

sentences_tokenized = []
for s in sentences:
    sentences_tokenized.append(word_tokenize(s))

sentences_tokenized is a list of lists of tokens:

[['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', 'with', 'the', 'candlestick', '.', 'Mr.', 'Green', 'is', 'not', 'a', 'very', 'nice', 'fellow', '.'],
['Professor', 'Plum', 'has', 'a', 'green', 'plant', 'in', 'his', 'study', '.'],
['Miss', 'Scarlett', 'watered', 'Professor', 'Plum', "'s", 'green', 'plant', 'while', 'he', 'was', 'away', 'from', 'his', 'office', 'last', 'week', '.']]

The sentences were taken from the example IPython notebooks that accompany the book "Mining the Social Web, 2nd Edition".
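The loop in this answer can also be written as a list comprehension. In this sketch the tokenizer is a parameter (a hypothetical choice, defaulting to nltk.tokenize.word_tokenize), so the shape of the result, one token list per input string, can be checked without the punkt data installed.

```python
def tokenize_all(texts, tokenizer=None):
    """Return one list of tokens per input string, in input order."""
    if tokenizer is None:
        # word_tokenize relies on the punkt data being installed.
        from nltk.tokenize import word_tokenize

        tokenizer = word_tokenize
    return [tokenizer(text) for text in texts]
```

With the default tokenizer, tokenize_all(sentences) should produce the same list of lists as sentences_tokenized.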

score 13 · Accepted Answer

This works for me:

>>> import nltk
>>> nltk.download()

On Windows, this also brings up the NLTK Downloader.

score 6 · Accepted Answer

In Spyder, go to the active shell and download the NLTK data with the following two commands:

import nltk
nltk.download()

The NLTK Downloader window then opens. Go to the 'Models' tab in that window, click 'punkt', and download it.

score 4 · Accepted Answer

I ran into this problem when trying to do POS tagging with nltk. What fixed it for me was creating a new directory named "taggers" alongside the corpora directory and copying max_pos_tagger into that taggers directory.
I hope it works for you too. Good luck!
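NLTK looks resources up as <data directory>/<category>/<package>, so a manually copied package has to sit in the right category folder (here taggers) under one of the searched directories. A stdlib sketch of that copy step; install_package_manually and the paths you pass it are hypothetical, and the package name should be whatever your error message asks for:

```python
import os
import shutil


def install_package_manually(package_dir, data_root, category):
    """Copy an unpacked package directory into <data_root>/<category>/<package name>.

    `category` is the resource prefix from the error message, e.g. "taggers"
    or "tokenizers". Returns the destination path.
    """
    name = os.path.basename(os.path.normpath(package_dir))
    dest = os.path.join(data_root, category, name)
    os.makedirs(os.path.dirname(dest), exist_ok=True)  # e.g. create nltk_data/taggers
    shutil.copytree(package_dir, dest, dirs_exist_ok=True)  # needs Python 3.8+
    return dest
```

data_root should be one of the directories in the "Searched in" list (for example C:\nltk_data), otherwise NLTK will still not find the copy.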
