python - Understanding nltk with Python

Question

My nltk data is in ~/nltk_data/corpora/words/ (en, en-basic, README).

According to the __init__.py inside ~/lib/python2.7/site-packages/nltk/corpus, to read the list of words in the Brown corpus you use nltk.corpus.brown.words().

from nltk.corpus import brown
print brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

This __init__.py also contains:

words = LazyCorpusLoader(
    'words', WordListCorpusReader, r'(?!README|\.).*')

So when I write from nltk.corpus import words, am I importing the "words" object defined in the __init__.py in the python2.7/site-packages/nltk/corpus directory?

Also, why does this happen:

 import nltk.corpus.words
 ImportError: No module named words
 from nltk.corpus import words
 # WORKS FINE

The "brown" corpus lives inside ~/nltk_data/corpora (not nltk/corpus). So why does this command work?
```
from nltk.corpus import brown
```
Shouldn't it be this instead?
```
from nltk_data.corpora import brown
```

score 0 · Accepted Answer

1.] Yes - it uses the LazyCorpusLoader from util, which comes with the following explanation:

"""
    A proxy object which is used to stand in for a corpus object
    before the corpus is loaded.  This allows NLTK to create an object
    for each corpus, but defer the costs associated with loading those
    corpora until the first time that they're actually accessed.

    The first time this object is accessed in any way, it will load
    the corresponding corpus, and transform itself into that corpus
    (by modifying its own ``__class__`` and ``__dict__`` attributes).

    If the corpus can not be found, then accessing this object will
    raise an exception, displaying installation instructions for the
    NLTK data package.  Once they've properly installed the data
    package (or modified ``nltk.data.path`` to point to its location),
    they can then use the corpus object without restarting python.
    """

3.] nltk_data is the folder where the data lives; that does not mean the module is in that folder too (the data is downloaded from nltk_data).
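As a rough illustration of why a data folder does not need to be importable: data files can be located by walking a list of directories, which is independent of Python's import system. The sketch below is loosely modelled on the idea behind ``nltk.data.path`` (mentioned in the docstring above), but `find_resource` and the temporary directory layout are assumptions made up for the example, not NLTK's implementation.

```python
import os
import tempfile


def find_resource(resource, search_path):
    """Return the first existing path for `resource` under the given directories."""
    for base in search_path:
        candidate = os.path.join(base, *resource.split("/"))
        if os.path.exists(candidate):
            return candidate
    raise LookupError("Resource %r not found in %r" % (resource, search_path))


# Build a throwaway tree shaped like ~/nltk_data/corpora/words/en.
data_root = tempfile.mkdtemp()
words_dir = os.path.join(data_root, "corpora", "words")
os.makedirs(words_dir)
with open(os.path.join(words_dir, "en"), "w") as handle:
    handle.write("apple\nbanana\n")

# The first directory does not exist, so the lookup falls through to data_root.
found = find_resource("corpora/words/en", ["/nonexistent/dir", data_root])
with open(found) as handle:
    word_list = handle.read().split()
print(word_list)
```

Nothing here is on `sys.path` and there is no `__init__.py` in the data tree, so something like `from nltk_data.corpora import brown` has nothing to import; the Python object comes from the `nltk.corpus` package and only the files come from the data folder.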

NLTK has built-in support for many corpora and trained models. To use them within NLTK, the NLTK corpus downloader is recommended: >>> nltk.download()
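On the ImportError in the question: `import a.b.c` requires `c` to be a module or package, whereas `from a.b import c` accepts any attribute of `a.b`. Since `words` is an object assigned inside nltk/corpus/__init__.py (the LazyCorpusLoader line quoted in the question) rather than a submodule, only the `from ... import` form works. The same behaviour can be reproduced with the standard library alone; `can_import_as_module` below is a hypothetical helper written for this sketch.

```python
import importlib

# `from package import name` accepts any attribute of the module,
# so a plain function such as os.path.join imports without trouble.
from os.path import join


def can_import_as_module(dotted_name):
    """Return True if `import dotted_name` would succeed."""
    try:
        importlib.import_module(dotted_name)
        return True
    except ImportError:
        return False


# os.path is a module, so this lookup succeeds.
print(can_import_as_module("os.path"))
# os.path.join is a function, not a module, so this lookup fails
# in the same way `import nltk.corpus.words` does.
print(can_import_as_module("os.path.join"))
```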
