python - Finding path for corpus in NLTK

Question

I am using the Natural Language Toolkit for python to write a program. In it I am trying to load a corpus of my own files. To do that I am using code to the following effect:

from nltk.corpus import PlaintextCorpusReader
corpus_root = "/path/to/corpus"  # insert filepath here
wordlists = PlaintextCorpusReader(corpus_root, '.*')

Let's say my file is called reader.py and my corpus of files is in a directory called 'corpus' in the same directory as reader.py. I would like a general way to find the path above, so that the code can locate the 'corpus' directory wherever anyone who uses it puts it. I have tried these posts, but they only seem to give me absolute file paths: Find current directory and file's directory

Any help would be greatly appreciated!
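To see why a hard-coded relative path such as "corpus" is not enough on its own, here is a small standard-library sketch. The temporary directory is only a stand-in for wherever a user happens to launch Python from; it shows that a bare relative path is resolved against the current working directory, not against the directory that holds reader.py.

```python
import os
import tempfile

original_cwd = os.getcwd()

# Simulate a user who starts the program from some unrelated directory
with tempfile.TemporaryDirectory() as somewhere_else:
    os.chdir(somewhere_else)
    try:
        # A bare relative path is resolved against the current working directory
        resolved = os.path.abspath("corpus")
    finally:
        # Always restore the original working directory
        os.chdir(original_cwd)

# Points inside somewhere_else, not next to reader.py
print(resolved)
```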

score 1 · Accepted Answer

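From what I understand:

- The file reader.py and the corpus directory are always in the same directory.
- You are looking for a way for reader.py to reference corpus no matter where the two are placed in the directory structure.

In that case, the question you mentioned seems to be what you need. Another way to do it is described in this other answer. Using its second option, your code would look like this: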

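from nltk.corpus import PlaintextCorpusReader
import os.path

# Directory containing this script (reader.py), wherever it is installed
basepath = os.path.dirname(__file__)
# The "corpus" directory next to reader.py, as an absolute path
corpus_root = os.path.abspath(os.path.join(basepath, "corpus"))
# Read every file in that directory
wordlists = PlaintextCorpusReader(corpus_root, '.*')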
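Wrapping the same two path lines in a function makes it easy to see what they produce for different install locations. This is only a sketch: `corpus_root_for` is a hypothetical helper name, and the example paths assume a POSIX system.

```python
import os.path


def corpus_root_for(script_file):
    """Hypothetical helper: the same steps as the answer, parameterised on the script path."""
    # Directory that contains the script
    basepath = os.path.dirname(script_file)
    # "corpus" inside that directory, made absolute
    return os.path.abspath(os.path.join(basepath, "corpus"))


# Two users with reader.py in different places each get the right corpus path
print(corpus_root_for("/home/alice/nlp/reader.py"))  # /home/alice/nlp/corpus
print(corpus_root_for("/opt/tools/reader.py"))       # /opt/tools/corpus
```

If `__file__` is just "reader.py" (no directory part, which can happen on some Python versions when the script is run from its own directory), `os.path.dirname` returns an empty string, `os.path.join("", "corpus")` is simply "corpus", and `os.path.abspath` resolves it against the current directory, which in that case is the script's own directory.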

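Note that although an absolute path is created, it is built from the information obtained in the basepath = os.path.dirname(__file__) bit above, which yields the directory that reader.py is in. The path is therefore computed each time the script runs and stays correct wherever the code is placed. For official details, see the documentation.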
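As an alternative not mentioned in the answer, the standard-library `pathlib` module can express the same idea. This is a sketch under the same assumption (the corpus directory sits next to the script); `corpus_root_pathlib` is a hypothetical helper name. One difference from `os.path.abspath` is that `Path.resolve()` also follows symbolic links.

```python
from pathlib import Path


def corpus_root_pathlib(script_file, dirname="corpus"):
    """Hypothetical helper: absolute path of `dirname` next to `script_file`."""
    # resolve() makes the path absolute (following symlinks);
    # .parent is the directory that contains the script
    return str(Path(script_file).resolve().parent / dirname)


# In reader.py you would pass the module's own __file__:
# corpus_root = corpus_root_pathlib(__file__)
```

The returned string can be passed to `PlaintextCorpusReader` exactly like `corpus_root` in the answer's code.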
