python - Using joblib with spaCy objects

Translated from: https://stackoverflow.com/questions/41178839 2016-12-16T06:54:54.740


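I am working with a medium-sized text dataset: a single text column of about 1 GB, loaded as a pandas Series (object dtype). Call it textData.

I want to create a document for each line of text and tokenize it. However, I want to use a custom tokenizer.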

from joblib import Parallel, delayed
from spacy.en import English
# Assumption: the dependency labels used below are the constants from spacy.symbols
from spacy.symbols import punct, det, agent, prep, aux, auxpass, cc, expl, quantmod

# Dependency labels whose tokens are dropped
EXCLUDED_DEPS = {punct, det, agent, prep, aux, auxpass, cc, expl, quantmod}


def clean_tokens(doc):
    # Drop tokens with an excluded dependency label and return the lemmas of the rest
    return [token.lemma_ for token in doc if token.dep not in EXCLUDED_DEPS]


nlp = English()
# batchSize and n_threads are not shown here
docs = nlp.pipe(textData, batch_size=batchSize, n_threads=n_threads)

# This runs without any errors, but the result is empty
results1 = Parallel(n_jobs=-1)(delayed(clean_tokens)(doc) for doc in docs)
# This runs and returns the expected result
results2 = [clean_tokens(doc) for doc in docs]

I call main() from a script and run the code above inside main().

Is there a reason this does not work? If it is a pickling problem, no error is raised.

Is there a way to make this work?
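
One thing that may be worth ruling out in the snippet above, independent of joblib: nlp.pipe returns a lazy generator, so docs can be iterated only once, and whichever of results1 and results2 runs second receives nothing. The sketch below illustrates this with plain Python only. fake_pipe and count_tokens are hypothetical stand-ins (not spaCy or joblib APIs), and this may not be the cause of the empty result described in the question.

```python
def fake_pipe(texts):
    """Hypothetical stand-in for nlp.pipe: lazily yields one 'doc' per text."""
    for text in texts:
        yield text.split()


def count_tokens(doc):
    """Hypothetical stand-in for clean_tokens: returns the number of tokens."""
    return len(doc)


texts = ["a b c", "d e"]

# A generator is consumed by the first full iteration.
docs = fake_pipe(texts)
first_pass = [count_tokens(doc) for doc in docs]
second_pass = [count_tokens(doc) for doc in docs]  # generator is already exhausted

# Materialising the generator into a list allows repeated iteration,
# at the cost of holding every document in memory at once.
docs_list = list(fake_pipe(texts))
list_first_pass = [count_tokens(doc) for doc in docs_list]
list_second_pass = [count_tokens(doc) for doc in docs_list]
```

With a dataset of about 1 GB, materialising every document may be too expensive; calling the pipeline again to obtain a fresh generator for each pass is the other option.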
