python - What is the error in the following Python code?

Question

I want to remove stopwords. Here is my code:

import nltk
from nltk.corpus import stopwords
import string

u="The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family (Rosaceae). It is one of the most widely cultivated tree fruits, and the most widely known of the many members of genus Malus that are used by humans."

v="An orange is a fruit of the orangle tree. it is the most cultivated tree fruits"

u=u.lower()
v=v.lower()

u_list=nltk.word_tokenize(u)
v_list=nltk.word_tokenize(v)

for word in u_list:
    if word in stopwords.words('english'):
        u_list.remove(word)
for word in v_list:
    if word in stopwords.words('english'):
        v_list.remove(word)

print(u_list)
print("\n\n\n\n")
print(v_list)

However, only some of the stopwords are removed. Please help me with this.
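A minimal reproduction of the symptom, without NLTK, may help. It assumes a tiny hypothetical stopword set STOP in place of stopwords.words('english'). Removing items from a list while a for loop iterates over that same list shifts the remaining items left, so the element right after each removed one is never examined:

```python
# Minimal reproduction of the symptom, without NLTK.
# STOP is a tiny hypothetical stand-in for stopwords.words('english').
STOP = {"the", "is", "of"}

tokens = ["the", "is", "apple", "of", "the", "tree"]
for word in tokens:          # iterating over the same list we mutate
    if word in STOP:
        tokens.remove(word)  # removes the FIRST match and shifts later items left,
                             # so the loop's next index skips one element

print(tokens)  # ['is', 'apple', 'the', 'tree']
```

Here "is" survives because it slid into the slot that was just visited, and the second "the" survives because it slid into the slot after "of", which was skipped the same way.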

score 1 · Accepted Answer

The problem with what you're doing is that list.remove(x) removes only the first occurrence of x, not every x. To remove every instance you could use filter, but I would go with something like:

u_list = [word for word in u_list if word not in stopwords.words('english')]
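One refinement to consider: the if clause of a list comprehension is evaluated once per token, so the line above calls stopwords.words('english') and scans the returned list for every word. A sketch that builds a set once is shown below; remove_stopwords is a hypothetical helper name and STOP is a hypothetical stand-in list, not NLTK's.

```python
def remove_stopwords(tokens, stop_words):
    """Return a new list with every stopword occurrence removed, keeping order."""
    stop_set = set(stop_words)  # build once; set membership is O(1) on average
    return [word for word in tokens if word not in stop_set]


STOP = ["the", "is", "of"]  # hypothetical stand-in for stopwords.words('english')
print(remove_stopwords(["the", "is", "apple", "of", "the", "tree"], STOP))
# ['apple', 'tree']
```

With NLTK installed, the call in the question's code would look like u_list = remove_stopwords(u_list, stopwords.words('english')). Because a new list is built instead of mutating the one being iterated, no element is skipped.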

score 0 · Accepted Answer

Convert both the list of tokenized words and the list of stopwords to sets, then compute the difference to remove the words:

u_list = list(set(u_list).difference(set(stopwords.words('english'))))

This properly removes every occurrence of the stopwords.
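One caveat worth knowing: a set holds each word only once and has no defined order, so the set-difference approach also collapses repeated words and does not preserve the original word order. The sketch below contrasts it with a list comprehension, using a hypothetical stopword set STOP and sample tokens:

```python
STOP = {"the", "is", "a"}  # hypothetical stand-in for the NLTK stopword list
tokens = ["the", "tree", "is", "a", "tree", "fruit"]

# Set difference: duplicates collapsed, order not guaranteed.
via_set = list(set(tokens).difference(STOP))

# List comprehension: order and repeated words are kept.
via_comprehension = [w for w in tokens if w not in STOP]

print(sorted(via_set))     # ['fruit', 'tree']
print(via_comprehension)   # ['tree', 'tree', 'fruit']
```

The set version is fine when only the distinct remaining words matter; if word counts or positions matter later, the comprehension is the safer choice.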
