python - Python IF statement using nltk.wordnet.synsets

Question

import nltk
from nltk import *
from nltk.corpus import wordnet as wn

output=[]
wordlist=[]

entries = nltk.corpus.cmudict.entries()

for entry in entries[:200]: # build a list of words without the pronunciation, since pos_tag only works on a list
    wordlist.append(entry[0])

for word in nltk.pos_tag(wordlist): #create a list of nouns
    if(word[1]=='NN'):
        output.append(word[0])

for word in output:
    x = wn.synsets(word) # remove all words that have no synsets (this is the problem)
    if len(x)<1:
        output.remove(word)

for word in output[:200]:
    print (word," ",len(wn.synsets(word)))

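I'm trying to remove all the words that have no synsets, but for some reason it isn't working. When I run the program, I find words that are still in the list even though len(wn.synsets(word)) is reported as 0 for them. Can anyone tell me what went wrong?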

score 5 · Accepted Answer

You can't iterate over a list and remove the current item at the same time. Here is a toy example that shows the problem:

In [73]: output = list(range(10))  # list(...) is required in Python 3, where range has no remove()

In [74]: for item in output:
   ....:     output.remove(item)

You might expect all the items in output to be removed. Instead, half of them are still there:

In [75]: output
Out[75]: [1, 3, 5, 7, 9]

Why you can't loop and remove at the same time:

Python uses an internal counter to keep track of the current item in the for-loop.

When the counter is 0 (the first pass through the loop), Python executes

output.remove(item)

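Fine. There is now one fewer item in output. But then Python increments the counter to 1, so the next value of the loop variable (item here, word in your code) is output[1], which is the third item in the original list.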

0  <-- first item removed
1  <-- the new output[0] ** THIS ONE GETS SKIPPED **
2  <-- the new output[1] -- gets removed on the next iteration
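The skipping described above can be made visible by recording which items the loop actually sees. This is only a small sketch of the toy example; the name visited is introduced here for illustration and is not part of the original code.

```python
# Trace which items a for-loop visits while the list shrinks underneath it.
output = list(range(10))
visited = []  # hypothetical helper list: the items the loop actually saw

for item in output:
    visited.append(item)
    output.remove(item)  # every later item shifts one slot to the left

print(visited)  # [0, 2, 4, 6, 8]
print(output)   # [1, 3, 5, 7, 9]
```

Every odd number is skipped because it slides into the slot the counter has already passed.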

(Workaround) Solution:

Instead, either iterate over a copy of output, or build a new list. In this case, I think building a new list is more efficient.

new_output = []
for word in output:
    x = wn.synsets(word) 
    if len(x)>=1:
        new_output.append(word)
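For comparison, here is a rough sketch of both options side by side: removing while iterating over a copy, and building a new list with a list comprehension. To keep it runnable without the WordNet data, synset_count and FAKE_SYNSET_COUNTS are hypothetical stand-ins for len(wn.synsets(word)), and the counts in the table are made up, not real WordNet results.

```python
# Hypothetical stand-in for len(wn.synsets(word)); the counts are invented.
FAKE_SYNSET_COUNTS = {"abacus": 1, "aaronson": 0, "abbey": 3, "abbot": 0, "abdomen": 2}

def synset_count(word):
    """Return the (made-up) number of synsets for a word."""
    return FAKE_SYNSET_COUNTS.get(word, 0)

words = ["abacus", "aaronson", "abbey", "abbot", "abdomen"]

# Option 1: iterate over a copy (output[:]) and remove from the original.
output = list(words)
for word in output[:]:
    if synset_count(word) < 1:
        output.remove(word)

# Option 2: build a new list, written as a list comprehension.
new_output = [word for word in words if synset_count(word) >= 1]

print(output)
print(new_output)
```

Both options keep the same words in the same order. With the real data, the condition would presumably be len(wn.synsets(word)) >= 1, as in the loop above.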
