python - Remove all elements that occur in fewer than 1% and more than 60% of the list

Question

Given this list of strings:

['fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3']

(a large list)

How can I remove every word that appears in fewer than 1% of the strings or in 60% or more of them?

score 1 · Accepted Answer

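A simple solution:

# words is assumed to be a flat list of every word,
# e.g. built by splitting each string on ','
occurrences = dict()
for word in words:
    if word not in occurrences:
        occurrences[word] = 1
    else:
        occurrences[word] += 1

# Python 3: / is true division, so the ratio is a float
result = [word for word in words if 0.01 <= occurrences[word] / len(words) <= 0.6]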
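If the data starts out as the comma-separated strings from the question, `words` has to be built by splitting them first. Below is a minimal sketch of this approach under that assumption; the function name `filter_by_token_share` and its `low`/`high` parameters are made up for illustration. Note that it measures each word's share of all word occurrences, which is not the same as the fraction of strings that contain the word.

```python
from collections import Counter

# Sample data from the question; a real list would be much larger.
mylist = [
    'fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
    'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3',
]


def filter_by_token_share(lines, low=0.01, high=0.6):
    """Keep words whose share of all word occurrences lies in [low, high]."""
    # Flatten every comma-separated string into one list of words.
    words = [word for line in lines for word in line.split(',')]
    # Counter does the same job as the manual dict-based tally.
    occurrences = Counter(words)
    total = len(words)
    return [w for w in words if low <= occurrences[w] / total <= high]


filtered = filter_by_token_share(mylist)
```

With only two sample strings every word falls inside the 1% to 60% band, so nothing is removed; the thresholds only start to bite on a larger list.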

score 0 · Accepted Answer

I'm guessing this is what you want:

    from collections import Counter

    # Break up by ',' and remove duplicate words on each line
    st = [set(s.split(',')) for s in mylist]

    # Count how many lines each word appears in
    count = Counter(word for line in st for word in line)

    # Work out which words are allowed (a set keeps the lookups below fast)
    allowed = {s for s in count if 0.01 < count[s] / len(mylist) < 0.60}

    # For each row in the original list, keep a word only if it is allowed
    result = [[w for w in s.split(',') if w in allowed] for s in mylist]

    print(result)
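The same steps can be wrapped in a reusable function. This is a sketch rather than the answerer's exact code: the name `filter_by_document_frequency` and the `low`/`high` parameters are hypothetical, and it assumes inclusive bounds, so only words found in fewer than 1% or more than 60% of the strings are dropped. Switch to strict comparisons if words at exactly 60% should be removed as well.

```python
from collections import Counter


def filter_by_document_frequency(lines, low=0.01, high=0.60):
    """Drop words found in fewer than `low` or more than `high` of the lines."""
    rows = [line.split(',') for line in lines]
    # set(row) ensures a word is counted at most once per line.
    doc_freq = Counter(word for row in rows for word in set(row))
    n = len(lines)
    allowed = {w for w, c in doc_freq.items() if low <= c / n <= high}
    # Preserve the original order (and repeats) of the surviving words.
    return [[w for w in row if w in allowed] for row in rows]


mylist = [
    'fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
    'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3',
]
result = filter_by_document_frequency(mylist)
```

On the two sample strings, `fsuy3` appears in both lines (100%) and is removed, while every other word appears in one line out of two (50%) and is kept.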
