python - Find the most frequently used words in Python

Question

I want to read a file and find the most frequently used words. Here is my code. I think it reads the file, but I am making a mistake somewhere. Any suggestions would be appreciated.

txt_file = open('result.txt', 'r')

for line in txt_file:
    for word in line.strip().split():
        word = word.strip(punctuation).lower()

    all_words = nltk.FreqDist(word for word in word.words())
    top_words = set(all_words.keys()[:300])
    print top_words

Input file result.txt:

Musik to shiyuki miyama opa samba japan obi Musik Musik Musik 
Antiques    antique 1900 s sewing pattern pictorial review size Musik 36 bust 1910 s ladies waist bust

score 1 · Accepted Answer

from collections import Counter

# Read the file and split every line into whitespace-separated words
with open('result.txt', 'r') as txt_file:
    words = [word for line in txt_file for word in line.strip().split()]
print(Counter(words).most_common(1))

Instead of the 1 passed to most_common, you can specify any number, and that many of the most frequent words will be returned. For example,

print(Counter(words).most_common(1))

gives

[('Musik', 5)]

while

print(Counter(words).most_common(5))

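gives, for example (words with equal counts may appear in a different order depending on the Python version)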

[('Musik', 5), ('bust', 2), ('s', 2), ('antique', 1), ('ladies', 1)]

The number is actually an optional parameter. If you omit it, you get the frequencies of all words in descending order.
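As a minimal sketch of the optional argument, the snippet below counts the words of the question's sample text and calls most_common both with and without a number. The `sample` string is introduced here only for illustration, so the sketch runs without result.txt. Only the top entry is shown as expected output, since the order of tied words is not something to rely on.

```python
from collections import Counter

# Sample text from the question, held in a string (assumption: stands in for result.txt)
sample = (
    "Musik to shiyuki miyama opa samba japan obi Musik Musik Musik \n"
    "Antiques    antique 1900 s sewing pattern pictorial review size Musik 36 bust "
    "1910 s ladies waist bust\n"
)

# Same splitting logic as the answer: whitespace-separated tokens, line by line
words = [word for line in sample.splitlines() for word in line.strip().split()]
counts = Counter(words)

top_one = counts.most_common(1)     # only the single most frequent word
everything = counts.most_common()   # no argument: every word, highest count first

print(top_one)          # [('Musik', 5)]
print(len(everything))  # number of distinct words in the sample
```

Note that this counts raw tokens, so "Antiques" and "antique" are treated as different words unless they are normalized first.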

score 1 · Accepted Answer

I don't know what your error is, or how to do this with NLTK, but your approach of looping over the lines and then the words can be adapted to use a plain Python dictionary to keep track of the counts:

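from string import punctuation

wordFreq = {}
# Iterate over the file object directly; reading all lines first would exhaust it
with open("filename", "r") as txt_file:
    for line in txt_file:
        for word in line.strip().split():
            # Normalize: drop surrounding punctuation and lowercase the word
            word = word.strip(punctuation).lower()
            # If word is already in dict, increase count
            if word in wordFreq:
                wordFreq[word] += 1
            else:    # Otherwise, add word to dict and initialize count to 1
                wordFreq[word] = 1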

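To query the result, just pass the word you are interested in as a key to the dict, for example wordFreq['musik'] (the keys are lowercase because every word goes through lower() before it is counted).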
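To get from such a dictionary to the most frequent words, one option is to sort its items by count. This is a minimal sketch, not part of the original answer: `top_words` is a hypothetical helper name, and the small `wordFreq` dict below holds illustrative values rather than counts computed from the file.

```python
def top_words(freq, n):
    """Return the n (word, count) pairs with the highest counts."""
    # sorted() is stable, so words with equal counts keep their dict order
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)[:n]

# Illustrative counts only; in practice this would be the dict built by the loop
wordFreq = {"musik": 5, "bust": 2, "s": 2, "waist": 1}

print(top_words(wordFreq, 1))     # [('musik', 5)]
print(wordFreq.get("samba", 0))   # 0, because .get avoids a KeyError for unseen words
```

Sorting the whole dict is fine for small inputs; for very large vocabularies, heapq.nlargest or collections.Counter would avoid a full sort.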
