python - Word frequency count in Python not working

Question

I'm trying to count the frequency of words in a text file using Python.

I'm using the following code:

openfile=open("total data", "r")

linecount=0
for line in openfile:
    if line.strip():
        linecount+=1

count={}

while linecount>0:
    line=openfile.readline().split()
    for word in line:
        if word in count:
            count[word]+=1
        else:
            count[word]=1
    linecount-=1

print count

But I get an empty dictionary: `print count` outputs `{}`.

I also tried:

from collections import defaultdict
.
.
count=defaultdict(int)
.
.
     if word in count:
          count[word]=count.get(word,0)+1

But I'm still getting an empty dictionary. I can't see what I'm doing wrong. Could someone point it out?
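Separately from the file-pointer problem the answers address, the `defaultdict` attempt has a second issue: `if word in count:` is never true for a word that has not been counted yet, and a membership test on a `defaultdict` does not create the key, so no word is ever added. A minimal sketch of the usual `defaultdict(int)` pattern follows; `count_words` is a hypothetical helper that takes any iterable of lines, not something from the question.

```python
from collections import defaultdict


def count_words(lines):
    """Count whitespace-separated words across an iterable of lines (hypothetical helper)."""
    count = defaultdict(int)
    for line in lines:
        for word in line.split():
            # A missing key starts at int() == 0, so no membership test is needed
            count[word] += 1
    return dict(count)


print(count_words(["a b a", "", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```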

score 9 · Accepted Answer

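The loop `for line in openfile:` moves the file pointer to the end of the file. After it finishes, every `openfile.readline()` call returns an empty string, so `split()` yields an empty list and nothing is ever added to `count`. If you want to read the data again, either move the pointer back to the start of the file (`openfile.seek(0)`) or reopen the file.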
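The effect can be reproduced without a file on disk. The sketch below uses `io.StringIO` as an in-memory stand-in for `openfile` (an assumption for illustration; a file opened in text mode behaves the same way here). Once iteration has consumed the stream, `readline()` returns an empty string; after `seek(0)`, reading starts from the beginning again.

```python
import io

# In-memory stand-in for open("total data", "r"); the contents are made up
openfile = io.StringIO("the cat\n\nthe dog\n")

# First pass: count non-blank lines, which consumes the whole stream
linecount = sum(1 for line in openfile if line.strip())

# The stream is now at end-of-file, so readline() returns ''
after_loop = openfile.readline()

# Rewind to the start; readline() returns the first line again
openfile.seek(0)
after_seek = openfile.readline()

print(linecount, repr(after_loop), repr(after_seek))
```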

To get word frequencies more cleanly, use `collections.Counter`:

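from collections import Counter

with open("total data", "r") as openfile:
    c = Counter()
    for line in openfile:
        # split() with no argument splits on any run of whitespace
        words = line.split()
        c.update(words)  # add 1 for each occurrence of each word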
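Because `Counter` consumes the lines as they are read, the question's separate line-counting pass (and therefore the need to rewind) can be dropped. A minimal sketch that gathers both the non-blank line count and the word frequencies in one pass; `count_lines_and_words` is a hypothetical helper name, and `io.StringIO` stands in for the opened file.

```python
import io
from collections import Counter


def count_lines_and_words(f):
    """Return (non-blank line count, Counter of words) from one pass over f."""
    linecount = 0
    words = Counter()
    for line in f:
        if line.strip():
            linecount += 1
        # update() adds 1 per occurrence of each word in the list
        words.update(line.split())
    return linecount, words


# Stand-in for open("total data", "r"); the text is illustrative only
lines, freq = count_lines_and_words(io.StringIO("the cat\n\nthe dog\n"))
print(lines, freq.most_common(1))
```

With a real file, the same helper would be called inside `with open("total data", "r") as f:`.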

score 1 · Accepted Answer

Add `openfile.seek(0)` right after initializing `count`. This moves the read pointer back to the beginning of the file.
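Applied to the question's code, the fix looks roughly like this. It is a Python 3 sketch wrapped in a hypothetical `word_frequencies` function so it can be exercised with an in-memory `io.StringIO` instead of the file `total data`. One caveat carried over from the original logic: the `while` loop reads only `linecount` physical lines, and `linecount` excludes blank lines, so if the file contains blank lines, words on the last lines are skipped. The first answer's line-by-line `Counter` approach does not have that problem.

```python
import io


def word_frequencies(openfile):
    """The question's two-pass approach, with the missing rewind added."""
    linecount = 0
    for line in openfile:
        if line.strip():
            linecount += 1

    count = {}
    openfile.seek(0)  # rewind: the loop above left the pointer at end-of-file

    # Still reads only `linecount` physical lines (see caveat above)
    while linecount > 0:
        line = openfile.readline().split()
        for word in line:
            if word in count:
                count[word] += 1
            else:
                count[word] = 1
        linecount -= 1
    return count


count = word_frequencies(io.StringIO("a b\nb c\n"))
print(count)
```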

score 1 · Accepted Answer

Here is a more direct way to count the frequency of words in a file:

from collections import Counter

def count_words_in_file(file_path):
    with open(file_path) as f:
        return Counter(f.read().split())

Example:

>>> count_words_in_file('C:/Python27/README.txt').most_common(10)
[('the', 395), ('to', 202), ('and', 129), ('is', 120), ('you', 111), ('a', 107), ('of', 102), ('in', 90), ('for', 84), ('Python', 69)]
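To try the function without `README.txt`, one option is to write a small temporary file (the contents below are made up for illustration) and pass its path in. Note that `f.read()` loads the whole file into memory at once; for very large files, the line-by-line `Counter.update` loop from the first answer avoids holding the entire text in memory.

```python
import os
import tempfile
from collections import Counter


def count_words_in_file(file_path):
    with open(file_path) as f:
        return Counter(f.read().split())


# Create a throwaway file; delete=False so it can be reopened by path (needed on Windows)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("to be or not to be\nthat is the question\n")
    path = tmp.name

try:
    top = count_words_in_file(path).most_common(2)
    print(top)
finally:
    os.remove(path)  # clean up the temporary file
```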
