python - How to count word frequencies across multiple documents in Python

Question

I have a dictionary 'd' that holds a list of paths to several text files, for example:

'd:/individual-articles/9.txt', 'd:/individual-articles/11.txt', 'd:/individual-articles/12.txt',...

and so on...
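Because `d` is a `defaultdict(list)`, each value is a list of paths, so iterating over `d.values()` yields lists rather than individual file names. A minimal sketch of flattening them into one sequence of paths with `itertools.chain.from_iterable`, assuming `d` has the shape shown above (the key `'earn'` is taken from the question's code; the helper name `all_paths` is hypothetical):

```python
import itertools
from collections import defaultdict

def all_paths(d):
    """Flatten a dict of path lists into one list of paths (hypothetical helper)."""
    # d.values() yields lists, so chain them into one flat sequence of strings
    return list(itertools.chain.from_iterable(d.values()))

# Example dictionary shaped like 'd' in the question
d = defaultdict(list)
d['earn'].extend([
    'd:/individual-articles/9.txt',
    'd:/individual-articles/11.txt',
    'd:/individual-articles/12.txt',
])

print(all_paths(d))
```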

Now I need to read each file in the dictionary and keep a count of how often every word occurs across all of the files in the dictionary.
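The core idea is to tokenize each file's text and accumulate a single `Counter` across all files; `Counter.update()` adds the counts in place instead of building a new `Counter` for every file. A minimal sketch that works on in-memory strings so it can run without the article files (the function name `count_words` and the sample sentences are hypothetical):

```python
import re
from collections import Counter

def count_words(texts):
    """Return a Counter of lower-cased words across all given strings."""
    total = Counter()
    for text in texts:
        # \w+ matches runs of letters, digits and underscores
        total.update(re.findall(r'\w+', text.lower()))
    return total

counts = count_words(["The cat sat in the hat.", "A dog and the cat."])
print(counts['the'])  # 3
```

Note that `\w+` treats apostrophes as separators, so "don't" is counted as "don" and "t"; adjust the pattern if that matters for your articles.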

My output needs to be in the following format:

the-500
a-78
in-56
and so on...

Here, 500 is the number of times the word "the" occurs across all the files in the dictionary.

I need to do this for every word.
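The `word-count` output format above maps directly onto `Counter.most_common()`, which returns every (word, count) pair sorted by count, highest first. A minimal sketch, using the example numbers from the question as made-up input (the function name `format_counts` is hypothetical):

```python
from collections import Counter

def format_counts(counter):
    """Render a Counter as 'word-count' lines, most frequent first."""
    # most_common() with no argument returns all items sorted by count, descending
    return [f"{word}-{count}" for word, count in counter.most_common()]

counts = Counter({'the': 500, 'a': 78, 'in': 56})
for line in format_counts(counts):
    print(line)
```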

I'm a beginner in Python... please help!

The code below doesn't work. No output is displayed! There must be a mistake in my logic. Please help me fix it!!

import collections
import itertools
import os
from glob import glob
from collections import Counter

folderpaths='d:/individual-articles'
counter=Counter()

filepaths = glob(os.path.join(folderpaths,'*.txt'))

folderpath='d:/individual-articles/'
# i am creating my dictionary here, can be ignored
d = collections.defaultdict(list)
with open('topics.txt') as f:
    for line in f:
        value, *keys = line.strip().split('~')
        for key in filter(None, keys):
            if key=='earn':
                d[key].append(folderpath+value+".txt")

    for key, value in d.items() :
        print(value)

word_count_dict={}

for file in d.values():
    with open(file,"r") as f:
        words = re.findall(r'\w+', f.read().lower())
        counter = counter + Counter(words)
        for word in words:
            word_count_dict[word].append(counter)

for word, counts in word_count_dict.values():
    print(word, counts)
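A corrected version of the script, sketched as functions. It keeps the question's `topics.txt` parsing and `'earn'` filter but fixes several problems in the counting part: `re` is used without being imported; `d.values()` yields lists of paths, so `open(file)` receives a list rather than a string; `word_count_dict[word].append(...)` raises `KeyError` on a plain `dict` (and the running `Counter` already holds the totals, so the extra dict is unnecessary); and `word_count_dict.values()` yields only values, so unpacking `word, counts` needs `.items()` instead. The function names `build_paths`, `count_words_in_files` and `print_counts` are hypothetical, and reading the articles as UTF-8 is an assumption:

```python
import itertools
import re
from collections import Counter, defaultdict

def build_paths(topics_file, folderpath, wanted_key='earn'):
    """Rebuild the question's dict 'd': key -> list of article file paths."""
    d = defaultdict(list)
    with open(topics_file) as f:
        for line in f:
            value, *keys = line.strip().split('~')
            for key in filter(None, keys):
                if key == wanted_key:
                    d[key].append(folderpath + value + ".txt")
    return d

def count_words_in_files(paths_by_key):
    """Count lower-cased words across every file listed in a dict of path lists."""
    counter = Counter()  # one running total replaces word_count_dict
    # d.values() yields *lists* of paths, so flatten them before opening
    for path in itertools.chain.from_iterable(paths_by_key.values()):
        # Assumed UTF-8; change the encoding if the articles use another one
        with open(path, "r", encoding="utf-8") as f:
            counter.update(re.findall(r'\w+', f.read().lower()))
    return counter

def print_counts(counter):
    """Print 'word-count' lines, most frequent first."""
    for word, count in counter.most_common():
        print(f"{word}-{count}")
```

With the paths from the question, the script then reduces to `d = build_paths('topics.txt', 'd:/individual-articles/')` followed by `print_counts(count_words_in_files(d))`. If nothing is printed, check whether `d` is empty: when no line in `topics.txt` carries the key `earn`, there are no files to read and both loops do nothing.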
