python - How do I pick out particular words and print the frequency of each phrase/word?

Question

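I have a file containing a list of bands, together with their albums and the year each album was made. I need to write a function that goes through this file, finds the different band names, and counts how many times each of those bands appears in the file.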

The file looks like this:

Beatles - Revolver (1966)
Nirvana - Nevermind (1991)
Beatles - Sgt Pepper's Lonely Hearts Club Band (1967)
U2 - The Joshua Tree (1987)
Beatles - The Beatles (1968)
Beatles - Abbey Road (1969)
Guns N' Roses - Appetite For Destruction (1987)
Radiohead - Ok Computer (1997)
Led Zeppelin - Led Zeppelin 4 (1971)
U2 - Achtung Baby (1991)
Pink Floyd - Dark Side Of The Moon (1973)
Michael Jackson -Thriller (1982)
Rolling Stones - Exile On Main Street (1972)
Clash - London Calling (1979)
U2 - All That You Can't Leave Behind (2000)
Weezer - Pinkerton (1996)
Radiohead - The Bends (1995)
Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995)
.
.
.
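Every sample line appears to follow the pattern "Band - Album (Year)", although at least one line ("Michael Jackson -Thriller (1982)") is missing the space after the hyphen. The following is a minimal sketch of a line parser under that assumption. The helper name `parse_album_line` and the regular expression are hypothetical, not part of the question. Band names that themselves contain a hyphen would be split at that hyphen.

```python
import re

# Hypothetical pattern: assumes "Band - Album (Year)" with optional spaces
# around the "-", so "Michael Jackson -Thriller (1982)" still parses.
# A band name containing its own hyphen would be cut at that hyphen.
LINE_RE = re.compile(r"^(?P<band>.+?)\s*-\s*(?P<album>.+?)\s*\((?P<year>\d{4})\)\s*$")


def parse_album_line(line):
    """Return (band, album, year) for one line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group("band"), match.group("album"), int(match.group("year"))


print(parse_album_line("Beatles - Revolver (1966)"))         # ('Beatles', 'Revolver', 1966)
print(parse_album_line("Michael Jackson -Thriller (1982)"))  # ('Michael Jackson', 'Thriller', 1982)
```

The lazy `.+?` for the band stops at the first hyphen, which is the same split rule the answers below use with `str.split`.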

The output has to be sorted by frequency in descending order and look like this:

band1: number1
band2: number2
band3: number3

Here is the code I have so far:

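def read_albums(filename):
    bands = {}
    # Open the file that was passed in and close it automatically afterwards
    with open(filename, "r") as file:
        for line in file:
            words = line.split()
            # Keep only the words before the separator; this also covers
            # tokens such as "-Thriller" where the space after "-" is missing
            for i, word in enumerate(words):
                if word.startswith("-"):
                    words = words[:i]
                    break
            band = " ".join(words)
            if not band:
                continue  # skip blank lines
            if band in bands:
                bands[band] = bands[band] + 1
            else:
                bands[band] = 1

    # Print each band together with its own count
    for band in bands:
        frequency = bands[band]
        print(band + ":", frequency)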

I think there is a simpler way to do this, but I'm not sure what it is. I also don't know how to sort the dictionary by frequency. Do I need to convert it to a list?
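On the sorting question: a dict cannot be sorted in place, but `sorted()` accepts any iterable and always returns a new list. Passing it `dict.items()` with a `key` that picks out the count gives a list of `(band, count)` tuples in frequency order. A minimal sketch, where the contents of `bands` are made-up example values:

```python
# Hypothetical counts, as the question's `bands` dict might hold them
bands = {"Nirvana": 1, "Beatles": 4, "U2": 3, "Radiohead": 2}

# sorted() returns a new list of (band, count) tuples,
# ordered by count from highest to lowest
by_frequency = sorted(bands.items(), key=lambda item: item[1], reverse=True)

for band, count in by_frequency:
    print(band + ":", count)
# Beatles: 4
# U2: 3
# Radiohead: 2
# Nirvana: 1
```

So yes, the result is a list, but `sorted()` does the conversion for you.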

score 2 · Accepted Answer

You're right, there is a simpler way, using Counter:

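from collections import Counter

with open('bandfile.txt') as f:
    # Take the text before the first '-' on each non-blank line and count it
    counts = Counter(line.split('-')[0].strip() for line in f if line.strip())

# most_common() with no argument returns every band, highest count first
for band, count in counts.most_common():
    print("{0}: {1}".format(band, count))

What exactly is this doing: line.split('-')[0].strip() for line in f if line.strip()?

That expression is a compact form of the following, longer loop:

from collections import Counter

temp_list = []
with open('bandfile.txt') as f:
    for line in f:
        if line.strip():  # skip blank lines (a bare "\n" is still truthy)
            bits = line.split('-')
            temp_list.append(bits[0].strip())

counts = Counter(temp_list)

Unlike the loop above, however, it does not build an intermediate list. Instead it is a generator expression, which produces one band name at a time and is therefore a more memory-efficient way to step through the lines. The generator is passed directly as the argument to Counter, which consumes it item by item.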
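To see that the generator expression and the explicit list give the same counts, here is a small sketch. It uses `io.StringIO` with a few made-up sample lines in place of the real file, since a `StringIO` object yields lines the same way an open file does. It also shows one practical difference: a generator can only be consumed once.

```python
import io
from collections import Counter

# io.StringIO stands in for the open file; it yields lines the same way
sample = "Beatles - Revolver (1966)\nU2 - Achtung Baby (1991)\n\nBeatles - Abbey Road (1969)\n"

# Generator expression: produces one band name at a time, no list is built
names = (line.split('-')[0].strip() for line in io.StringIO(sample) if line.strip())
counts = Counter(names)

# The equivalent list comprehension builds the whole list first
names_list = [line.split('-')[0].strip() for line in io.StringIO(sample) if line.strip()]

print(counts == Counter(names_list))  # True: same result either way
print(list(names))                    # []: the generator was used up by Counter
```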

score 1 · Accepted Answer

If you're after brevity, use defaultdict and sorted:

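from collections import defaultdict

bands = defaultdict(int)  # missing keys start at 0
with open('tmp.txt') as f:
    for line in f:  # iterate the file directly; xreadlines() was Python 2 only
        band = line.split('-')[0].strip()
        if band:  # skip blank lines
            bands[band] += 1

# Sort by count (the second item of each pair), largest first
for band, count in sorted(bands.items(), key=lambda t: t[1], reverse=True):
    print('%s: %d' % (band, count))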
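Python's sort is stable, so with `key=lambda t: t[1]` bands that share a count keep the order in which they were first inserted into the dict. If a fixed order for ties is preferred, one option (an assumption about what you want, not something the question requires) is a tuple key that negates the count and then falls back to the band name. A sketch with made-up input lines:

```python
from collections import defaultdict

# Hypothetical input lines; "Radiohead" and "U2" end up tied at 2
lines = [
    "U2 - The Joshua Tree (1987)",
    "Radiohead - Ok Computer (1997)",
    "Beatles - Revolver (1966)",
    "U2 - Achtung Baby (1991)",
    "Radiohead - The Bends (1995)",
    "Beatles - Abbey Road (1969)",
    "Beatles - The Beatles (1968)",
]

bands = defaultdict(int)
for line in lines:
    bands[line.split('-')[0].strip()] += 1

# Negating the count sorts it descending; the name then breaks ties A-Z
ranked = sorted(bands.items(), key=lambda t: (-t[1], t[0]))

for band, count in ranked:
    print('%s: %d' % (band, count))
```

With the single-value key and `reverse=True`, the tie would instead come out as U2 before Radiohead, matching the insertion order.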
