python - How to find the most common elements of a list?

Question

Given the following list:

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 
 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 
 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 
 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 
 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 
 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 
 'Moon', 'to', 'rise.', '']

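I am trying to count how many times each word appears and display the top 3.

However, I am only looking for the top three that have the first letter capitalized, and I want to ignore all words that do not have the first letter capitalized.

I am sure there is a better way than this, but my idea was to do the following:

put the first word in the list into another list called uniquewords
delete the first word and all of its duplicates from the original list
add the new first word to uniquewords
delete the first word and all of its duplicates from the original list
etc...
until the original list is empty...
count how many times each word in uniquewords appears in the original list
find the top 3 and print them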
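
For reference, the steps above can be sketched roughly as follows. This is only an illustration of the plan as described, not a recommended solution; the shortened `words` list and the names `remaining`, `unique_words`, and `top_three` are made up for the example.

```python
# Hypothetical, shortened stand-in for the list in the question.
words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,',
         'Jellicle', 'Cats', 'are', 'rather', 'small;', 'And', 'and', '']

remaining = list(words)      # work on a copy so the original list survives
unique_words = []
while remaining:
    first = remaining[0]
    unique_words.append(first)
    # delete the first word and all of its duplicates
    remaining = [w for w in remaining if w != first]

# count each capitalized unique word in the original list
counts = [(words.count(w), w) for w in unique_words if w[:1].isupper()]
counts.sort(key=lambda pair: pair[0], reverse=True)
top_three = counts[:3]
print(top_three)   # [(2, 'Jellicle'), (2, 'Cats'), (1, 'And')]
```

Every pass rebuilds the remaining list and every `words.count` call rescans the original, so the work grows quickly with the length of the list.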

score 85 · Accepted Answer

In Python 2.7 and above there is a class called Counter which can help you:

from collections import Counter
words_to_count = (word for word in word_list if word[:1].isupper())
c = Counter(words_to_count)
print(c.most_common(3))

Result:

[('Jellicle', 6), ('Cats', 5), ('And', 2)]

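"I am quite new to programming so please try and do it in the most barebones fashion."

You could instead do this using a dictionary with the key being a word and the value being the count for that word. First iterate over the words, adding each one to the dictionary if it is not present, or else increasing its count if it is present. Then, to find the top three, you can either use a simple O(n*log(n)) sorting algorithm and take the first three elements from the result, or use an O(n) algorithm that scans the list once, remembering only the top three elements.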
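
A minimal sketch of that dictionary approach, assuming the question's rule that only capitalized words count. The helper names `count_capitalized` and `top_three` and the shortened `word_list` are made up for this example.

```python
def count_capitalized(words):
    """Count words whose first character is uppercase, using a plain dict."""
    counts = {}
    for word in words:
        if not word[:1].isupper():
            continue             # skip lowercase words and empty strings
        if word in counts:
            counts[word] += 1    # already present: increase the count
        else:
            counts[word] = 1     # not present yet: add it
    return counts


def top_three(counts):
    """Scan the counts once, remembering only the three best seen so far."""
    best = []                    # at most three (count, word) pairs
    for word, count in counts.items():
        best.append((count, word))
        best.sort(key=lambda pair: pair[0], reverse=True)
        del best[3:]             # never keep more than three candidates
    return [(word, count) for count, word in best]


# Hypothetical, shortened stand-in for the list in the question.
word_list = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'and', 'And',
             'Jellicle', 'Cats', 'to', 'And', 'Jellicle', 'Moon', '']
print(top_three(count_capitalized(word_list)))
# [('Jellicle', 4), ('Cats', 3), ('And', 2)]
```

Because `best` never holds more than four entries, sorting it costs constant time per word, which keeps the whole scan linear in the number of distinct words.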

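An important observation for beginners is that by using builtin classes designed for the purpose you can save yourself a lot of work and/or get better performance. It is good to be familiar with the standard library and the features it offers.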

score 23 · Accepted Answer

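If you are using an earlier version of Python, or you have a very good reason to roll your own word counter (I'd like to hear it!), you could try the following approach using a dict.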

Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> word_counter = {}
>>> for word in word_list:
...     if word in word_counter:
...         word_counter[word] += 1
...     else:
...         word_counter[word] = 1
... 
>>> popular_words = sorted(word_counter, key = word_counter.get, reverse = True)
>>> 
>>> top_3 = popular_words[:3]
>>> 
>>> top_3
['Jellicle', 'Cats', 'and']

Top tip: the interactive Python interpreter is your friend whenever you want to play with an algorithm like this. Just type it in and watch it go, inspecting elements along the way.

score 20 · Accepted Answer

To just return a list containing the most common words:

from collections import Counter
words=["i", "love", "you", "i", "you", "a", "are", "you", "you", "fine", "green"]
most_common_words= [word for word, word_count in Counter(words).most_common(3)]
print(most_common_words)

This prints (words with equal counts may appear in a different order depending on the Python version):

['you', 'i', 'a']

The 3 in "most_common(3)" specifies the number of items to print. Counter(words).most_common() returns a list of tuples, with each tuple having the word as the first member and the frequency as the second member. The tuples are ordered by the frequency of the word.

most_common = [item for item in Counter(words).most_common()]
print(str(most_common))
[('you', 4), ('i', 2), ('a', 1), ('are', 1), ('green', 1), ('love', 1), ('fine', 1)]

The "word for word, word_count in" part extracts only the first member of each tuple.

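To make the tuple structure concrete, here is a small sketch using the same `words` list. It asks for the top two only, since those counts are not tied and the result is the same in any Python version; the names `pairs`, `only_words`, `top_words`, and `top_counts` are made up for the example.

```python
from collections import Counter

words = ["i", "love", "you", "i", "you", "a", "are", "you", "you", "fine", "green"]

pairs = Counter(words).most_common(2)   # list of (word, count) tuples
print(pairs)                            # [('you', 4), ('i', 2)]

# Unpack each tuple and keep only its first member, the word.
only_words = [word for word, word_count in pairs]
print(only_words)                       # ['you', 'i']

# zip(*pairs) is another way to split the tuples into words and counts.
top_words, top_counts = zip(*pairs)
```
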
score 6 · Accepted Answer

nltk is convenient for a lot of language processing tasks. It has methods for frequency distributions built in. Something like:

import nltk
fdist = nltk.FreqDist(your_list) # creates a frequency distribution from a list
most_common = fdist.max()    # returns a single element
top_three = [word for word, count in fdist.most_common(3)] # returns a list (NLTK 3, where FreqDist is a Counter subclass)

score 6 · Accepted Answer

A simple, two-line solution to this, which does not require any extra modules, is the following code:

lst = ['Jellicle', 'Cats', 'are', 'black', 'and','white,',
       'Jellicle', 'Cats','are', 'rather', 'small;', 'Jellicle', 
       'Cats', 'are', 'merry', 'and','bright,', 'And', 'pleasant',    
       'to','hear', 'when', 'they', 'caterwaul.','Jellicle', 
       'Cats', 'have','cheerful', 'faces,', 'Jellicle',
       'Cats','have', 'bright', 'black','eyes;', 'They', 'like',
       'to', 'practise','their', 'airs', 'and', 'graces', 'And', 
       'wait', 'for', 'the', 'Jellicle','Moon', 'to', 'rise.', '']

lst_sorted=sorted([ss for ss in set(lst) if len(ss)>0 and ss.istitle()], 
                   key=lst.count, 
                   reverse=True)
print(lst_sorted[0:3])

Output:

['Jellicle', 'Cats', 'And']

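The term in square brackets returns all unique strings in the list that are not empty and start with a capital letter. The sorted() function then sorts them by how often they appear in the list (using lst.count as the key), in reverse order.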
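
The two steps can be looked at separately. The sketch below uses a shortened, made-up list, and the names `candidates` and `ranked` are not from the answer.

```python
# Hypothetical, shortened stand-in for the list above.
lst = ['Cats', 'Jellicle', 'and', 'Jellicle', 'Cats', 'Jellicle', 'Moon', '']

# Step 1: unique, non-empty, title-cased strings (set order is arbitrary).
candidates = [s for s in set(lst) if s and s.istitle()]

# Step 2: lst.count(s) is evaluated once per candidate as the sort key.
ranked = sorted(candidates, key=lst.count, reverse=True)
print(ranked)   # ['Jellicle', 'Cats', 'Moon']
```

Two things are worth keeping in mind. Each `lst.count` call scans the whole list, so the cost is roughly the number of unique words times the list length. Also, `istitle()` is stricter than "starts with a capital letter": `'NASA'.istitle()` is False, whereas `'NASA'[:1].isupper()` is True.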

score 2 · Accepted Answer

A simple way of doing this would be (assuming your list is in 'l'):

>>> counter = {}
>>> for i in l: counter[i] = counter.get(i, 0) + 1
>>> sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
[(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]

Full sample:

>>> l = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> counter = {}
>>> for i in l: counter[i] = counter.get(i, 0) + 1
... 
>>> counter
{'and': 3, '': 1, 'merry': 1, 'rise.': 1, 'small;': 1, 'Moon': 1, 'cheerful': 1, 'bright': 1, 'Cats': 5, 'are': 3, 'have': 2, 'bright,': 1, 'for': 1, 'their': 1, 'rather': 1, 'when': 1, 'to': 3, 'airs': 1, 'black': 2, 'They': 1, 'practise': 1, 'caterwaul.': 1, 'pleasant': 1, 'hear': 1, 'they': 1, 'white,': 1, 'wait': 1, 'And': 2, 'like': 1, 'Jellicle': 6, 'eyes;': 1, 'the': 1, 'faces,': 1, 'graces': 1}
>>> sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
[(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]

By simple I mean that it works in nearly every version of Python.

If you don't understand some of the functions used in this sample, you can always do this in the interpreter (after pasting the code above):

>>> help(counter.get)
>>> help(sorted)
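
As a small illustration of the two building blocks in the session above: `dict.get(key, 0)` supplies 0 for a word that has not been seen yet, and tuples are compared element by element, so words with equal counts are ordered by the word itself. That is why 'to' appears in the result rather than 'and' or 'are', which also occur three times. The short word list below is made up for the example.

```python
counter = {}
for word in ['to', 'and', 'to', 'are', 'and', 'are', 'to', 'and', 'are', 'Cats']:
    # get() returns 0 for a word that is not in the dict yet
    counter[word] = counter.get(word, 0) + 1

# (freq, word) tuples sort by freq first, then by word to break ties
pairs = sorted([(freq, word) for word, freq in counter.items()], reverse=True)
print(pairs)   # [(3, 'to'), (3, 'are'), (3, 'and'), (1, 'Cats')]
```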

score 1 · Accepted Answer

If you are using Counter, or have created your own Counter-style dict, and want to show the name of each item and its count, you can iterate over it like so:

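from collections import Counter

# most_common(10) returns a list of (word, count) tuples
top_10_words = Counter(my_long_list_of_words).most_common(10)
# Iterate over the list of tuples
for word in top_10_words:
        # print the word
        print(word[0])
        # print the count
        print(word[1])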

or to iterate through this in a template:

{% for word in top_10_words %}
        <p>Word: {{ word.0 }}</p>
        <p>Count: {{ word.1 }}</p>
{% endfor %}

Hope this helps someone.
