python - Easiest way to find the most frequently occurring element in each column

Question

Suppose I have

data =
[[a, a, c],
 [b, c, c],
 [c, b, b],
 [b, a, c]]

I would like to get a list containing the most frequently occurring element in each column: result = [b, a, c]. What is the easiest way to do this?

I am using Python 2.6.6.

score 4 · Accepted Answer

In statistics, what you want is called the mode. The scipy library ( http://www.scipy.org/ ) has a mode function in scipy.stats.

In [32]: import numpy as np

In [33]: from scipy.stats import mode

In [34]: data = np.random.randint(1,6, size=(6,8))

In [35]: data
Out[35]: 
array([[2, 1, 5, 5, 3, 3, 1, 4],
       [5, 3, 2, 2, 5, 2, 5, 3],
       [2, 2, 5, 3, 3, 2, 1, 1],
       [2, 4, 1, 5, 4, 4, 4, 5],
       [4, 4, 5, 5, 2, 4, 4, 4],
       [2, 4, 1, 1, 3, 3, 1, 3]])

In [36]: val, count = mode(data, axis=0)

In [37]: val
Out[37]: array([[ 2.,  4.,  5.,  5.,  3.,  2.,  1.,  3.]])

In [38]: count
Out[38]: array([[ 4.,  3.,  3.,  3.,  3.,  2.,  3.,  2.]])

score 3 · Accepted Answer

Use a list comprehension with collections.Counter():

from collections import Counter

[Counter(col).most_common(1)[0][0] for col in zip(*data)]

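zip(*data) rearranges your list of lists into a list of columns. A Counter() object counts how often each item appears in the input sequence, and .most_common(1) gives you the most popular element (together with its count).

If your inputs are single-character strings, this gives: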

>>> [Counter(col).most_common(1)[0][0] for col in zip(*data)]
['b', 'a', 'c']
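
As a rough sketch of what each piece produces, the one-liner can be unrolled step by step. This assumes that a, b, and c in the question stand for the one-character strings 'a', 'b', and 'c'; the variable names columns and first are only for illustration.

```python
from collections import Counter

# Assumption: a, b, c in the question are the strings 'a', 'b', 'c'.
data = [['a', 'a', 'c'],
        ['b', 'c', 'c'],
        ['c', 'b', 'b'],
        ['b', 'a', 'c']]

# zip(*data) transposes the rows into column tuples.
columns = list(zip(*data))
print(columns)
# [('a', 'b', 'c', 'b'), ('a', 'c', 'b', 'a'), ('c', 'c', 'b', 'c')]

# most_common(1) returns a one-item list of (element, count) pairs.
first = Counter(columns[0]).most_common(1)
print(first)        # [('b', 2)]

# [0][0] picks the element out of that single pair.
print(first[0][0])  # b
```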

score 3 · Accepted Answer

Is the data hashable? If so, a collections.Counter will help:

[Counter(col).most_common(1)[0][0] for col in zip(*data)]

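This works because zip(*data) transposes the input data, yielding one column at a time. The Counter then counts the elements and stores the counts as values in a dictionary. Counters also have a most_common method, which returns a list of the "N" items with the highest counts (sorted from highest count to lowest). So you want the first element of the first item in the list returned by most_common, which is where the [0][0] comes from.

For example: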

>>> a,b,c = 'abc'
>>> from collections import Counter
>>> data = [[a, a, c],
...  [b, c, c],
...  [c, b, b],
...  [b, a, c]]
>>> [Counter(col).most_common(1)[0][0] for col in zip(*data)]
['b', 'a', 'c']
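
If the counts are wanted as well, one possible variation is to keep the whole (element, count) pair rather than indexing with a second [0]. The names pairs, values, and counts below are illustrative only. Note that most_common(1) returns a single element even when a column has a tie, so I would not rely on which of the tied elements comes back.

```python
from collections import Counter

data = [['a', 'a', 'c'],
        ['b', 'c', 'c'],
        ['c', 'b', 'b'],
        ['b', 'a', 'c']]

# Keep the whole (element, count) pair for each column.
pairs = [Counter(col).most_common(1)[0] for col in zip(*data)]

# Split the pairs into two parallel tuples.
values, counts = zip(*pairs)
print(values)  # ('b', 'a', 'c')
print(counts)  # (2, 2, 3)
```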

score 0 · Accepted Answer

Here is a solution that does not use the collections module:

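def get_most_common(data):
    """Return, for each column, the list of values tied for the highest count."""
    common = []
    for col in zip(*data):
        # Start a fresh tally for every column; sharing one dict across
        # columns would mix their counts together.
        count_dict = {}
        for val in col:
            count_dict[val] = count_dict.get(val, 0) + 1
        max_count = max(count_dict.values())
        # Keep every value that reaches the maximum, so ties are preserved.
        common.append([k for k in count_dict if count_dict[k] == max_count])

    return common

if __name__ == "__main__":

    data = [['a','a','b'],
            ['b','c','c'],
            ['a','b','b'],
            ['b','a','c']]

    # The parenthesized form works in both Python 2 and Python 3.
    print(get_most_common(data))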
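
Since collections.Counter only arrived in Python 2.7 and the question mentions 2.6.6, a shorter alternative without that class may also be useful. The sketch below is a different technique from the dictionary tally: it takes the maximum over the distinct values of a column, ranked by how often each occurs. The helper name most_common_per_column is hypothetical, and when a column has a tie the element returned is arbitrary.

```python
def most_common_per_column(data):
    """Return one most frequent element per column (hypothetical helper)."""
    result = []
    for col in zip(*data):
        # col.count(x) rescans the column for each distinct value, which is
        # fine for short columns but quadratic in the worst case.
        result.append(max(set(col), key=col.count))
    return result


# Data from the question, assuming a, b, c are one-character strings.
data = [['a', 'a', 'c'],
        ['b', 'c', 'c'],
        ['c', 'b', 'b'],
        ['b', 'a', 'c']]

print(most_common_per_column(data))  # ['b', 'a', 'c']
```

For long columns the dictionary-based tally is likely the better choice, since it passes over each column only once.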
