python - Python 2.6: efficiently removing and counting duplicates in a list of dicts

Question

I am trying to efficiently convert:

[{'text': 'hallo world', 'num': 1}, 
 {'text': 'hallo world', 'num': 2}, 
 {'text': 'hallo world', 'num': 1}, 
 {'text': 'haltlo world', 'num': 1}, 
 {'text': 'hallo world', 'num': 1}, 
 {'text': 'hallo world', 'num': 1}, 
 {'text': 'hallo world', 'num': 1}]

into a list of dicts without duplicates, together with a count of the duplicates:

[{'text': 'hallo world', 'num': 2, 'count':1}, 
 {'text': 'hallo world', 'num': 1, 'count':5}, 
 {'text': 'haltlo world', 'num': 1, 'count':1}]

So far, I have the following to remove the duplicates:

result = [dict(tupleized) for tupleized in set(tuple(item.items()) for item in li)]

and it returns:

[{'text': 'hallo world', 'num': 2}, 
 {'text': 'hallo world', 'num': 1}, 
 {'text': 'haltlo world', 'num': 1}]

Thanks!

score 6 · Accepted Answer

I would use one of my favorites, itertools:

from itertools import groupby

def canonicalize_dict(x):
    "Return a (key, value) list sorted by the hash of the key"
    return sorted(x.items(), key=lambda x: hash(x[0]))

def unique_and_count(lst):
    "Return a list of unique dicts with a 'count' key added"
    grouper = groupby(sorted(map(canonicalize_dict, lst)))
    return [dict(k + [("count", len(list(g)))]) for k, g in grouper]

a = [{'text': 'hallo world', 'num': 1},  
     #....
     {'text': 'hallo world', 'num': 1}]

print unique_and_count(a)

Output:

[{'count': 5, 'text': 'hallo world', 'num': 1}, 
{'count': 1, 'text': 'hallo world', 'num': 2}, 
{'count': 1, 'text': 'haltlo world', 'num': 1}]

As gnibbler points out, d1.items() and d2.items() may return their items in a different order even when the keys are identical, so I introduced the canonicalize_dict function to address this.
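To illustrate the ordering concern, here is a minimal sketch of a canonical form. The helper name `canonical_key` is hypothetical and not part of the answer; it sorts by the key itself rather than by its hash, which assumes the keys are mutually comparable (for example, all strings).

```python
def canonical_key(d):
    # Sort the (key, value) pairs by key so that equal dicts always map to
    # the same tuple, whatever order items() happens to return them in.
    return tuple(sorted(d.items()))

d1 = {'text': 'hallo world', 'num': 1}
d2 = {'num': 1, 'text': 'hallo world'}

# Both dicts hold the same items, so their canonical forms compare equal.
same = canonical_key(d1) == canonical_key(d2)
```

Because the result is a tuple of pairs, it can be passed straight back to dict() to rebuild the original dictionary.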

score 6 · Accepted Answer

Note: this now uses frozenset, which means the items in the dictionaries must be hashable.

>>> from collections import defaultdict
>>> from itertools import chain
>>> data = [{'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 2},  {'text': 'hallo world', 'num': 1}, {'text': 'haltlo world', 'num': 1}, {'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 1}]
>>> c = defaultdict(int)
>>> for d in data:
        c[frozenset(d.iteritems())] += 1


>>> [dict(chain(k, (('count', count),))) for k, count in c.iteritems()]
[{'count': 1, 'text': 'haltlo world', 'num': 1}, {'count': 1, 'text': 'hallo world', 'num': 2}, {'count': 5, 'text': 'hallo world', 'num': 1}]
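The hashability requirement can be seen in a short sketch. `count_dicts` is a hypothetical helper name, not part of the answer; it uses `d.items()` instead of `d.iteritems()` so that it should run on both Python 2.6 and Python 3.

```python
from collections import defaultdict

def count_dicts(dicts):
    # Count identical dicts by using a frozenset of their items as the key.
    counts = defaultdict(int)
    for d in dicts:
        counts[frozenset(d.items())] += 1
    return counts

# Hashable values (ints, strings, tuples) work as expected.
ok = count_dicts([{'a': 1}, {'a': 1}, {'a': 2}])

# A list value cannot be hashed, so building the frozenset raises TypeError.
try:
    count_dicts([{'a': [1, 2]}])
    failed = False
except TypeError:
    failed = True
```

If the values may be unhashable, a sorted-items approach with comparable values, or converting lists to tuples first, would be needed instead.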

score 1 · Accepted Answer

If you want a simple solution without using built-ins:

>>> d = [{'text': 'hallo world', 'num': 1}, 
...  {'text': 'hallo world', 'num': 2}, 
...  {'text': 'hallo world', 'num': 1}, 
...  {'text': 'haltlo world', 'num': 1}, 
...  {'text': 'hallo world', 'num': 1}, 
...  {'text': 'hallo world', 'num': 1}, 
...  {'text': 'hallo world', 'num': 1}]
>>> 
>>> def unique_counter(filesets):
...      for i in filesets:
...          i['count'] = sum([1 for j in filesets if j['num'] == i['num']])
...      return {k['num']:k for k in filesets}.values()
... 
>>> unique_counter(d)
[{'count': 6, 'text': 'hallo world', 'num': 1}, {'count': 1, 'text': 'hallo world', 'num': 2}]
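Note that the function above groups on 'num' only, so the 'haltlo world' entry is folded into the 'hallo world' group (hence the count of 6), and the dict comprehension it uses requires Python 2.7 or later. A possible variant that keys on every item of each dict and avoids the comprehension is sketched below. `unique_counter_by_item` is a hypothetical name, and the sketch assumes the values are hashable and the keys are sortable.

```python
def unique_counter_by_item(dicts):
    counts = {}
    order = []  # canonical keys in first-seen order
    for d in dicts:
        # Use all (key, value) pairs, sorted by key, as the grouping key.
        key = tuple(sorted(d.items()))
        if key not in counts:
            counts[key] = 0
            order.append(key)
        counts[key] += 1
    result = []
    for key in order:
        row = dict(key)            # rebuild a fresh dict; inputs are not mutated
        row['count'] = counts[key]
        result.append(row)
    return result

d = [{'text': 'hallo world', 'num': 1},
     {'text': 'hallo world', 'num': 2},
     {'text': 'hallo world', 'num': 1},
     {'text': 'haltlo world', 'num': 1},
     {'text': 'hallo world', 'num': 1},
     {'text': 'hallo world', 'num': 1},
     {'text': 'hallo world', 'num': 1}]

result = unique_counter_by_item(d)
```

This makes a single pass over the list instead of rescanning it for every element, and it leaves the input dictionaries unchanged.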
