georg、NPE、Scott、およびHavokの回答に基づく追加メモ。
2つ以上の辞書のコレクションに対してこのアクションを実行しようとしていて、それぞれにかかった時間を知りたいと思っていました。これを任意の数の辞書で実行したかったため、いくつかの回答を少し変更する必要がありました。誰かがより良い提案をしている場合は、自由に編集してください。
これが私のテスト方法です。最近更新して、はるかに大きな辞書を使用したテストを含め、Havok と Scott の新しいメソッドを再度含めました。
まず、次のデータを使用しました。
import random
x = {'xy1': 1, 'xy2': 2, 'xyz': 3, 'only_x': 100}
y = {'xy1': 10, 'xy2': 20, 'xyz': 30, 'only_y': 200}
z = {'xyz': 300, 'only_z': 300}
small_tests = [x, y, z]
# 200,000 random 8 letter keys
keys = [''.join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8)) for _ in range(200000)]
a, b, c = {}, {}, {}
# 50/50 chance of a value being assigned to each dictionary, some keys will be missed but meh
for key in keys:
if random.getrandbits(1):
a[key] = random.randint(0, 1000)
if random.getrandbits(1):
b[key] = random.randint(0, 1000)
if random.getrandbits(1):
c[key] = random.randint(0, 1000)
large_tests = [a, b, c]
print("a:", len(a), "b:", len(b), "c:", len(c))
#: a: 100069 b: 100385 c: 99989
次に、各メソッド:
from collections import defaultdict, Counter
from functools import reduce
def georg_method(tests):
return {k: sum(t.get(k, 0) for t in tests) for k in set.union(*[set(t) for t in tests])}
def georg_method_nosum(tests):
# If you know you will have exactly 3 dicts
return {k: tests[0].get(k, 0) + tests[1].get(k, 0) + tests[2].get(k, 0) for k in set.union(*[set(t) for t in tests])}
def npe_method(tests):
ret = defaultdict(int)
for d in tests:
for k, v in d.items():
ret[k] += v
return dict(ret)
# Note: There is a bug with scott's method. See below for details.
# Scott included a similar version using counters that is fixed
# See the scott_update_method below
def scott_method(tests):
return dict(sum((Counter(t) for t in tests), Counter()))
def scott_method_nosum(tests):
# If you know you will have exactly 3 dicts
return dict(Counter(tests[0]) + Counter(tests[1]) + Counter(tests[2]))
def scott_update_method(tests):
ret = Counter()
for test in tests:
ret.update(test)
return dict(ret)
def scott_update_method_static(tests):
# If you know you will have exactly 3 dicts
xx = Counter(tests[0])
yy = Counter(tests[1])
zz = Counter(tests[2])
xx.update(yy)
xx.update(zz)
return dict(xx)
def havok_method(tests):
def reducer(accumulator, element):
for key, value in element.items():
accumulator[key] = accumulator.get(key, 0) + value
return accumulator
return reduce(reducer, tests, {})
methods = {
"georg_method": georg_method, "georg_method_nosum": georg_method_nosum,
"npe_method": npe_method,
"scott_method": scott_method, "scott_method_nosum": scott_method_nosum,
"scott_update_method": scott_update_method, "scott_update_method_static": scott_update_method_static,
"havok_method": havok_method
}
また、リスト間にある違いを見つける簡単な関数も作成しました。Counter()
残念ながら、私が Scott の方法に問題を発見したのはそのときです。つまり、合計が 0 になる辞書がある場合、追加時の の動作が原因で、辞書はまったく含まれません。
テスト設定:
- MacBook Pro (15 インチ、Late 2016)、2.9 GHz Intel Core i7、16 GB 2133 MHz LPDDR3 RAM、macOS Mojave バージョン 10.14.5 を実行
- IPython 6.1.0 経由の Python 3.6.5
最後に、結果:
結果: 小さなテスト
for name, method in methods.items():
print("Method:", name)
%timeit -n10000 method(small_tests)
#: Method: georg_method
#: 7.81 µs ± 321 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: georg_method_nosum
#: 4.6 µs ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: npe_method
#: 3.2 µs ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_method
#: 24.9 µs ± 326 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_method_nosum
#: 18.9 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_update_method
#: 9.1 µs ± 90.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_update_method_static
#: 14.4 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: havok_method
#: 3.09 µs ± 47.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
結果: 大規模なテスト
当然、多くのループの近くで実行できませんでした
for name, method in methods.items():
print("Method:", name)
%timeit -n10 method(large_tests)
#: Method: georg_method
#: 347 ms ± 20 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: georg_method_nosum
#: 280 ms ± 4.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: npe_method
#: 119 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_method
#: 324 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_method_nosum
#: 289 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_update_method
#: 123 ms ± 1.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_update_method_static
#: 136 ms ± 3.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: havok_method
#: 103 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
結論
╔═══════════════════════════╦═══════╦═════════════════════════════╗
║ ║ ║ Best of Time Per Loop ║
║ Algorithm ║ By ╠══════════════╦══════════════╣
║ ║ ║ small_tests ║ large_tests ║
╠═══════════════════════════╬═══════╬══════════════╬══════════════╣
║ functools reduce ║ Havok ║ 3.1 µs ║ 103,000 µs ║
║ defaultdict sum ║ NPE ║ 3.2 µs ║ 119,000 µs ║
║ Counter().update loop ║ Scott ║ 9.1 µs ║ 123,000 µs ║
║ Counter().update static ║ Scott ║ 14.4 µs ║ 136,000 µs ║
║ set unions without sum() ║ georg ║ 4.6 µs ║ 280,000 µs ║
║ set unions with sum() ║ georg ║ 7.8 µs ║ 347,000 µs ║
║ Counter() without sum() ║ Scott ║ 18.9 µs ║ 289,000 µs ║
║ Counter() with sum() ║ Scott ║ 24.9 µs ║ 324,000 µs ║
╚═══════════════════════════╩═══════╩══════════════╩══════════════╝
重要。YMMV。