The often-mentioned collections.Counter is well covered in this group of answers, but it may not be the fastest choice.
One older way is this:
>>> d={}
>>> for ext in ('.mp3','.mp3','.m4a','.mp3','.wav','.m4a'):
...     d[ext]=d.setdefault(ext,0)+1
...
>>> d
{'.mp3': 3, '.wav': 1, '.m4a': 2}
That is not the fastest either, but it is faster than collections.Counter.
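For reference, the setdefault loop above and collections.Counter build the same mapping, so the choice between them is about speed and readability rather than results. A minimal sketch in Python 3 syntax; the helper name count_setdefault and the EXTS tuple are only illustrative:

```python
from collections import Counter

EXTS = ('.mp3', '.mp3', '.m4a', '.mp3', '.wav', '.m4a')

def count_setdefault(items):
    """Count items with the dict.setdefault idiom shown above."""
    d = {}
    for item in items:
        # setdefault returns the current count, or stores and returns 0
        d[item] = d.setdefault(item, 0) + 1
    return d

counts = count_setdefault(EXTS)
# Convert the Counter to a plain dict before comparing the two results.
same = counts == dict(Counter(EXTS))
print(counts, same)
```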
There is a benchmark of these methods, and either defaultdict, try/except, or the original method is the fastest.
I have reproduced (and extended) the benchmark here:
# Python 2 script (uses urllib2 and the print statement)
import urllib2
import timeit
response = urllib2.urlopen('http://pastebin.com/raw.php?i=7p3uycAz')
hamlet = response.read().replace('\r\n','\n')
LETTERS = [w for w in hamlet]
WORDS = hamlet.split(' ')
fmt='{:>20}: {:7.4} seconds for {} loops'
n=100
print
t = timeit.Timer(stmt="""
counter = defaultdict(int)
for k in LETTERS:
    counter[k] += 1
""",
setup="from collections import defaultdict; from __main__ import LETTERS")
print fmt.format("defaultdict letters",t.timeit(n),n)
t = timeit.Timer(stmt="""
counter = defaultdict(int)
for k in WORDS:
    counter[k] += 1
""",
setup="from collections import defaultdict; from __main__ import WORDS")
print fmt.format("defaultdict words",t.timeit(n),n)
print
# setdefault
t = timeit.Timer(stmt="""
counter = {}
for k in LETTERS:
    counter[k]=counter.setdefault(k, 0)+1
""",
setup="from __main__ import LETTERS")
print fmt.format("setdefault letters",t.timeit(n),n)
t = timeit.Timer(stmt="""
counter = {}
for k in WORDS:
    counter[k]=counter.setdefault(k, 0)+1
""",
setup="from __main__ import WORDS")
print fmt.format("setdefault words",t.timeit(n),n)
print
# Counter
t = timeit.Timer(stmt="c = Counter(LETTERS)",
setup="from collections import Counter; from __main__ import LETTERS")
print fmt.format("Counter letters",t.timeit(n),n)
t = timeit.Timer(stmt="c = Counter(WORDS)",
setup="from collections import Counter; from __main__ import WORDS")
print fmt.format("Counter words",t.timeit(n),n)
print
# in
t = timeit.Timer(stmt="""
counter = {}
for k in LETTERS:
    if k in counter: counter[k]+=1
    else: counter[k]=1
""",
setup="from __main__ import LETTERS")
print fmt.format("'in' letters",t.timeit(n),n)
t = timeit.Timer(stmt="""
counter = {}
for k in WORDS:
    if k in counter: counter[k]+=1
    else: counter[k]=1
""",
setup="from __main__ import WORDS")
print fmt.format("'in' words",t.timeit(n),n)
print
# try
t = timeit.Timer(stmt="""
counter = {}
for k in LETTERS:
    try:
        counter[k]+=1
    except KeyError:
        counter[k]=1
""",
setup="from __main__ import LETTERS")
print fmt.format("try letters",t.timeit(n),n)
t = timeit.Timer(stmt="""
counter = {}
for k in WORDS:
    try:
        counter[k]+=1
    except KeyError:
        counter[k]=1
""",
setup="from __main__ import WORDS")
print fmt.format("try words",t.timeit(n),n)
print "\n{:,} letters and {:,} words".format(len(list(LETTERS)),len(list(WORDS)))
Prints:
defaultdict letters: 3.001 seconds for 100 loops
defaultdict words: 0.8495 seconds for 100 loops
setdefault letters: 4.839 seconds for 100 loops
setdefault words: 0.946 seconds for 100 loops
Counter letters: 7.335 seconds for 100 loops
Counter words: 1.298 seconds for 100 loops
'in' letters: 4.013 seconds for 100 loops
'in' words: 0.7275 seconds for 100 loops
try letters: 3.389 seconds for 100 loops
try words: 1.571 seconds for 100 loops
175,176 letters and 26,630 words
Personally, I was surprised that try/except is one of the fastest ways to do this. Who knew...
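The timings above come from a Python 2 run, and the ranking may well differ on other interpreter versions or inputs. Below is a rough sketch for repeating the comparison on Python 3 without the download. The function names, the METHODS table, and the small synthetic WORDS list are my own stand-ins, not the original Hamlet data, so no timings are quoted for it:

```python
import timeit
from collections import Counter, defaultdict

# Synthetic stand-in for the Hamlet text (an assumption, not the original data).
WORDS = ("to be or not to be that is the question " * 200).split()

def with_defaultdict(items):
    counter = defaultdict(int)
    for k in items:
        counter[k] += 1
    return counter

def with_setdefault(items):
    counter = {}
    for k in items:
        counter[k] = counter.setdefault(k, 0) + 1
    return counter

def with_in(items):
    counter = {}
    for k in items:
        if k in counter:
            counter[k] += 1
        else:
            counter[k] = 1
    return counter

def with_try(items):
    counter = {}
    for k in items:
        try:
            counter[k] += 1
        except KeyError:
            counter[k] = 1
    return counter

# Hypothetical lookup table mapping a label to each counting strategy.
METHODS = {
    "defaultdict": with_defaultdict,
    "setdefault": with_setdefault,
    "Counter": Counter,
    "in": with_in,
    "try": with_try,
}

def run(n=100):
    """Time each method over n loops and return {name: seconds}."""
    return {name: timeit.timeit(lambda f=func: f(WORDS), number=n)
            for name, func in METHODS.items()}

if __name__ == "__main__":
    n = 100
    for name, seconds in run(n).items():
        print("{:>20}: {:7.4} seconds for {} loops".format(name, seconds, n))
```

Swap WORDS for your own data before drawing conclusions, since the mix of new and repeated keys affects how often each branch is taken.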