python - Python: how to count overlapping occurrences of a substring

Question

I wanted to count the number of times a string such as 'aa' occurs in 'aaa' (or 'aaaa').

The most obvious code gives the wrong (or at least unintuitive) answer:

'aaa'.count('aa')
1 # should be 2
'aaaa'.count('aa')
2 # should be 3

Is there a simple way to fix this?

score 10 · Accepted Answer

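From the str.count() documentation:

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

So no, you are getting the expected result.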
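To make the "non-overlapping" wording concrete, here is a minimal sketch that mimics that behaviour with str.find. The helper name count_non_overlapping is made up for illustration and is not how CPython implements str.count; the point is only that the search resumes after the end of each match, so a character can never be shared by two matches.

```python
def count_non_overlapping(haystack, needle):
    """Hypothetical helper: count matches the way str.count() reports them."""
    if not needle:
        raise ValueError("needle must not be empty")
    count = 0
    pos = haystack.find(needle)
    while pos != -1:
        count += 1
        # Skip past the whole match, so the next match cannot overlap it.
        pos = haystack.find(needle, pos + len(needle))
    return count

print(count_non_overlapping("aaa", "aa") == "aaa".count("aa"))
print(count_non_overlapping("aaaa", "aa") == "aaaa".count("aa"))
```

In 'aaa' the first match uses indices 0 and 1, which leaves only a single 'a', hence the result of 1.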

If you want to count overlapping matches, use a regex:

>>> import re
>>> 
>>> len(re.findall(r'(a)(?=\1)', 'aaa'))
2

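This finds every occurrence of a that is followed by another a. Because a lookahead is a zero-width assertion, the second a is not consumed by the match, so it can start the next match.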
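The pattern above is specific to 'aa'. As a rough sketch of how the same lookahead idea might be generalised to any substring, the whole needle can be placed inside the lookahead. The function name count_overlapping is an assumption made for this example; re.escape is used so that characters such as '.' or '*' in the needle are treated literally.

```python
import re

def count_overlapping(haystack, needle):
    """Count occurrences of needle in haystack, allowing overlaps."""
    if not needle:
        raise ValueError("needle must not be empty")
    # The lookahead matches at each position where needle starts but consumes
    # no characters, so the scan resumes one character further each time.
    pattern = "(?=" + re.escape(needle) + ")"
    return len(re.findall(pattern, haystack))

print(count_overlapping("aaa", "aa"))   # 2
print(count_overlapping("aaaa", "aa"))  # 3
```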

score 6 · Accepted Answer

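haystack = "aaaa"
needle = "aa"

# Compare the needle with the slice starting at every possible position;
# each True counts as 1 in the sum.
matches = sum(haystack[i:i + len(needle)] == needle
              for i in range(len(haystack) - len(needle) + 1))

# On Python 2, use xrange instead of range.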
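A variation on the same idea, in case slicing at every index feels wasteful: let str.find jump straight to the next match and then advance by one character instead of by the length of the needle. This is only a sketch, and the name count_overlapping_find is invented for the example.

```python
def count_overlapping_find(haystack, needle):
    """Count overlapping occurrences by restarting the search one past each hit."""
    if not needle:
        raise ValueError("needle must not be empty")
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        # Advance by a single character so the next match may overlap this one.
        start = haystack.find(needle, start + 1)
    return count

print(count_overlapping_find("aaaa", "aa"))  # 3
```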

score 1 · Accepted Answer

The count() approach does not take overlap into account.

Try this:

big_string = "aaaa"
substring = "aaa"
count = 0

# Slide over every start index; a matching slice adds True (1) to the count.
for i in range(len(big_string)):
    count += big_string[i:i + len(substring)] == substring

print(count)  # 2

score 0 · Accepted Answer

You need to be careful here, because str.count() only looks for non-overlapping substrings. To fix this, you can do:

import re

len([s.start() for s in re.finditer('(?=aa)', 'aaa')])

If you don't care about the start positions of the substrings, you can do:

sum(1 for _ in re.finditer('(?=aa)', 'aaa'))

Someone smarter than me might be able to show that there is a performance difference, though :)
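For anyone who wants to measure it, a small timeit harness along these lines could be a starting point. The test string, the repeat count, and the function names are arbitrary choices made for this sketch, and the timings will vary by machine and Python version, so no result is claimed here.

```python
import re
import timeit

PATTERN = re.compile('(?=aa)')
TEXT = 'aaa' * 1000  # arbitrary sample input: 3000 characters

def count_with_list():
    # Builds a list of start positions, then takes its length.
    return len([m.start() for m in PATTERN.finditer(TEXT)])

def count_with_sum():
    # Counts matches one at a time without storing them.
    return sum(1 for _ in PATTERN.finditer(TEXT))

for fn in (count_with_list, count_with_sum):
    seconds = timeit.timeit(fn, number=200)
    print(fn.__name__, round(seconds, 4))
```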
