python - How to find overlapping matches with a regexp?

Question

>>> import re
>>> match = re.findall(r'\w\w', 'hello')
>>> print(match)
['he', 'll']

Since \w\w means two word characters, 'he' and 'll' are expected. But why don't 'el' and 'lo' match the regex?

>>> match1 = re.findall(r'el', 'hello')
>>> print(match1)
['el']

score 132 · Accepted Answer

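findall doesn't yield overlapping matches by default: once 'he' has been matched, scanning resumes after it, so 'el' is never tried. This expression does, however: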

>>> re.findall(r'(?=(\w\w))', 'hello')
['he', 'el', 'll', 'lo']

Here (?=...) is a lookahead assertion:

(?=...) matches if ... matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.
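
To see why the lookahead version returns overlapping pairs, it can help to look at where each match starts. The following is a minimal sketch; the helper name overlapping_pairs is made up for illustration. Each match of (?=(\w\w)) is zero-width, so the scanner moves forward only one character at a time, while group 1 captures the two characters that follow.

```python
import re

def overlapping_pairs(text):
    """Return (start, pair) tuples for every overlapping two-character match."""
    # The lookahead itself matches the empty string; group 1 holds the pair.
    return [(m.start(), m.group(1)) for m in re.finditer(r'(?=(\w\w))', text)]

print(overlapping_pairs('hello'))
# [(0, 'he'), (1, 'el'), (2, 'll'), (3, 'lo')]
```

Because the overall match is empty, m.group(0) would be '' here; the useful text lives in the capturing group, which is also why findall returns the group contents.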

score 46 · Accepted Answer

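You can use the new Python regex module (a third-party package, installed separately from the standard library), which supports overlapping matches.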

>>> import regex as re
>>> match = re.findall(r'\w\w', 'hello', overlapped=True)
>>> print(match)
['he', 'el', 'll', 'lo']
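
If installing regex is not an option, a similar result can be approximated with the standard re module by attempting a match at every start position. This is only a sketch of a different technique, not how regex implements overlapped=True, and it is not guaranteed to agree with it in every edge case; the function name findall_overlapped is hypothetical.

```python
import re

def findall_overlapped(pattern, text):
    """Collect the match found at each start position, allowing overlaps."""
    compiled = re.compile(pattern)
    results = []
    for pos in range(len(text) + 1):
        # Pattern.match(text, pos) only succeeds if a match begins exactly at pos.
        m = compiled.match(text, pos)
        if m:
            results.append(m.group(0))
    return results

print(findall_overlapped(r'\w\w', 'hello'))
# ['he', 'el', 'll', 'lo']
```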
