python - Pythonの正規表現は特殊文字にも一致します

Question

現在、この単純なスクリプトを使用して、文字列内のタグを検索しています。

tag = "#tag"
text = "test string with #tag inserted"
match = re.search(tag, text, re.IGNORECASE) #matches

ここで、テキストに a-acute が含まれているとします。

tag = "#tag"
text = "test string with #tág inserted"
match = re.search(tag, text, re.IGNORECASE) #does not match :(

このマッチを機能させるにはどうすればよいですか? 他の特殊文字 (é、è、í など) でも機能するはずです。

前もって感謝します！

score 3 · Accepted Answer

unidecodeでテキストを正規化できます:

import unicodedata

tag = "#tag"
text = u"test string with #tág inserted and a #tag"
text=unidecode(text)
re.findall(tag, text, re.IGNORECASE)

アウト：

['#tag', '#tag']

1 に答える 1