python - RE を使用したテキスト置換 - 最初のオカレンスを許可し、残りを置換

翻译自：https://stackoverflow.com/questions/61258343 2020-04-16T19:18:57.430

26 次

これらのタスクをどのように達成できるかについて、いくつかの考えを探しています。

問題語の最初の出現を許可しますが、それ以降の使用と残りの問題語を禁止します。
元のドキュメント (.txt ファイル) は変更されません。print() のみを変更します。
メールの同じ構造を維持します。改行、タブ、または奇妙な間隔がある場合は、完全性を維持します。

コードサンプルは次のとおりです。

import re

# Sample email is "Hello, banned1. This is banned2. What is going on with
# banned 3? Hopefully banned1 is alright."
sample_email = open('email.txt', 'r').read()


# First use of any of these words is allowed; those following are banned
problem_words = ['banned1', 'banned2', 'banned3']


# TODO: Filter negative_words into overused_negative_words
banned_problem_words = []
for w in problem_words:
    if sample_email.count(f'\\b{w}s?\\b') > 1:
        banned_problem_words.append(w)


pattern = '|'.join(f'\\b{w}s?\\b' for w in banned_problem_words)


def list_check(email, pattern):
    return re.sub(pattern, 'REDACTED', email, flags=re.IGNORECASE)


print(list_check(sample_email, pattern))
# Result should be: "Hello, banned1. This is REDACTED. What is going on with
# REDACTED? Hopefully REDACTED is alright."

python - RE を使用したテキスト置換 - 最初のオカレンスを許可し、残りを置換

1 に答える 1

Related

Reference