python - Search and replace with multiple regular expressions

Question

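I'm trying to write a simple script that reads regular expressions from one file and performs a search and replace on another file. This is what I have, but it doesn't work: the file is not changed. What am I doing wrong?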

import re, fileinput

separator = ' => '

file = open("searches.txt", "r")

for search in file:
    pattern, replacement = search.split(separator)
    pattern = 'r"""' + pattern + '"""'
    replacement = 'r"""' + replacement + '"""'
    for line in fileinput.input("test.txt", inplace=1):
        line = re.sub(pattern, replacement, line)
        print(line, end="")

The file searches.txt looks like this:

<p (class="test">.+?)</p> => <h1 \1</h1>
(<p class="not">).+?(</p>) => \1This was changed by the script\2

And test.txt looks like this:

<p class="test">This is an element with the test class</p>
<p class="not">This is an element without the test class</p>
<p class="test">This is another element with the test class</p>

I ran a test to check whether the expressions are being read from the file correctly:

>>> separator = ' => '
>>> file = open("searches.txt", "r")
>>> for search in file:
...     pattern, replacement = search.split(separator)
...     pattern = 'r"""' + pattern + '"""'
...     replacement = 'r"""' + replacement + '"""'
...     print(pattern)
...     print(replacement)
... 
r"""<p (class="test">.+?)</p>"""
r"""<h1 \1</h1>
"""
r"""(<p class="not">).+?(</p>)"""
r"""\1This was changed by the script\2"""

For some reason, the closing triple quote of the first replacement ends up on a new line. Could this be the cause of my problem?

score 3 · Accepted Answer

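You don't need

pattern = 'r"""' + pattern + '"""'

In the call to re.sub, pattern has to be the actual regular expression, that is, <p (class="test">.+?)</p>. Wrapping it in all those quote characters makes the r and the quotes part of the pattern itself, so it no longer matches the text in the file.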
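To make that concrete, here is a minimal sketch using the first rule from searches.txt and the first line of test.txt. The variable names plain, wrapped and line are only for illustration.

```python
import re

line = '<p class="test">This is an element with the test class</p>'

plain = '<p (class="test">.+?)</p>'
wrapped = 'r"""' + plain + '"""'  # what the script in the question builds

# The wrapped pattern requires a literal r""" in front of <p,
# which never occurs in test.txt, so nothing matches.
print(re.search(wrapped, line))  # None

# The plain pattern matches and the substitution works.
print(re.sub(plain, r'<h1 \1</h1>', line))
# <h1 class="test">This is an element with the test class</h1>
```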

You have probably seen code like this:

replaced = re.sub(r"""\w+""", '-', text)  # text is the string being searched

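In that case, r""" tells the Python interpreter that what follows is a "raw" multi-line string, meaning a string in which backslash sequences are not replaced (for example, \n being turned into a newline). Programmers often use raw strings in Python to quote regular expressions because they want to use regex sequences (like the \w above) without having to escape the backslash. Without a raw string, the regular expression would have to be written as '\\w+', which is confusing.

In either case, though, the triple double quotes are not needed at all. The last snippet could simply have been written as: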

replaced = re.sub(r'\w+', '-', text)
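The point about raw strings can be checked directly. This is a small sketch; the sample string text is made up for the example.

```python
import re

# A raw literal and its escaped equivalent describe the same string.
assert r'\w+' == '\\w+'

# Triple quotes are just another way to delimit the literal.
assert r"""\w+""" == r'\w+'

# In a normal literal \n is one newline character;
# in a raw literal it is a backslash followed by n.
assert len('\n') == 1
assert len(r'\n') == 2

text = 'hello world'  # hypothetical input
replaced = re.sub(r'\w+', '-', text)
print(replaced)  # - -
```

The r prefix only affects how the literal is written in source code. A string read from a file is never processed for backslash escapes, so there is nothing to "make raw".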

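Finally, another problem is that your input file contains newlines separating each pattern => replacement case. Each line is therefore really "pattern => replacement\n", and the trailing newline ends up in the replacement variable. Try: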

for search in file:
    search = search.rstrip() #Remove the trailing \n from the input
    pattern, replacement = search.split(separator)
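Putting both fixes together, the script could look roughly like the sketch below. The function names load_rules, apply_rules and rewrite_file are hypothetical, and reading all rules first so the file is rewritten in a single pass is my own choice rather than something the question requires. I use rstrip('\n') instead of a bare rstrip() on the assumption that trailing spaces in a replacement might be intentional.

```python
import fileinput
import re

SEPARATOR = ' => '


def load_rules(lines):
    """Parse 'pattern => replacement' lines into (compiled pattern, replacement) pairs."""
    rules = []
    for raw in lines:
        raw = raw.rstrip('\n')  # drop the newline that ends each line of the file
        if not raw:
            continue  # skip blank lines
        # maxsplit=1 keeps any later ' => ' inside the replacement intact
        pattern, replacement = raw.split(SEPARATOR, 1)
        rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(rules, text):
    """Apply every rule in order to one piece of text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def rewrite_file(rules_path, target_path):
    """Rewrite target_path in place using the rules stored in rules_path."""
    with open(rules_path, 'r') as rules_file:
        rules = load_rules(rules_file)
    # With inplace=True, print() writes into the file being processed.
    for line in fileinput.input(target_path, inplace=True):
        print(apply_rules(rules, line), end='')
```

With the files from the question, the call would be rewrite_file("searches.txt", "test.txt").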

score 1 · Accepted Answer

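Two observations:

1) Use .strip() when reading the file, like this:

pattern, replacement = search.strip().split(separator)

This removes the \n that comes from the file.

2) If you want to escape regex metacharacters in the pattern, use re.escape() rather than the r""" + str + """ form you are using.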
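Both observations can be tried out in a few lines. This is only a sketch with made-up sample strings. Note that re.escape() turns every metacharacter into a literal, so it suits rules meant as plain text; as far as I can tell, the patterns in the question's searches.txt rely on .+? and groups and would stop matching if escaped.

```python
import re

separator = ' => '

# 1) Without strip(), the newline stays attached to the replacement.
assert 'a => b\n'.split(separator) == ['a', 'b\n']
assert 'a => b\n'.strip().split(separator) == ['a', 'b']

# 2) re.escape() makes metacharacters match themselves.
literal = 'a.b (c)'  # hypothetical search text meant literally
escaped = re.escape(literal)
assert re.fullmatch(escaped, 'a.b (c)') is not None
assert re.fullmatch(escaped, 'axb (c)') is None     # '.' is no longer a wildcard
assert re.fullmatch(literal, 'axb c') is not None   # unescaped: '.' and '(c)' are regex syntax

# Escaping a pattern that is meant to be a regex disables it.
line = '<p class="test">This is an element with the test class</p>'
pattern = '<p (class="test">.+?)</p>'
assert re.search(pattern, line) is not None
assert re.search(re.escape(pattern), line) is None
```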
