python - 正規表現置換でのPythonのエスケープ文字列

Question

以下のコードの出力：

rpl = 'This is a nicely escaped newline \\n'
my_string = 'I hope this apple is replaced with a nicely escaped string'
reg = re.compile('apple')
reg.sub( rpl, my_string )

..は：

'I hope this This is a nicely escaped newline \n is replaced with a nicely escaped string'

..印刷時：

これがうまくエスケープされた改行であることを願っています

うまくエスケープされた文字列に置き換えられます

それで、Pythonは他の文字列の「apple」を置き換えるときに文字列をエスケープ解除していますか？今のところ私はちょうどやった

reg.sub( rpl.replace('\\','\\\\'), my_string )

これは安全ですか？Pythonがそれを行うのを止める方法はありますか？

score 4 · Accepted Answer

help(re.sub)[強調鉱山]から：

sub（pattern、repl、string、count = 0、flags = 0）

文字列内のパターンの左端の重複しないオカレンスを置換replで置き換えて取得した文字列を返します。 replは、文字列または呼び出し可能のいずれかです。文字列の場合、その中のバックスラッシュエスケープが処理されます。 呼び出し可能である場合は、一致オブジェクトが渡され、使用する置換文字列を返す必要があります。

これを回避する1つの方法は、lambda：を渡すことです。

>>> reg.sub(rpl, my_string )
'I hope this This is a nicely escaped newline \n is replaced with a nicely escaped string'
>>> reg.sub(lambda x: rpl, my_string )
'I hope this This is a nicely escaped newline \\n is replaced with a nicely escaped string'

score 0 · Accepted Answer

Pythonのreモジュールに使用されるすべての正規表現パターンは、検索パターンと置換パターンの両方を含め、エスケープされていません。これが、r使用可能なパターンを作成するために必要な「バックワッキング」の量を減らすため、修飾子がPythonの正規表現パターンで一般的に使用される理由です。

修飾子は文字列定数のr前に表示され、基本的にすべての\文字（文字列区切り文字の前の文字を除く）を逐語的にします。だからr'\\' == '\\\\'、、、r'\n' == '\\n'。

あなたの例を次のように書く

rpl = r'This is a nicely escaped newline \\n'
my_string = 'I hope this apple is replaced with a nicely escaped string'
reg = re.compile(r'apple')
reg.sub( rpl, my_string )

期待どおりに動作します。

python - 正規表現置換でのPythonのエスケープ文字列

2 に答える 2

Related

Reference