python - Regex search and multiple replacement

Question

I am trying to write a regular expression that matches every occurrence of

[[any text or char here]]

in a body of text.

For example:

My name is [[Sean]]
There is a [[new and cool]] thing here.

This all works fine with the following regex.

import re

data = "this is my test string [[ that does some matching ]] then returns."
p = re.compile(r"\[\[(.*)\]\]")
data = p.sub('STAR', data)

The problem arises when there are multiple instances of the match, such as [[hello]] and [[bye]].

For example:

data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"
p = re.compile(r"\[\[(.*)\]\]")
data = p.sub('STAR', data)

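This matches from the opening brackets of hello through the closing brackets of bye, so the whole span becomes a single STAR. I want both tags to be replaced separately.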

score 3 · Accepted Answer

.* is greedy and matches as much text as it can, including ]] and [[, so it plows straight through your "tag" boundaries.

A quick solution is to make the star lazy by adding a ?:

p = re.compile(r"\[\[(.*?)\]\]")
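
To make the difference concrete, here is a minimal sketch that runs the greedy and the lazy pattern against the string from the question. The variable names (greedy, lazy, and so on) are illustrative only and do not come from the original post.

```python
import re

data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"

greedy = re.compile(r"\[\[(.*)\]\]")   # pattern from the question
lazy = re.compile(r"\[\[(.*?)\]\]")    # same pattern with a lazy star

# Greedy: a single match running from the first [[ to the last ]]
greedy_groups = greedy.findall(data)
# Lazy: each tag is matched on its own
lazy_groups = lazy.findall(data)

greedy_result = greedy.sub("STAR", data)
lazy_result = lazy.sub("STAR", data)

print(greedy_groups)  # ['hello]] and [[bye']
print(lazy_groups)    # ['hello', 'bye']
print(greedy_result)  # this is my new string it contains STAR and nothing else
print(lazy_result)    # this is my new string it contains STAR and STAR and nothing else
```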

A better solution (more robust and explicit, but slightly slower) is to state outright that the match cannot cross a tag boundary:

p = re.compile(r"\[\[((?:(?!\]\]).)*)\]\]")

Explanation:

\[\[        # Match [[
(           # Match and capture...
 (?:        # ...the following regex:
  (?!\]\])  # (only if we're not at the start of the sequence ]])
  .         # any character
 )*         # Repeat any number of times
)           # End of capturing group
\]\]        # Match ]]
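
The commented breakdown above can be used almost verbatim as a pattern if it is compiled with re.VERBOSE, which ignores unescaped whitespace and # comments inside the pattern. The following is a sketch under that assumption. The name tag is made up for illustration.

```python
import re

tag = re.compile(
    r"""
    \[\[          # match [[
    (             # match and capture...
      (?:         # ...the following group:
        (?!\]\])  # only if we are not at the start of ]]
        .         # any character (except a newline, by default)
      )*          # repeated any number of times
    )             # end of capturing group
    \]\]          # match ]]
    """,
    re.VERBOSE,
)

data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"
captured = tag.findall(data)
replaced = tag.sub("STAR", data)

print(captured)  # ['hello', 'bye']
print(replaced)  # this is my new string it contains STAR and STAR and nothing else
```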

score 2 · Accepted Answer

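Use non-greedy matching, .*?. A ? placed after a + or * makes it match as few characters as possible. By default these quantifiers are greedy and consume as many characters as possible.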

p = re.compile(r"\[\[(.*?)\]\]")
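
As a small illustration of how the trailing ? changes both + and *, the sketch below uses short made-up strings with angle brackets rather than the question's data. The behaviour is the same for any delimiters.

```python
import re

# Greedy quantifiers run to the last possible closing delimiter
plus_greedy = re.findall(r"<.+>", "<a><b>")
star_greedy = re.findall(r"<.*>", "<><>")

# Adding ? makes each quantifier stop at the first possible closing delimiter
plus_lazy = re.findall(r"<.+?>", "<a><b>")
star_lazy = re.findall(r"<.*?>", "<><>")

print(plus_greedy)  # ['<a><b>']
print(plus_lazy)    # ['<a>', '<b>']
print(star_greedy)  # ['<><>']
print(star_lazy)    # ['<>', '<>']
```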

score 1 · Accepted Answer

You can use this:

p = re.compile(r"\[\[[^\]]+\]\]")

>>> data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"
>>> p = re.compile(r"\[\[[^\]]+\]\]")
>>> data = p.sub('STAR', data)
>>> data
'this is my new string it contains STAR and STAR and nothing else'
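
The negated character class is not a drop-in equivalent of the lazy .*? pattern, so it is worth checking which inputs you expect. The sketch below compares the two on a few made-up edge cases (a single ] inside a tag, an empty tag, and a tag spanning a newline). The names negated and lazy are illustrative.

```python
import re

negated = re.compile(r"\[\[[^\]]+\]\]")
lazy = re.compile(r"\[\[.*?\]\]")

# A single ] inside the tag: [^\]]+ cannot step over it, .*? can
single_bracket = "x [[a]b]] y"
negated_single = negated.sub("STAR", single_bracket)
lazy_single = lazy.sub("STAR", single_bracket)

# An empty tag: + requires at least one character, *? does not
empty_tag = "x [[]] y"
negated_empty = negated.sub("STAR", empty_tag)
lazy_empty = lazy.sub("STAR", empty_tag)

# A newline inside the tag: [^\]] matches it, . does not (without re.DOTALL)
multiline = "x [[a\nb]] y"
negated_multiline = negated.sub("STAR", multiline)
lazy_multiline = lazy.sub("STAR", multiline)

print(negated_single, "|", lazy_single)        # x [[a]b]] y | x STAR y
print(negated_empty, "|", lazy_empty)          # x [[]] y | x STAR y
print(repr(negated_multiline), "|", repr(lazy_multiline))
```

If tags can contain line breaks and you prefer the lazy pattern, compiling it with re.DOTALL lets . match newlines as well.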
