python - Regex substitution with grouping

Question

Given a string like `\url{www.mywebsite.com/home/us/index.html}`, I want to replace the part of the URL up to the second-to-last slash with `www.example.com/`, so that it becomes:

\url{www.example.com/us/index.html}

Assume the URL contains at least one slash. This is what I tried:

>>> import re
>>> pattern = r'(\\url{).*([^/]*/[^/]*})'
>>> prefix = r'\1www.example.com/\2'
>>> re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com//index.html}'

I don't understand why the `us` part is missing from the result, even though I explicitly included `[^/]*` in the regex.

score 1 · Accepted Answer

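The greedy `.*` matches everything up to the last slash. Your group then matches only `/index.html}`, with the first `[^/]*` matching nothing (which is allowed, since `*` can match zero characters).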
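To see this backtracking directly, the part consumed by `.*` can be captured in a group of its own. This is a minimal sketch written for Python 3 (the backslash is escaped in the pattern and the input is a raw string); the extra middle group and the variable names are additions for illustration, not part of the question's pattern.

```python
import re

# Assumption: Python 3, so the literal backslash is written as \\ in the
# pattern and the sample input is a raw string.
text = r'\url{www.mywebsite.com/home/us/index.html}'

# Same as the question's pattern, but with `.*` wrapped in a group so we
# can inspect what it swallowed.
probe = r'(\\url{)(.*)([^/]*/[^/]*})'

m = re.search(probe, text)
consumed = m.group(2)  # what the greedy .* matched
tail = m.group(3)      # what was left for the final group

print(repr(consumed))  # 'www.mywebsite.com/home/us'
print(repr(tail))      # '/index.html}'
```

The greedy `.*` gives back only as much as the final group strictly needs, which is a single slash plus the file name, so `us` stays inside `.*` and is discarded by the substitution.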

To force the `.*` to stop before the second-to-last slash, put a slash right after it. This prevents `.*` from consuming the `us` that you want to leave for the group to capture.

>>> pattern = r'(\\url{).*/([^/]*/[^/]*})'
>>> re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com/us/index.html}'
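The corrected pattern can be wrapped in a small helper to try it on several inputs. This is a sketch under assumptions: it targets Python 3, `replace_host` is a hypothetical name, and a function is used as the replacement so that the new prefix is never interpreted as a replacement template.

```python
import re

# Assumption: Python 3 syntax; the literal backslash is escaped in the pattern.
PATTERN = re.compile(r'(\\url{).*/([^/]*/[^/]*})')


def replace_host(text, new_prefix='www.example.com/'):
    """Replace everything up to the second-to-last slash with new_prefix."""
    # Group 1 is the \url{ opener, group 2 is the last two path segments.
    return PATTERN.sub(lambda m: m.group(1) + new_prefix + m.group(2), text)


print(replace_host(r'\url{www.mywebsite.com/home/us/index.html}'))
# \url{www.example.com/us/index.html}

# With only one slash the pattern cannot match, so the input is returned as-is.
print(replace_host(r'\url{www.mywebsite.com/index.html}'))
# \url{www.mywebsite.com/index.html}
```

Note that the added slash means the pattern needs at least two slashes to match. A URL with a single slash is left unchanged, which may or may not be the behavior you want.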

score 1 · Accepted Answer

Alternatively, use lookahead/lookbehind:

import re
# Match everything after the '{' up to (but not including) the second-to-last slash:
pattern = r'(?<={).*(?=(?:[^/]*/){2})'
prefix = 'www.example.com'
text = r'\url{www.mywebsite.com/home/us/index.html}'
print(re.sub(pattern, prefix, text))

Output:

\url{www.example.com/us/index.html}
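To check what the lookaround pattern actually consumes, the match can be inspected with `re.search`. A minimal sketch, assuming Python 3 and a raw-string input; the variable names are illustrative.

```python
import re

text = r'\url{www.mywebsite.com/home/us/index.html}'
pattern = r'(?<={).*(?=(?:[^/]*/){2})'

m = re.search(pattern, text)
matched = m.group(0)        # the only text that re.sub would replace
remainder = text[m.end():]  # left untouched, including its leading slash

print(repr(matched))    # 'www.mywebsite.com/home'
print(repr(remainder))  # '/us/index.html}'
```

Because the lookbehind and lookahead consume no characters, the second-to-last slash stays in the string. That is why the replacement here is `www.example.com` without a trailing slash.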

Or without using a regex at all:

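# Keep only the last two '/'-separated pieces: ['us', 'index.html}']
l = r'\url{www.mywebsite.com/home/us/index.html}'.split('/')[-2:]
# Prepend the new host, keeping the \url{ opener
l = [r'\url{www.example.com', l[0], l[1]]
print('/'.join(l))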
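The same idea can be written as a reusable function with `str.rsplit`, which splits from the right and stops after a given number of splits. This is a sketch, not the answer's own code: `replace_host_split` is a hypothetical name, and leaving inputs with fewer than two slashes unchanged is an assumption about the desired behavior.

```python
def replace_host_split(text, new_host='www.example.com'):
    """Swap everything before the second-to-last slash for new_host."""
    # maxsplit=2 splits only at the last two slashes:
    # ['\\url{www.mywebsite.com/home', 'us', 'index.html}']
    parts = text.rsplit('/', 2)
    if len(parts) < 3:
        # Assumption: fewer than two slashes means there is nothing to replace.
        return text
    return '/'.join([r'\url{' + new_host] + parts[1:])


print(replace_host_split(r'\url{www.mywebsite.com/home/us/index.html}'))
# \url{www.example.com/us/index.html}
```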
