python - Regex that returns all characters, searching backwards, until a "/" is found

Question

I'm having trouble with this regex, and I think I'm almost there.

m =re.findall('[a-z]{6}\.[a-z]{3}\.[a-z]{2} (?=\" target)', 'http://domain.com.uy " target')

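This gives me exactly the output I need, but obviously only for this example: [a-z]{6} happens to match the 6 characters that come before .com.uy in domain.com.uy, and that is not what I want.

I want it to return domain.com.uy, so basically the instruction would be: match any characters (going backwards) until a "/" is encountered.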

Edit:

m =re.findall('\w+\.[a-z]{3}\.[a-z]{2} (?=\" target)', 'http://domain.com.uy " target')

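This is very close to what I want, but it won't match "_" or "-".

For completeness, I don't need the http:// part.

I hope the question is clear enough. If anything is open to interpretation, please ask for whatever clarification you need.

Thanks in advance!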

score 1 · Accepted Answer

Another option is to use a positive lookbehind such as (?<=//):

>>> re.search(r'(?<=//).+(?= \" target)', 
...           'http://domain.com.uy " target').group(0)
'domain.com.uy'

Note that this will also match slashes inside the URL itself, if that is what you want:

>>> re.search(r'(?<=//).+(?= \" target)',
...           'http://example.com/path/to/whatever " target').group(0)
'example.com/path/to/whatever'

If you only want the bare domain, without any path or query parameters, you can use r'(?<=//)([^/]+)(/.*)?(?= \" target)' and take capture group 1:

>>> re.search(r'(?<=//)([^/]+)(/.*)?(?= \" target)',
...           'http://example.com/path/to/whatever " target').groups()
('example.com', '/path/to/whatever')
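A minimal sketch of how the pattern above might be wrapped for reuse. The names DOMAIN_RE and extract_domain are hypothetical, not part of the original answer. Because group 1 is [^/]+, it accepts "-" and "_" in the host, and the wrapper returns None instead of raising when nothing matches:

```python
import re

# Same pattern as in the answer: lookbehind for "//", host in group 1,
# optional path in group 2, lookahead for the trailing ' " target'.
DOMAIN_RE = re.compile(r'(?<=//)([^/]+)(/.*)?(?= " target)')


def extract_domain(text):
    """Return the host part of the URL in text, or None if there is no match."""
    match = DOMAIN_RE.search(text)
    return match.group(1) if match else None


print(extract_domain('http://domain.com.uy " target'))
print(extract_domain('http://my-site_1.example.com/path " target'))
print(extract_domain('no url here'))
```

Checking the result of re.search before calling .group() avoids an AttributeError on input that does not contain the expected text.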

score 1 · Accepted Answer

If a regex is not a requirement and you simply want to extract the FQDN from a URL in Python, use urlparse and str.split().

>>> from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse
>>> url = 'http://domain.com.uy " target'
>>> urlparse(url)
ParseResult(scheme='http', netloc='domain.com.uy " target', path='', params='', query='', fragment='')

This splits the URL into its components. The one we want is netloc:

>>> urlparse(url).netloc
'domain.com.uy " target'

Split on whitespace:

>>> urlparse(url).netloc.split()
['domain.com.uy', '"', 'target']

Take just the first part:

>>> urlparse(url).netloc.split()[0]
'domain.com.uy'
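The steps above could be folded into one small function. This is only a sketch, assuming Python 3 (urllib.parse); the name fqdn_from_fragment is hypothetical:

```python
from urllib.parse import urlparse


def fqdn_from_fragment(fragment):
    """Return the host from a URL followed by trailing junk, or None."""
    # netloc is everything between "//" and the next "/", "?" or "#",
    # so here it may still carry the trailing ' " target' text.
    netloc = urlparse(fragment).netloc
    parts = netloc.split()  # split on whitespace, as in the steps above
    return parts[0] if parts else None


print(fqdn_from_fragment('http://domain.com.uy " target'))
print(fqdn_from_fragment('http://example.com/path/to/whatever " target'))
```

When the URL has a path, netloc already stops at the first "/", so the whitespace split only matters for bare-domain input like the example in the question.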

score 0 · Accepted Answer

It's as simple as:

[^/]+(?= " target)

Note, however, that for http://domain.com/folder/site.php this will not return the domain. Also, remember to escape the regex properly when you put it in a string.
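A short sketch of that pattern in Python, illustrating the caveat. A raw string takes care of the escaping; the names TAIL_BEFORE_TARGET and segment_before_target are hypothetical:

```python
import re

# One or more non-slash characters that are directly followed by ' " target'.
TAIL_BEFORE_TARGET = re.compile(r'[^/]+(?= " target)')


def segment_before_target(text):
    """Return the last slash-free run before ' " target', or None."""
    match = TAIL_BEFORE_TARGET.search(text)
    return match.group(0) if match else None


# Bare domain: the last slash-free segment is the domain itself.
print(segment_before_target('http://domain.com.uy " target'))
# With a path: the last segment is the file name, not the domain.
print(segment_before_target('http://domain.com/folder/site.php " target'))
```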

score 0 · Accepted Answer

Try this (you may need to escape the / in Python):

/([^/]*)$
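As a rough illustration of how this pattern behaves in Python, where "/" is not a special character and needs no escaping in a raw string. The names LAST_SEGMENT and last_segment are hypothetical:

```python
import re

# A "/" followed by non-slash characters up to the end of the string;
# group 1 holds everything after the last "/".
LAST_SEGMENT = re.compile(r'/([^/]*)$')


def last_segment(text):
    """Return whatever follows the final "/" in text, or None if there is no "/"."""
    match = LAST_SEGMENT.search(text)
    return match.group(1) if match else None


print(last_segment('http://domain.com.uy'))
# Anchoring at the end of the string means trailing text is captured too.
print(last_segment('http://domain.com.uy " target'))
```

Because the pattern is anchored with $, any trailing ' " target' text ends up in the capture, so with the question's input the string would have to be trimmed first.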
