regex - Need a regular expression that matches a generic URL

Question

I need to test for generic URLs using any protocol (http, https, shttp, ftp, svn, mysql, and ones I don't know about).

Here is my first pass:

\w+://(\w+\.)+[\w+](/[\w]+)(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?

(It has to work in PCRE and .NET, so nothing fancy.)
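
A quick way to see how the first pass behaves is to run it against a few sample URLs. The sketch below uses Python's `re` module as a stand-in for PCRE/.NET, which is an assumption, and the sample URLs and the helper name `matches_first_pass` are made up for illustration. It suggests that `[\w+]` is a character class matching a single character, and that the `(/[\w]+)` group makes exactly one path segment mandatory.

```python
import re

# The first-pass pattern from the question, unchanged.
FIRST_PASS = re.compile(
    r"\w+://(\w+\.)+[\w+](/[\w]+)(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?"
)

def matches_first_pass(url):
    """Return True if the whole string matches the first-pass pattern."""
    return FIRST_PASS.fullmatch(url) is not None

# Hypothetical sample URLs, chosen only to exercise the pattern.
for url in (
    "http://www.example.c/path",    # one-character last label
    "http://www.example.com/path",  # ordinary host name
    "svn://example.com",            # no path at all
):
    print(url, matches_first_pass(url))
```

Only the first sample matches: the last host label can be just one character long, and a URL without a path is rejected.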

score 3 · Accepted Answer


According to RFC 2396:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
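
As a minimal sketch of how this expression can be used, the Python below maps its numbered groups to the component names RFC 2396 gives them (scheme, authority, path, query, fragment). The function name `split_uri` is hypothetical. Note that every group is optional, so the pattern splits a string into parts rather than validating it: any input matches.

```python
import re

# The RFC 2396 expression quoted above, unchanged.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Split a URI reference into scheme, authority, path, query, fragment."""
    m = URI_RE.match(uri)  # always matches, since every group is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_uri("mysql://user@host/db?x=1"))
print(split_uri("not a url"))
```

For the last input everything ends up in `path` and the other components are `None`, so a caller that wants to reject non-URLs would still need its own check, for example that `scheme` and `authority` are present.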

answered 2008-11-20T22:48:46.157

score 2 · Accepted Answer

Adding that regex as a wiki answer:

[\w+-]+://([a-zA-Z0-9]+\.)+[[a-zA-Z0-9]+](/[%\w]+)(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?

Option 2 (re: CMS)

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

But that is too lax for anything sane, so here it is trimmed down to be more restrictive and to distinguish the individual parts:

proto      ://  name      : pass      @  server    :port      /path     ? args
^([^:/?#]+)://(([^/?#@:]+(:[^/?#@:]+)?@)?[^/?#@:]+(:[0-9]+)?)(/[^?#]*)(\?([^#]*))?
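
To make the group layout of the restrictive version easier to follow, here is a small Python sketch that names the captured groups. The function name `parse_url` and the dictionary keys are hypothetical. The server name has no capture group of its own in this pattern, so it is only available as part of the whole authority.

```python
import re

# The restrictive pattern from above, unchanged.
STRICT_URL_RE = re.compile(
    r"^([^:/?#]+)://(([^/?#@:]+(:[^/?#@:]+)?@)?[^/?#@:]+(:[0-9]+)?)(/[^?#]*)(\?([^#]*))?"
)

def parse_url(url):
    """Return the captured parts as a dict, or None if the URL is rejected."""
    m = STRICT_URL_RE.match(url)
    if m is None:
        return None
    return {
        "proto": m.group(1),
        "authority": m.group(2),  # name:pass@server:port as one string
        "userinfo": m.group(3),   # includes the trailing '@', or None
        "port": m.group(5),       # includes the leading ':', or None
        "path": m.group(6),
        "args": m.group(8),       # query string without the '?', or None
    }

print(parse_url("ftp://bob:secret@example.org:21/pub/file.txt?x=1"))
print(parse_url("http://example.com"))
```

The second call prints `None`: the path group is not optional in this pattern, so a URL needs at least a trailing `/` after the server to match.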

score 1 · Accepted Answer

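I came at this from a slightly different direction. I wanted to emulate gchat's ability to match something.co.uk and linkify it. So I used a regex that looks for a . that is not followed by another period and has no whitespace on either side, then grabs everything around it until it hits whitespace. It also matches a trailing period at the end of a URI, but I strip that off later. So this might be an option if you prefer false positives over missing some potential matches.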

import re

url_re = re.compile(r"""
           [^\s]             # not whitespace
           [a-zA-Z0-9:/\-]+  # the protocol and domain name
           \.(?!\.)          # A literal '.' not followed by another
           [\w\-\./\?=&%~#]+ # country and path components
           [^\s]             # not whitespace""", re.VERBOSE)

url_re.findall('http://thereisnothing.com/a/path adn some text www.google.com/?=query#%20 https://somewhere.com other-countries.co.nz. ellipsis... is also a great place to buy. But try text-hello.com ftp://something.com')

['http://thereisnothing.com/a/path',
 'www.google.com/?=query#%20',
 'https://somewhere.com',
 'other-countries.co.nz.',
 'text-hello.com',
 'ftp://something.com']
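
The answer mentions removing the trailing period afterwards but does not show how. One possible way, sketched here as an assumption rather than the original post-processing step, is to strip trailing periods from each match with `str.rstrip`. The function name `find_links` is hypothetical.

```python
import re

# Same pattern as above.
url_re = re.compile(r"""
           [^\s]             # not whitespace
           [a-zA-Z0-9:/\-]+  # the protocol and domain name
           \.(?!\.)          # A literal '.' not followed by another
           [\w\-\./\?=&%~#]+ # country and path components
           [^\s]             # not whitespace""", re.VERBOSE)

def find_links(text):
    """Find URL-like strings and drop any periods left at the end."""
    # Assumption: a period at the very end is sentence punctuation,
    # not part of the URL.
    return [match.rstrip(".") for match in url_re.findall(text)]

print(find_links("see other-countries.co.nz. and text-hello.com"))
# ['other-countries.co.nz', 'text-hello.com']
```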
