c# - C# regex pattern to extract URLs from a given string - not full HTML URLs, but bare links as well

Question

I need a regular expression that does the following:

Extract all strings that start with http://
Extract all strings that start with www.

So I need to extract both of these.

For example, given the following text:

house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue

So from the string above I want to get:

    www.monstermmorpg.com
    http://www.monstermmorpg.com
    http://www.monstermmorpg.commerged

I am looking for a regex or another approach. Thank you.

C# 4.0

score 89 · Accepted Answer

You can write a fairly simple regular expression to handle this, or go with the more traditional string splitting + LINQ approach.

Regex

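// Requires: using System.Text.RegularExpressions; (MessageBox lives in System.Windows.Forms)
var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
// Show every run of non-whitespace characters that begins with http://, https://, or www.
foreach (Match m in linkParser.Matches(rawString))
    MessageBox.Show(m.Value);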

Pattern explanation:

\b         - matches a word boundary (the position between a word character and a non-word character such as a space or period)
(?:        - begins a group; the ?: means the group's contents are not captured
https?://  - matches http:// or https:// (the ? after the s makes it optional)
|          - OR
www\.      - matches the literal string www. (the \. means a literal ".")
)          - ends the group
\S+        - matches one or more non-whitespace characters
\b         - matches the closing word boundary

Basically, the pattern looks for strings that start with http://, https://, or www. (that is the (?:https?://|www\.) group) and then matches all characters up to the next whitespace.
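
For anyone who wants to try the pattern outside a WinForms app, here is a minimal console sketch of the same idea. The class name LinkExtractor and the method ExtractLinks are hypothetical names chosen for illustration, and Console.WriteLine stands in for MessageBox.Show; the pattern itself is unchanged from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Same pattern as in the answer: an http://, https://, or www. prefix
    // followed by non-whitespace characters, bounded by word boundaries.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper (not part of the original answer):
    // collects every match of the pattern into a list.
    public static List<string> ExtractLinks(string text)
    {
        var links = new List<string>();
        foreach (Match m in LinkParser.Matches(text))
            links.Add(m.Value);
        return links;
    }

    public static void Main()
    {
        var rawString = "house home go www.monstermmorpg.com nice hospital "
            + "http://www.monstermmorpg.com this is incorrect url "
            + "http://www.monstermmorpg.commerged continue";

        // Prints the three links from the question, one per line.
        foreach (var link in ExtractLinks(rawString))
            Console.WriteLine(link);
    }
}
```

One side effect of the closing \b worth knowing: because the match has to end at a word boundary, \S+ backtracks past trailing punctuation, so "See www.example.com." yields www.example.com without the final period.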

Traditional string option

// Requires: using System.Linq;
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
// Split on whitespace, then keep only the tokens that start with one of the URL prefixes.
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
foreach (string s in links)
    MessageBox.Show(s);
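
The split + LINQ approach can be sketched as a standalone console program too. SplitLinkExtractor and ExtractLinks are hypothetical names for illustration. Two things here are my own assumptions rather than part of the answer: '\r' is added to the separators, and the prefix check uses StringComparison.OrdinalIgnoreCase so it behaves like the regex version's IgnoreCase option.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitLinkExtractor
{
    // Separators from the answer plus '\r' (an assumption, for Windows line endings).
    private static readonly char[] Whitespace = { ' ', '\t', '\n', '\r' };

    // The prefixes the answer checks for.
    private static readonly string[] Prefixes = { "http://", "https://", "www." };

    // Hypothetical helper: split on whitespace and keep tokens with a URL prefix.
    public static List<string> ExtractLinks(string text)
    {
        return text
            .Split(Whitespace, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => Prefixes.Any(
                p => token.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            .ToList();
    }

    public static void Main()
    {
        var rawString = "house home go www.monstermmorpg.com nice hospital "
            + "http://www.monstermmorpg.com this is incorrect url "
            + "http://www.monstermmorpg.commerged continue";

        foreach (var link in ExtractLinks(rawString))
            Console.WriteLine(link);
    }
}
```

Unlike the regex, this version keeps whatever is attached to the token, so "www.example.com." comes back with its trailing period, and a link wrapped in parentheses is skipped because the token no longer starts with one of the prefixes.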
