java - Java: Regular expressions

Question

I have an HTML string that contains many image tags. I need to extract the tags and modify them. For example:

String imageRegex = "(<img.+(src=\".+\").+/>){1}";
String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />hello world<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />";
Matcher matcher = Pattern.compile(imageRegex, Pattern.CASE_INSENSITIVE).matcher(str);
int i = 0;
while (matcher.find()) {
    i++;
    Log.i("TAG", matcher.group());
}

The result is:

<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" />hello world<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />

But that is not what I want. The result I want is:

<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" />
<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />

What is wrong with my regular expression?

score 1 · Accepted Answer

Try (<img)(.*?)(/>). This should work, but yes, as people say over and over, don't use regular expressions to parse HTML.

I don't have Eclipse installed, but I do have VS2010, and this works for me there (note that the code below is C#, not Java):

        String imageRegex = "(<img)(.*?)(/>)";
        String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />hello world<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />";
        System.Text.RegularExpressions.MatchCollection match = System.Text.RegularExpressions.Regex.Matches(str, imageRegex, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
        StringBuilder sb = new StringBuilder();
        foreach (System.Text.RegularExpressions.Match m in match)
        {
            sb.AppendLine(m.Value);
        }
        System.Windows.MessageBox.Show(sb.ToString());

Result:

<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" /> 
<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />
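Since the question is about Java, here is a rough Java sketch of the same pattern. The class and method names (ImgTagExtractor, extract) are made up for illustration, and the usual caveat about regex and HTML still applies: this only copes with simple, self-closing tags like the ones in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the answer above.
class ImgTagExtractor {
    // The reluctant quantifier (.*?) stops at the first "/>" after each "<img".
    private static final Pattern IMG_TAG =
            Pattern.compile("(<img)(.*?)(/>)", Pattern.CASE_INSENSITIVE);

    // Collects every whole match, i.e. one "<img ... />" tag per list entry.
    static List<String> extract(String html) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = IMG_TAG.matcher(html);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />"
                + "hello world"
                + "<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />";
        // Prints each tag on its own line.
        for (String tag : extract(str)) {
            System.out.println(tag);
        }
    }
}
```

On Android you could swap System.out.println for Log.i, as in the question.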

score 0 · Accepted Answer

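David M is right that you really shouldn't be trying to do this, but your specific problem is that the + quantifier in a regular expression is greedy, so it matches the longest substring it possibly can.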

For more about quantifiers, see a regular expression tutorial.
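To make the greedy behaviour concrete, here is a small sketch that runs three variants against a shortened input. The class name QuantifierDemo, the helper firstMatch and the sample string are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class QuantifierDemo {
    // Returns the first match of regex in input, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<img a/>text<img b/>";
        // Greedy: .+ runs to the end of the input and backtracks only to the last "/>".
        System.out.println(firstMatch("<img.+/>", input));   // <img a/>text<img b/>
        // Reluctant: .+? grows one character at a time and stops at the first "/>".
        System.out.println(firstMatch("<img.+?/>", input));  // <img a/>
        // Negated class: [^>]* cannot cross a '>', so the match stays inside one tag.
        System.out.println(firstMatch("<img[^>]*>", input)); // <img a/>
    }
}
```

The negated-class form avoids backtracking over the rest of the string, but it breaks if an attribute value itself contains a '>' character.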

score 0 · Accepted Answer

I don't recommend using regular expressions to parse HTML. Consider JSoup or a similar solution:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements images = doc.select("img");

Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp.
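The snippet above fetches a page over the network, whereas the question starts from a string. A minimal sketch of the string case follows, assuming the jsoup library is on the classpath. The class and method names (JsoupImgDemo, imageSources, withAltText) are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; requires the third-party jsoup library.
class JsoupImgDemo {
    // Parses an HTML fragment and returns the src attribute of every <img>.
    static List<String> imageSources(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        List<String> sources = new ArrayList<>();
        for (Element img : doc.select("img")) {
            sources.add(img.attr("src"));
        }
        return sources;
    }

    // Sets the alt attribute on every <img> and returns the modified body HTML.
    // jsoup re-serializes the markup, so whitespace and tag closing may differ from the input.
    static String withAltText(String html, String alt) {
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element img : doc.select("img")) {
            img.attr("alt", alt);
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />"
                + "hello world"
                + "<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />";
        System.out.println(imageSources(str));
        System.out.println(withAltText(str, "smiley"));
    }
}
```

Because the parser works on elements rather than characters, attribute order, quoting style and stray '>' characters inside attribute values stop being a concern.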
