java - Removing URLs from text using Java

Question

How can I remove the URLs in the example text below

String str="Fear psychosis after #AssamRiots - http://www.google.com/LdEbWTgD http://www.yahoo.com/mksVZKBz";

using a regular expression?

I want to remove all of the URLs in the text, but it is not working. My code is as follows:

String pattern = "(http(.*?)\\s)";
Pattern pt = Pattern.compile(pattern);
Matcher namemacher = pt.matcher(input);
if (namemacher.find()) {
  str=input.replace(namemacher.group(0), "");
}

score 22 · Accepted Answer

Pass in the String that contains the URLs:

private String removeUrl(String commentstr)
    {
        String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
        Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(commentstr);
        int i = 0;
        while (m.find()) {
            commentstr = commentstr.replaceAll(m.group(i),"").trim();
            i++;
        }
        return commentstr;
    }

score 5 · Accepted Answer

Well, you haven't given any information about your text, so assuming it looks like this: "Some text here http://www.example.com some text there", you can do this:

String yourText = "blah-blah";
String cleartext = yourText.replaceAll("http.*?\\s", " ");

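This replaces every sequence that starts with "http" and runs up to and including the first whitespace character with a single space.

You should read the Javadoc for the String class. It will make things clearer for you.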
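One caveat with the pattern above: it requires a whitespace character after the URL, so a URL at the very end of the string (like the second one in the question) is left in place. Here is a minimal sketch of one way around that, assuming it is acceptable to match `https?://` followed by any run of non-whitespace. The class and method names are made up for illustration.

```java
class HttpUrlStripper {
    // Hypothetical helper: removes http/https URLs, then tidies the leftover whitespace.
    static String stripHttpUrls(String text) {
        return text
                .replaceAll("https?://\\S+", "") // URL = scheme plus everything up to the next whitespace
                .replaceAll("\\s+", " ")         // collapse the gaps left behind
                .trim();
    }

    public static void main(String[] args) {
        String str = "Fear psychosis after #AssamRiots - http://www.google.com/LdEbWTgD http://www.yahoo.com/mksVZKBz";
        System.out.println(stripHttpUrls(str)); // prints: Fear psychosis after #AssamRiots -
    }
}
```

Because `\\S+` does not need anything after the URL, it also works when the URL is the last token in the text.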

score 4 · Accepted Answer

How do you define a URL? You may want to filter not only http:// but also https:// and other protocols such as ftp://, rss://, or custom protocols.

Maybe this regular expression will do the job:

[\S]+://[\S]+

Explanation:

one or more non-whitespace characters
followed by the string "://"
followed by one or more non-whitespace characters
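For what it's worth, here is a rough sketch of how that expression could be plugged into `String.replaceAll`. The class name, method name, and sample text are assumptions for illustration only.

```java
class SchemeUrlStripper {
    // Hypothetical helper: drops any token of the form <non-whitespace>://<non-whitespace>.
    static String stripAnyScheme(String text) {
        return text
                .replaceAll("\\S+://\\S+", "") // same expression as above, escaped for a Java string
                .replaceAll("\\s+", " ")       // collapse the gaps left behind
                .trim();
    }

    public static void main(String[] args) {
        String text = "Mirror at ftp://example.org/pub and docs at https://example.com/a?b=1&c=2 today";
        System.out.println(stripAnyScheme(text)); // prints: Mirror at and docs at today
    }
}
```

Keep in mind the expression is deliberately loose: it removes any token containing "://", including punctuation glued to the URL, so tighten it if that is too greedy for your text.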

score 4 · Accepted Answer

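Note that the answer above will not work if the URL contains characters such as & or \, because replaceAll cannot handle them (it interprets its first argument as a regular expression). What worked for me was to strip those characters from the text into a new string variable, strip them from each m.find() result as well, and then call replaceAll on the new string variable.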

private String removeUrl(String commentstr)
{
    // rid of ? and & in urls since replaceAll can't deal with them
    String commentstr1 = commentstr.replaceAll("\\?", "").replaceAll("\\&", "");

    String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
    Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(commentstr);
    int i = 0;
    while (m.find()) {
        commentstr = commentstr1.replaceAll(m.group(i).replaceAll("\\?", "").replaceAll("\\&", ""),"").trim();
        i++;
    }
    return commentstr;
}
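An alternative that avoids stripping characters from the text, offered as a sketch rather than a drop-in fix: `String.replace(CharSequence, CharSequence)` treats its first argument as literal text, so a matched URL containing `?`, `+` or `&` can be removed without any escaping. The pattern below is a simplified one assumed for illustration, not the pattern from the answers, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralUrlRemover {
    // Simplified illustrative pattern (an assumption): scheme, "://", then non-whitespace.
    private static final Pattern URL =
            Pattern.compile("(?:https?|ftp)://\\S+", Pattern.CASE_INSENSITIVE);

    static String removeUrls(String text) {
        Matcher m = URL.matcher(text); // always scans the original, unmodified text
        String result = text;
        while (m.find()) {
            // replace() is literal, so regex metacharacters inside the URL are harmless
            result = result.replace(m.group(), "");
        }
        return result.trim();
    }

    public static void main(String[] args) {
        System.out.println(removeUrls("Search http://example.com/find?q=a+b&lang=en now"));
    }
}
```

If you do need `replaceAll` for some reason, wrapping the matched text in `Pattern.quote(...)` has the same effect of making it literal.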

score 0 · Accepted Answer

As mentioned in one of the answers above, it is m.group(0) that should be replaced with an empty string, not m.group(i) where i is incremented on every call to m.find().

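// Requires java.util.regex.Matcher and java.util.regex.Pattern.
private String removeUrl(String commentstr)
{
    String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
    Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(commentstr);
    StringBuffer sb = new StringBuffer(commentstr.length());
    while (m.find()) {
        // Copy the text before this match and drop the match itself (group 0).
        m.appendReplacement(sb, "");
    }
    // Copy whatever follows the last match; without this the tail of the text is lost.
    m.appendTail(sb);
    return sb.toString();
}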
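If you don't need to do anything per match, the same single-pass idea can probably be written more compactly with `Matcher.replaceAll`, which replaces every match of the compiled pattern in one call. This is a sketch that reuses the pattern from this thread; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class UrlRemover {
    // Pattern copied from the answers in this thread, compiled once and reused.
    private static final Pattern URL = Pattern.compile(
            "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)",
            Pattern.CASE_INSENSITIVE);

    static String removeUrl(String text) {
        // Matcher.replaceAll replaces each whole match (group 0) with the empty string.
        return URL.matcher(text).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String str = "Fear psychosis after #AssamRiots - http://www.google.com/LdEbWTgD http://www.yahoo.com/mksVZKBz";
        System.out.println(removeUrl(str)); // prints: Fear psychosis after #AssamRiots -
    }
}
```

Here the URL text is never fed back in as a regular expression, so special characters inside the URLs are not a concern.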
