java - Find and replace a string using Java regular expressions

Question

The opening tag can be <html>, <html > or < html>. How do I specify the search string as a regular expression?

String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
String find = "<html>";
String replace = "";        
Pattern pattern = Pattern.compile(find);        
Matcher matcher = pattern.matcher(source);        
String output = matcher.replaceAll(replace); 
System.out.println("Source = " + source);
System.out.println("Output = " + output);

score 3 · Accepted Answer

You can get around the problem by using <\\s*html\\s*>, but please don't process HTML with regular expressions. Obligatory link.

\\s* stands for zero or more whitespace characters.
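
A minimal sketch of that pattern applied to the three tag variants from the question. The class and method names (HtmlTagRegexDemo, stripOpenTag) are illustrative only and do not come from the original post.

```java
import java.util.regex.Pattern;

public class HtmlTagRegexDemo {
    // Pattern from the answer: optional whitespace on either side of the tag name.
    private static final Pattern OPEN_HTML_TAG = Pattern.compile("<\\s*html\\s*>");

    // Hypothetical helper: removes every match of the opening-tag pattern.
    static String stripOpenTag(String source) {
        return OPEN_HTML_TAG.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String[] samples = {"<html>text", "<html >text", "< html>text"};
        for (String sample : samples) {
            System.out.println(stripOpenTag(sample));
        }
    }
}
```

Note that this pattern only covers the opening tag. Applied to the source string from the question, the closing </html > is left in place, because the pattern does not allow for a slash.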

score 1 · Accepted Answer

Don't try to parse HTML with regular expressions. Read up on XPath; it is very helpful. By default XPath tries to validate the document, but you can use HtmlCleaner to make the document valid.
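
As a rough illustration of the XPath route, the sketch below uses only the JDK's built-in XML parser and javax.xml.xpath. It assumes the input is already well-formed XML, which the sample string from the question happens to be; real-world HTML usually is not, and would need a cleaning step first (HtmlCleaner, as suggested above, is not shown here). The names XPathTextDemo and htmlText are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathTextDemo {
    // Hypothetical helper: returns the text content of the root <html> element.
    // Assumes the source is well-formed XML; malformed input raises an exception.
    static String htmlText(String source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(source)));
            // Evaluating "/html" as a string yields the element's text content.
            return XPathFactory.newInstance().newXPath().evaluate("/html", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Input could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
        System.out.println(htmlText(source));
    }
}
```

The parser accepts whitespace before the closing > of a tag, so <html > needs no special handling. A tag written as < html> is not well-formed XML and would be rejected.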

score 0 · Accepted Answer

To extract the text inside the tags, use something like this:

String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
System.out.println( source.replaceAll( "^<\\s*html\\s*>(.*)<\\s*\\/html\\s*>$", "$1" ) );
// output is:
// The quick brown fox jumps over the brown lazy dog.

However, avoid parsing HTML with regular expressions. Please read this topic.
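
One caveat with the replaceAll approach: when the pattern does not match, the input comes back unchanged, so a failed extraction is easy to miss. A possible variant, sketched here with an explicit Matcher, makes the no-match case visible. The class name HtmlBodyExtractor, the method extract, and the use of Pattern.DOTALL (so the content may span several lines) are assumptions added for illustration, not part of the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlBodyExtractor {
    // Same idea as the pattern above; DOTALL is an added assumption so that
    // "." also matches line breaks inside the wrapped text.
    private static final Pattern WRAPPED =
            Pattern.compile("^<\\s*html\\s*>(.*)<\\s*/html\\s*>$", Pattern.DOTALL);

    // Hypothetical helper: returns the captured text, or empty if the input
    // is not wrapped in an html tag pair.
    static Optional<String> extract(String source) {
        Matcher matcher = WRAPPED.matcher(source);
        return matcher.matches() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
        System.out.println(extract(source).orElse("(no match)"));
        System.out.println(extract("plain text").orElse("(no match)"));
    }
}
```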

score 0 · Accepted Answer

This example may be helpful.

String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
String find = "\\<.*?>";
String replace = "";
Pattern pattern = Pattern.compile(find);
Matcher matcher = pattern.matcher(source);
String output = matcher.replaceAll(replace);
System.out.println("Source = " + source);
System.out.println("Output = " + output);
