java - Using String.replaceFirst(regexp, "$1") to get the matched substring returns an empty string. What is wrong with the regular expression?

Question

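I want to convert ANSI escape sequences to IRC color sequences.

So I wrote regular expression 1, \e\[([\d;]+)?m, but shell_output_string.replaceFirst("\\e\\[([\\d;]+)?m", "$1") returns both the matched substring and the remaining unmatched part of the string.

Then I wrote regular expression 2, .*\e\[([\d;]+)?m.*, hoping that it would match the whole string so that I could replace it with the matched substring. However, replaceFirst(".*\\e\\[([\\d;]+)?m.*", "$1") returns an empty string, even though matches(".*\\e\\[([\\d;]+)?m.*") returns true. What is wrong with this regular expression?

This question is very similar to the following one: Pattern/Matcher group() to get substring in Java?

Sample code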

import java.util.regex.*;
public class AnsiEscapeToIrcEscape
{
    public static void main (String[] args)
    {
        // # grep --color=always bot /etc/passwd
        //
        // bot:x:1000:1000:bot:/home/bot:/bin/bash
        byte[] shell_output_array = {
            0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#1 - #11)
            0x62, 0x6F, 0x74,   // bot  (#12 - #14)
            0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#15 - #20)
            0x3A, 0x78, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A,   // :x:1000:1000:    (#21 - #33)
            0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#34 - #44)
            0x62, 0x6F, 0x74,   // bot  (#45 - #47)
            0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#48 - #53)
            0x3A, 0x2F, 0x68, 0x6F, 0x6D, 0x65, 0x2F,   // :/home/  (#54 - #60)
            0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#61 - #71)
            0x62, 0x6F, 0x74,   // bot  (#72 - #74)
            0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#75 - #80)
            0x3A, 0x2F, 0x62, 0x69, 0x6E, 0x2F, 0x62, 0x61, 0x73, 0x68, // :/bin/bash   (#81 - #90)
        };
        String shell_output = new String (shell_output_array);
        System.out.println (shell_output);
        System.out.println ("total " + shell_output_array.length + " bytes");

        final String CSI_REGEXP = "\\e\\[";
        final String CSI_SGR_REGEXP_First = CSI_REGEXP + "([\\d;]+)?m";
        final String CSI_SGR_REGEXP = ".*" + CSI_SGR_REGEXP_First + ".*";

        System.out.println (shell_output.replaceFirst(CSI_SGR_REGEXP_First, "$1"));
        System.out.println (shell_output.replaceFirst(CSI_SGR_REGEXP, "$1"));
    }
}

score 1 · Accepted Answer

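Regular expression quantifiers are greedy by default: each part of the pattern tries to match as much of the input as it can.

This means that when a pattern starts with .*, that part of the pattern tries to cover as much of the input text as possible. In effect, it forces the rest of the pattern to look for a match starting at the end of the input string and working back toward the front.

So what is the first match for the rest of the pattern when searching from the end of the string (or, if you prefer, what is the last substring that matches)? It is in the second-to-last line of the input and consists of nothing but ^[[m.

That sequence matches because the whole ([\d;]+) part of the pattern is made optional by the ? that follows it.

In other words, because that last sequence contains no digits or ; characters, the $1 group is empty, and you get an empty string as output.

At least, that is what I think is happening; I am not near a machine with Java to test it. I hope that helps.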
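The reasoning above can be checked with a small sketch. It assumes a shortened stand-in for the grep output (the SAMPLE constant) and a hypothetical helper, group1, neither of which appears in the question. It compares the capture group for the bare pattern, for the greedy .* wrapper, and for a reluctant .*? prefix, which is one possible way to make the first escape sequence win.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyGroupDemo {
    // Assumed, shortened stand-in for the grep output: ESC[01;31m bot ESC[m :x
    static final String SAMPLE = "\u001B[01;31mbot\u001B[m:x";

    // Hypothetical helper: group 1 of the first match, or null when the
    // optional group did not take part in the match (or nothing matched).
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sgr = "\\e\\[([\\d;]+)?m";

        // Bare pattern: the first match is ESC[01;31m, so group 1 is "01;31".
        System.out.println(group1(sgr, SAMPLE));

        // Greedy prefix: .* swallows everything, then backtracks only as far
        // as the LAST ESC[...m, which is ESC[m. Group 1 is skipped (null).
        System.out.println(group1(".*" + sgr + ".*", SAMPLE));

        // Reluctant prefix: .*? expands only as far as needed, so the FIRST
        // sequence is captured and group 1 is "01;31" again.
        System.out.println(group1(".*?" + sgr + ".*", SAMPLE));

        // With the reluctant prefix, replaceFirst yields just the parameters.
        System.out.println(SAMPLE.replaceFirst(".*?" + sgr + ".*", "$1"));
    }
}
```

In replaceFirst, a group that did not participate in the match contributes nothing to the replacement, which is consistent with the empty output described in the question.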

score 0 · Accepted Answer

    The API of String's replaceFirst says :


     replaceFirst

    public String replaceFirst(String regex,
                               String replacement)

        Replaces the first substring of this string that matches the given regular expression with the given replacement.

        An invocation of this method of the form str.replaceFirst(regex, repl) yields exactly the same result as the expression

            Pattern.compile(regex).matcher(str).replaceFirst(repl)

        Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceFirst(java.lang.String). Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.

        Parameters:
            regex - the regular expression to which this string is to be matched
            replacement - the string to be substituted for the first match 
        Returns:
            The resulting String 
        Throws:
            PatternSyntaxException - if the regular expression's syntax is invalid
        Since:
            1.4
        See Also:
            Pattern



Please read the note in the documentation above: backslashes (\) and dollar signs ($) in the replacement string may cause the result to differ from a literal replacement.
You can use Pattern and Matcher instead.

Example  
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){

      // String to be scanned to find the pattern.
     // String line = "This order was placed for QT3000! OK?";
     // String pattern = "(.*)(\\d+)(.*)";

      byte[] shell_output_array = {
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#1 - #11)
              0x62, 0x6F, 0x74,   // bot  (#12 - #14)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#15 - #20)
              0x3A, 0x78, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A,   // :x:1000:1000:    (#21 - #33)
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#34 - #44)
              0x62, 0x6F, 0x74,   // bot  (#45 - #47)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#48 - #53)
              0x3A, 0x2F, 0x68, 0x6F, 0x6D, 0x65, 0x2F,   // :/home/  (#54 - #60)
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#61 - #71)
              0x62, 0x6F, 0x74,   // bot  (#72 - #74)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#75 - #80)
              0x3A, 0x2F, 0x62, 0x69, 0x6E, 0x2F, 0x62, 0x61, 0x73, 0x68, // :/bin/bash   (#81 - #90)
              };
      String line = new String (shell_output_array);
      //String pattern = "(.*)(\\d+)(.*)";
      final String CSI_REGEXP = "\\e\\[";
      final String CSI_SGR_REGEXP_First = CSI_REGEXP + "([\\d;]+)?m";
      final String CSI_SGR_REGEXP = ".*" + CSI_SGR_REGEXP_First + ".*";

      // Create a Pattern object
      Pattern r = Pattern.compile(CSI_SGR_REGEXP);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      while (m.find()) {
         System.out.println(m.start() + "  " + m.end());
         System.out.println("Found value: " + m.group());
      } 
   }
}
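As a possible variation on the Pattern and Matcher approach, the sketch below runs find() with the pattern without the surrounding .* so that every SGR sequence is visited in turn and its parameters are read from group(1). The class name SgrParameterExtractor, the method sgrParameters, and the shortened sample input are assumptions made for illustration; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SgrParameterExtractor {
    // Same SGR pattern as in the question, without the surrounding ".*".
    private static final Pattern CSI_SGR = Pattern.compile("\\e\\[([\\d;]+)?m");

    // Hypothetical helper: collects the parameter string of every SGR
    // sequence, using "" for sequences such as ESC[m that carry none.
    static List<String> sgrParameters(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CSI_SGR.matcher(input);
        while (m.find()) {
            String params = m.group(1); // null when the optional group is skipped
            result.add(params == null ? "" : params);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed, shortened version of the grep output from the question.
        String sample = "\u001B[01;31m\u001B[Kbot\u001B[m\u001B[K:x:1000";
        System.out.println(sgrParameters(sample)); // prints [01;31, ]
    }
}
```

The ESC[K sequences are not reported because the pattern requires a trailing m; handling them would need a separate pattern.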
