java - Why am I not getting the * quantifier in regex right?

Question

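I am new to regex and am working through the quantifier section. I have a question about the * quantifier. Here is the definition of the * quantifier:

X* - finds no or several letter X
.* - any character sequence

Based on the above definitions, I wrote a small program: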

public static void testQuantifier() {
    String testStr = "axbx";
    System.out.println(testStr.replaceAll("x*", "M"));
    //my expected output is MMMM but actual output is MaMMbMM
    /*
    Logic behind my expected output is:
    1. it encounters a which means 0 x is found. It should replace a with M.
    2. it encounters x which means 1 x is found. It should replace x with M.
    3. it encounters b which means 0 x is found. It should replace b with M.
    4. it encounters x which means 1 x is found. It should replace x with M.
    so output should be MMMM but why it is MaMMbMM?
    */

    System.out.println(testStr.replaceAll(".*", "M"));
    //my expected output is M but actual output is MM

    /*
    Logic behind my expected output is:
    It encounters axbx, which is any character sequence, it should 
    replace complete sequence with M.
    So output should be M but why it is MM?
    */
}

Update:

As per my revised understanding, I would expect the output to be MaMMbM, not MaMMbMM. So I don't understand why I get an extra M at the end.

My revised understanding of the first regex is:

1. it encounters a which means 0 x is found. It should replace a with Ma.
2. it encounters x which means 1 x is found. It should replace x with M.
3. it encounters b which means 0 x is found. It should replace b with Mb.
4. it encounters x which means 1 x is found. It should replace x with M.
5. Lastly it encounters end of string at index 4. So it replaces 0x at end of String with M.

(Though I find it strange that it also considers the end-of-string index.)

So the first part is clear now.

Also, if someone could clarify the second regex, that would be helpful.

score 2 · Accepted Answer

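a and b don't match the regex, so they aren't replaced. What does get replaced are the xes and the empty strings before a non-matching letter or before the end of the string.

Let's see what happens, step by step: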

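We're at the start of the string. The regex engine tries to match an x but fails because there is an a here.
The regex engine backtracks because x* also allows zero repetitions of x. We have a match (the empty string) and replace it with M.
The regex engine advances past the a and successfully matches x. Replace with M.
The regex engine now tries to match x at the current position (after the previous match), which is right before b. It can't.
But it can backtrack again, matching zero xes here. Replace with M.
The regex engine advances past the b and successfully matches x. Replace with M.
The regex engine now tries to match x at the current position (after the previous match), which is at the end of the string. It can't.
But it can backtrack again, matching zero xes here. Replace with M.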
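
The steps above can be made visible by asking the matcher where each match starts and ends. This is only a minimal sketch: the class name EmptyMatchDemo and the helper findAll are made up for illustration, and it uses nothing beyond java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Hypothetical helper: one entry per match, formatted as start-end:'text'.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.start() + "-" + m.end() + ":'" + m.group() + "'");
        }
        return matches;
    }

    public static void main(String[] args) {
        // Prints five matches: 0-0:'', 1-2:'x', 2-2:'', 3-4:'x', 4-4:''
        for (String match : findAll("x*", "axbx")) {
            System.out.println(match);
        }
        // Each of the five matches becomes one M; a and b are left alone.
        System.out.println("axbx".replaceAll("x*", "M")); // MaMMbMM
    }
}
```

The three entries whose start and end are equal are the empty matches: before a, before b, and at the end of the string.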

By the way, this is implementation-dependent. In Python, for example, it's

>>> re.sub("x*", "M", "axbx")
'MaMbM'

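because there, an empty match of the pattern is replaced only when it is not adjacent to a previous match. (That was the behavior of Python versions before 3.7; from 3.7 on, re.sub also replaces empty matches adjacent to a previous non-empty match, so the result is 'MaMMbMM', as in Java.)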
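
The second regex from the question, .*, appears to follow the same logic in Java: it first consumes the whole string, and then matches the empty string once more at the end, which would explain MM. The sketch below illustrates that under the same assumptions as the walkthrough; the class name DotStarDemo and the helper describeMatches are invented for illustration. Using .+ instead is a suggestion that is not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotStarDemo {
    // Hypothetical helper: joins all matches as start-end:'text', separated by spaces.
    static String describeMatches(String regex, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(m.start()).append('-').append(m.end())
              .append(":'").append(m.group()).append('\'');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two matches: the whole string, then the empty string at index 4.
        System.out.println(describeMatches(".*", "axbx")); // 0-4:'axbx' 4-4:''
        System.out.println("axbx".replaceAll(".*", "M"));  // MM

        // .+ needs at least one character, so the empty match at the end disappears.
        System.out.println("axbx".replaceAll(".+", "M"));  // M
    }
}
```

If the goal is to replace only runs of x and never the empty string, x+ can be used the same way in place of x*.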
