java - Analyzing a more complex regex

Question

In a previous question that I asked,

someone gave me a great answer to my problem (as explained in the link above), but I wasn't able to fully understand it. Could somebody help me? The regex I was given is this:

"(?s)(?=(([^\"]+\"){2})*[^\"]*$)\\s+"

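I can understand some of the basics, but there are parts of this regex that I couldn't find even after searching Google thoroughly, such as the question mark before the s at the start, or how exactly the second bracket works with the question mark and equals sign at its start. Is it also possible to expand it so that it works with other types of quotes, such as “ ”?

Any help is really appreciated.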

score 2 · Accepted Answer

"(?s)(?=(([^\"]+\"){2})*[^\"]*$)\\s+" explained:

(?s)    # This equals a DOTALL flag in regex, which allows the `.` to match newline characters. As far as I can tell from your regex, it's superfluous.
(?=     # Start of a lookahead, it checks ahead in the regex, but matches "an empty string"(1) read more about that [here][1] 
(([^\"]+\"){2})*  # This group is repeated any amount of times, including none. I will explain the content in more detail.
    ([^\"]+\")    # This is looking for one or more occurrences of a character that is not `"`, followed by a `"`. 
    {2}           # Repeat 2 times. When combined with the previous group, it is looking for 2 occurrences of text followed by a quote. In effect, this means it is looking for an even number of `"`.
[^\"]*  # Matches any character which is not a double quote sign. This means literally _any_ character, including newline characters without enabling the DOTALL flag
$       # The lookahead actually inspects until end of string.
)       # End of lookahead
\\s+    # Matches one or more  whitespace characters, including spaces, tabs and so on

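The complicated group that is repeated two times makes the regex match whitespace in this string only where it is not between two `"`:

text that has a "string in it".

When used with String.split, it splits the string into: [text, that, has, a, "string in it".]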

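Because it only matches when an even number of `"` lies ahead, an unbalanced quote changes which spaces match. In the following string, the spaces before the lone `"` have an odd number of quotes ahead of them and are not matched, while the spaces after it have zero quotes ahead and are matched:

text that nearly has a "string in it.

Splitting the string gives [text that nearly has a "string, in, it.]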

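(1) When I say that the lookahead matches "an empty string", I mean that it does not actually consume anything. It only looks ahead from the current position and checks a condition. The actual match is made by the `\\s+` that follows the lookahead.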
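Running the two example strings through `String.split` is the easiest way to check what each one really produces. This is only a minimal sketch; the class name `QuoteAwareSplit` and the helper `split` are made up for illustration.

```java
import java.util.Arrays;

class QuoteAwareSplit {
    // The regex from the question, as a Java string literal.
    static final String REGEX = "(?s)(?=(([^\"]+\"){2})*[^\"]*$)\\s+";

    // Hypothetical helper: splits on whitespace that has an even number of " ahead.
    static String[] split(String input) {
        return input.split(REGEX);
    }

    public static void main(String[] args) {
        // Balanced quotes: spaces inside the quoted part are left alone.
        System.out.println(Arrays.toString(split("text that has a \"string in it\".")));
        // prints [text, that, has, a, "string in it".]

        // One unbalanced quote: only the spaces after it have an even (zero) count ahead.
        System.out.println(Arrays.toString(split("text that nearly has a \"string in it.")));
        // prints [text that nearly has a "string, in, it.]
    }
}
```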

score 1 · Accepted Answer

This is quite simple..

Concept

It splits at `\s+` whenever there is an even number of `"` ahead.

For example:

Hello hi "Hi World"
     ^  ^   ^
     |  |   |->will not split here since there is an odd number of " ahead
     ----
      |
      |->split here because there is an even number of " ahead

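Grammar

`\s` matches a whitespace character such as `\n`, `\r`, a space or `\t`

`+` is a quantifier that matches the preceding character or group one or more times

`[^\"]` matches anything except `"`

`(x){2}` matches x 2 times

`a(?=bc)` matches a if it is followed by bc

`(?=ab)a` first checks for ab from the current position and then returns to that position. It then matches a. `(?=ab)c` can never match, because the lookahead requires an a at the current position while the rest of the pattern requires a c there.

With `(?s)` (singleline mode), `.` would match newlines. So in this case there is no need for `(?s)`, since the regex contains no `.`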
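The lookahead entries in the list above can be tried out directly. The sketch below is only an illustration; `LookaheadDemo` and `firstMatch` are hypothetical names, and the input strings are my own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns the first substring of input matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The lookahead checks for "bc" but only "a" is part of the match.
        System.out.println(firstMatch("a(?=bc)", "abc")); // prints a
        // "a" is followed by "bd", so the lookahead fails.
        System.out.println(firstMatch("a(?=bc)", "abd")); // prints null
        // Check for "ab", go back to the same position, then match "a".
        System.out.println(firstMatch("(?=ab)a", "cab")); // prints a
        // The position must start with "a" and also be "c": impossible.
        System.out.println(firstMatch("(?=ab)c", "cab")); // prints null
    }
}
```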

I would use

\s+(?=([^"]*"[^"]*")*[^"]*$)
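As a rough comparison, here is how this regex and the one from the question behave on the same inputs. The second input, with an empty pair of quotes, is my own test case and not from the question. It is where the two seem to differ, because `[^\"]+` in the original needs at least one character before each quote while `[^"]*` here does not. The class name `SplitComparison` is made up.

```java
import java.util.Arrays;

class SplitComparison {
    // Regex from the question.
    static final String ORIGINAL = "(?s)(?=(([^\"]+\"){2})*[^\"]*$)\\s+";
    // Regex suggested in this answer.
    static final String ALTERNATIVE = "\\s+(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    public static void main(String[] args) {
        String simple = "Hello hi \"Hi World\"";
        System.out.println(Arrays.toString(simple.split(ORIGINAL)));
        // prints [Hello, hi, "Hi World"]
        System.out.println(Arrays.toString(simple.split(ALTERNATIVE)));
        // prints [Hello, hi, "Hi World"]

        // Assumed edge case: an empty quoted string.
        String emptyQuotes = "a \"\" b";
        System.out.println(Arrays.toString(emptyQuotes.split(ORIGINAL)));
        // prints [a "", b]
        System.out.println(Arrays.toString(emptyQuotes.split(ALTERNATIVE)));
        // prints [a, "", b]
    }
}
```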

score 1 · Accepted Answer

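The `(?s)` part is an embedded flag expression that enables DOTALL mode, which means the following:

In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.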
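A small sketch of what the flag changes and what it does not. The class name `DotallDemo` and the sample strings are only for illustration.

```java
import java.util.regex.Pattern;

class DotallDemo {
    public static void main(String[] args) {
        String input = "a\nb";

        // Without DOTALL, '.' refuses to match the line terminator.
        System.out.println(Pattern.matches("a.b", input));      // prints false
        // With the embedded (?s) flag, '.' matches it.
        System.out.println(Pattern.matches("(?s)a.b", input));  // prints true
        // A negated character class matches the newline with or without the flag,
        // which is why (?s) makes no difference to the regex in the question.
        System.out.println(Pattern.matches("a[^\"]b", input));  // prints true
    }
}
```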

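The `(?=expr)` is a look-ahead expression. This means that the regex looks to match `expr`, but then goes back to the same point before continuing with the rest of the evaluation.

In this case, it means that the regex matches any `\\s+` occurrence that is followed by any even number of `"` and then by non-`"` characters until the end (`$`). In other words, it checks that there is an even number of `"` ahead.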

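It can definitely be expanded to other quotes, too. The only problem is the `([^\"]+\"){2}` part, which would probably have to use a back-reference (`\n`) instead of `{2}`.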
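For quotes that have distinct opening and closing characters, such as “ and ”, one possible approach is sketched below. This is not the back-reference idea mentioned above but a different technique: it requires that everything after the whitespace consists of complete “...” pairs. It is only a sketch under that assumption; the names `CurlyQuoteSplit` and `REGEX` are made up, and nested or mixed quote styles are not handled.

```java
import java.util.Arrays;

class CurlyQuoteSplit {
    // \u201C is the opening curly quote, \u201D the closing one.
    // The lookahead accepts only: non-quote text, then zero or more complete
    // open...close pairs (each followed by non-quote text), up to the end.
    static final String REGEX =
        "\\s+(?=[^\u201C\u201D]*(?:\u201C[^\u201C\u201D]*\u201D[^\u201C\u201D]*)*$)";

    public static void main(String[] args) {
        String input = "text that has a \u201Cstring in it\u201D.";
        // Whitespace inside the curly quotes is followed by a closing quote
        // with no opening quote before it, so the lookahead fails there.
        System.out.println(Arrays.toString(input.split(REGEX)));
    }
}
```

With the sample input, this should yield five tokens, the last one being the whole quoted phrase with its trailing full stop.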
