java - Understanding the "+" in a regular expression

Question

I have a regular expression that removes all usernames from a tweet. It looks like this:

regexFinder = "(?:\\s|\\A)[@]+([A-Za-z0-9-_]+):";

I'm trying to understand what each component does. So far, I have:

(       Used to begin a “group” element
?:      Starts non-capturing group (this means one that will be removed from the final result)
\\s     Matches against shorthand characters
|       or
\\A     Matches at the start of the string and matches a position as opposed to a character
[@]     Matches against this symbol (which is used for Twitter usernames)
+       Match the previous followed by
([A-Za-z0-9-_]  Match against any capital or small characters and numbers or hyphens

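I'm a bit lost on the last part. Could someone tell me what +): means? I assume the bracket ends the group, but I don't understand the colon or the plus sign.

If I've made any mistakes in my understanding of the regex, please feel free to point them out.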

score 1 · Accepted Answer

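The plus sign in a regex means "one or more occurrences of the preceding character or group of characters". The second plus sign is inside the second set of parentheses, so it basically means that the second set of parentheses matches any string made up of at least one lowercase or uppercase letter, digit, hyphen, or underscore.

As for the colon, I don't believe it has any special meaning on its own in Java's regex classes, so it should just match a literal colon. I'm not sure, though; someone else will know.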
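To make the "one or more" rule concrete, here is a minimal sketch that tests the character class from the question in isolation. The class name `PlusDemo` and the helper `isName` are made up for illustration; only `java.util.regex.Pattern` and `String.matches` come from the JDK.

```java
import java.util.regex.Pattern;

final class PlusDemo {
    // The character class from the question, followed by '+' (one or more).
    private static final Pattern NAME = Pattern.compile("[A-Za-z0-9-_]+");

    // Hypothetical helper: true only if the whole input consists of
    // one or more characters from the class.
    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isName(""));            // false: '+' needs at least one character
        System.out.println(isName("a"));           // true
        System.out.println(isName("user_name-1")); // true
        System.out.println(isName("user name"));   // false: a space is not in the class
        System.out.println("a:".matches("[A-Za-z0-9-_]+:")); // true: ':' matches itself
    }
}
```

The empty string is rejected because `+` requires at least one character, and the last line shows the colon being matched as an ordinary character.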

score 1 · Accepted Answer

OK, let's take a look.

[@]+                 any character of: '@' (1 or more times)
   (                 group and capture to \1:
    [A-Za-z0-9-_]+   any character of: (a-z A-Z), (0-9), '-', '_' (1 or more times)
   )                 end of capture group \1
   :                 look for and match ':'
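The breakdown above says the parentheses "group and capture to \1". As a rough sketch of what that means in Java, the following pulls the captured username out of each match with `Matcher.group(1)`. The class name `CaptureDemo`, the helper `usernames`, and the sample tweet are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaptureDemo {
    // The regex from the question, unchanged.
    private static final Pattern MENTION =
            Pattern.compile("(?:\\s|\\A)[@]+([A-Za-z0-9-_]+):");

    // Hypothetical helper: collects whatever capture group 1 matched
    // for every occurrence of the pattern in the tweet.
    static List<String> usernames(String tweet) {
        List<String> names = new ArrayList<>();
        Matcher m = MENTION.matcher(tweet);
        while (m.find()) {
            names.add(m.group(1)); // text between '@' and ':'
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(usernames("@alice: hi there @bob: thanks")); // [alice, bob]
    }
}
```

The leading `(?:...)` group takes part in the match but is not numbered, which is why the username ends up in group 1 rather than group 2.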

The following quantifiers are recognized:

*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times
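Each row of the quantifier table can be checked with a small sketch like this one. The patterns (`ab*c` and friends), the class name `QuantifierDemo`, and the helper `fullMatch` are illustrative choices, not part of the question.

```java
import java.util.regex.Pattern;

final class QuantifierDemo {
    // Hypothetical helper: true if the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("ab*c", "ac"));         // true:  '*' allows zero b's
        System.out.println(fullMatch("ab+c", "ac"));         // false: '+' needs at least one b
        System.out.println(fullMatch("ab+c", "abbc"));       // true
        System.out.println(fullMatch("ab?c", "abbc"));       // false: '?' allows at most one b
        System.out.println(fullMatch("ab{2}c", "abbc"));     // true:  exactly two b's
        System.out.println(fullMatch("ab{2,}c", "abc"));     // false: fewer than two b's
        System.out.println(fullMatch("ab{2,3}c", "abbbc"));  // true:  three is within 2..3
        System.out.println(fullMatch("ab{2,3}c", "abbbbc")); // false: four is too many
    }
}
```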

score 1 · Accepted Answer

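The + actually means "one or more" of whatever it follows.

In this case, [@]+ means "one or more @ symbols" and [A-Za-z0-9-_]+ means "one or more letters, digits, dashes, or underscores". The + is one of several quantifiers. For more details, see here.

The colon at the end just makes sure that the match ends with a colon.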
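Since the question says the regex is used to remove usernames, here is a sketch of how the trailing colon affects that, assuming the removal is done with `String.replaceAll` (the question does not show the actual call). The class name `MentionStripper` and the sample tweets are invented for the example.

```java
final class MentionStripper {
    // The regex from the question, unchanged.
    private static final String REGEX = "(?:\\s|\\A)[@]+([A-Za-z0-9-_]+):";

    // Hypothetical helper: deletes every match of the regex from the tweet.
    static String strip(String tweet) {
        return tweet.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        // "@alice:" ends with a colon, so it is removed; "@bob" is not.
        System.out.println(strip("@alice: hello @bob how are you")); // " hello @bob how are you"
        // The whitespace before the '@' is part of the match and goes too.
        System.out.println(strip("hi @bob: yo"));                     // "hi yo"
    }
}
```

So, as written, the pattern only strips mentions that are immediately followed by a colon; a bare `@bob` in the middle of a tweet is left alone.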

It can help to look at a visualization. This one was generated by debuggex.

[Image: debuggex visualization of the regex]

score 1 · Accepted Answer

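The + symbol means "the preceding character can be repeated one or more times". This is in contrast to the * symbol, which means "the preceding character can be repeated zero or more times". As far as I can tell, the colon is a literal and matches a literal : in the string.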
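To see why the choice between + and * matters for this particular regex, here is a small sketch that swaps `[@]+` for `[@]*`. The variant with `*` is not in the question; it is only there for comparison, and the class name `StarVsPlus` and helper `found` are made up.

```java
import java.util.regex.Pattern;

final class StarVsPlus {
    static final String WITH_PLUS = "(?:\\s|\\A)[@]+([A-Za-z0-9-_]+):"; // from the question
    static final String WITH_STAR = "(?:\\s|\\A)[@]*([A-Za-z0-9-_]+):"; // hypothetical variant

    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found(WITH_PLUS, "@bob: hi"));       // true
        System.out.println(found(WITH_STAR, "@bob: hi"));       // true
        System.out.println(found(WITH_PLUS, "note: see this")); // false: no '@' present
        System.out.println(found(WITH_STAR, "note: see this")); // true: zero '@' is allowed
    }
}
```

With `*`, any word followed by a colon would count as a username, which is presumably why the original pattern insists on at least one `@`.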
