php - Regex to match semicolons, but not in comments or quotes

Question

I want a regex that returns every semicolon it matches, but only when the semicolon is outside quotes (including nested quotes) and not inside commented-out code.

testfunc();
testfunc2("test;test");
testfunc3("test';test");
testfunc4('test";test');
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test\"test");

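The regex should return only the semicolon at the end of each example.

I have been playing around with the pattern below, but it fails on examples testfunc3 and testfunc9. It does not ignore comments either...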

/;(?=(?:(?:[^"']*+["']){2})*+[^"']*+\z)/g
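One weakness of this quote-counting lookahead is that it treats " and ' as the same kind of delimiter and counts them all the way to the end of the input, so an apostrophe inside a later double-quoted string flips the parity for every semicolon before it. A minimal Python sketch of that effect, assuming a simplified port of the pattern (possessive quantifiers dropped, \z written as \Z) and a made-up two-line input; it may not be the only reason the pattern fails:

```python
import re

# Simplified port of the pattern above (an assumption, not the original PCRE):
# possessive quantifiers are dropped and \z is written as \Z.
PARITY = re.compile(r""";(?=(?:(?:[^"']*["']){2})*[^"']*\Z)""")

# Hypothetical input: one plain statement, then a call whose string
# argument contains an apostrophe followed by a semicolon.
TEXT = 'a();\nf("x\';y");\n'

offsets = [m.start() for m in PARITY.finditer(TEXT)]
print(offsets)  # prints [14]
```

Only the last semicolon (offset 14) is reported. The semicolon after a() at offset 3 is followed by three quote characters in total, an odd number, so the lookahead rejects it even though it is a real statement terminator.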

Any help would be appreciated!

score 3 · Accepted Answer

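I don't have time to convert this to JS. Here is the regex in a Perl sample; the regex itself will work in JS, though.

C comments and double/single-quoted strings: taken from "strip C comments" by Jeffrey Friedl, later modified by Fred Curtis, and adapted by me to include C++ comments and the target semicolon.

Capture group 1 (optional) contains everything up to the semicolon; group 2 is the semicolon (but it could be anything).

The modifiers are //xsg.

The regex below is used in a substitution operator, s/pattern/replace/xsg (that is, each match is replaced with $1[$2]).

I think your post is just to find out whether this can be done. I can include a commented regex if you really need it.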

use strict;
use warnings;

# Single-quoted heredoc: keeps the backslash in testfunc9's \" intact,
# so the escaped-quote case is actually exercised.
my $str = <<'EOS';
testfunc();
testfunc2("test;test");
testfunc3("test';test");
testfunc4('test";test');
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test\"test");
EOS

# Wrap every semicolon that lies outside comments and strings in [ ]
$str =~ s{
     ((?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.[^/"'\\;]*))*?)(;)
 }
 {$1\[$2\]}xsg;

print $str;

Output

testfunc()[;]
testfunc2("test;test")[;]
testfunc3("test';test")[;]
testfunc4('test";test')[;]
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test\"test")[;]
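For readers who are not using Perl, the same substitution can be sketched in Python. This is an assumed port rather than part of the original answer: the pattern is copied unchanged and compiled with re.DOTALL in place of the /s modifier, the /x modifier is not needed because the one-line pattern contains no whitespace, and re.sub replaces every match just as /g does.

```python
import re

# The question's sample, kept raw so the backslash in testfunc9 survives.
SAMPLE = r'''testfunc();
testfunc2("test;test");
testfunc3("test';test");
testfunc4('test";test');
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test\"test");
'''

# Same pattern as the Perl sample, split over two raw strings for length.
PATTERN = re.compile(
    r"""((?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)"""
    r"""|(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.[^/"'\\;]*))*?)(;)""",
    re.DOTALL,
)

# Group 1 is the text skipped over, group 2 the semicolon; wrap the latter in [ ].
result = PATTERN.sub(r"\1[\2]", SAMPLE)
print(result, end="")
```

Five semicolons end up bracketed, and the commented-out lines come through unchanged because each comment is consumed as part of group 1 on the way to the next real semicolon.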

Expanded with comments

 (  ## Optional non-greedy, Capture group 1
   (?:
      ## Comments
        (?:
            /\*         ##  Start of /* ... */ comment
            [^*]*\*+    ##  Non-* followed by 1-or-more *'s
            (?:
                [^/*][^*]*\*+
            )*          ##  0-or-more things which don't start with /
                        ##    but do end with '*'
            /           ##  End of /* ... */ comment
          |  
            //          ## Start of // ... comment
            (?:
                [^\\]         ## Any Non-Continuation character ^\
              |               ##   OR
                \\\n?         ## Any Continuation character followed by 0-1 newline \n

             )*?            ## To be done 0-many times, stopping at the first end of comment

             \n         ##  End of // comment
        )

     | ##  OR,  various things which aren't comments:
        (?:
            " (?: \\. | [^"\\] )* "  ## Double quoted text
          |
            ' (?: \\. | [^'\\] )* '  ## Single quoted text
          |
            .           ##  Any other char
            [^/"'\\;]*  ##  Chars which don't start a comment, string, escape
        )               ##  or continuation (escape + newline) AND are NOT semi-colon ;
   )*?
 )
  ## Capture group 2, the semi-colon
 (;)

score 1 · Accepted Answer

This works on all of your examples, but it depends on how close the code you apply it to is to those examples.

;(?!\S|(?:[^;]*\*/))

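; - matches the semicolon

(?! - negative lookahead - makes sure that ->

\S - there is no non-whitespace character right after the semicolon

|(?:[^;]*\*/)) - and, if whitespace follows, that no */ appears before the next ;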
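To see what this lookahead accepts in practice, it can be run against the sample from the question. A minimal Python sketch, assuming the pattern is used unchanged with a global search; the helper name matched_lines is made up for illustration:

```python
import re

SAMPLE = r'''testfunc();
testfunc2("test;test");
testfunc3("test';test");
testfunc4('test";test');
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test\"test");
'''

# A semicolon NOT followed by a non-space character,
# and NOT followed by a "*/" that appears before the next semicolon.
LOOKAHEAD = re.compile(r";(?!\S|(?:[^;]*\*/))")

def matched_lines(pattern, text):
    """Return the 0-based line index of every match of `pattern` in `text`."""
    return [text.count("\n", 0, m.start()) for m in pattern.finditer(text)]

print(matched_lines(LOOKAHEAD, SAMPLE))  # prints [0, 1, 2, 3, 4, 12]
```

Semicolons inside strings and block comments are skipped, but line 4 (//testfunc5();) is still reported, because nothing in the pattern recognizes a // comment. That is the case the edit at the end of this answer addresses.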

Let me know if you run into any problems.

If using a regex does you no harm, go ahead and use it, but if you want to reuse it later, a regex may turn out not to be the most reliable tool.
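As one possible alternative to a regex, a small hand-written scanner can track whether it is inside a string or a comment. The Python sketch below is an illustration only and not part of the original answer: the function name find_semicolons is made up, and it does not handle backslash-newline continuation of // comments.

```python
def find_semicolons(src):
    """Return the offsets of semicolons that are outside strings and comments."""
    hits = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        two = src[i:i + 2]
        if two == "//":                  # line comment: skip to the end of the line
            end = src.find("\n", i)
            i = n if end == -1 else end + 1
        elif two == "/*":                # block comment: skip past the closing */
            end = src.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif ch in "\"'":                # quoted string: skip to the matching quote
            i += 1
            while i < n and src[i] != ch:
                # a backslash escapes the character that follows it
                i += 2 if src[i] == "\\" else 1
            i += 1                       # step over the closing quote
        else:
            if ch == ";":
                hits.append(i)
            i += 1
    return hits

SAMPLE = r'''testfunc();
testfunc2("test;test");
testfunc3("test';test");
testfunc4('test";test');
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test\"test");
'''

lines = [SAMPLE.count("\n", 0, pos) for pos in find_semicolons(SAMPLE)]
print(lines)  # prints [0, 1, 2, 3, 12]
```

Each branch handles exactly one lexical construct, so supporting another string or comment syntax means adding a branch rather than reworking a single large pattern.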

Edit:

Fix for No. 5 - the semicolon will now be in the first matching group:

^(?:[^/]*)(;)(?!\S|(?:[^;]*\*/))
