php - Match all occurrences of a string

Question

My search text is as follows:

...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...

It contains many lines (it is actually a JavaScript file), but I need to parse out the values of the variable strings, i.e. aaa, bbb, ccc, ddd, eee.

Below is the Perl code; a PHP version is at the bottom.

my $str = <<STR;
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR

my @matches = $str =~ /(?:\"(.+?)\",?)/g;
print "@matches";

I know the script above matches all instances, but it also picks up strings on other lines (such as "xyz"). So I need to check for the prefix var strings =

/var strings = \[(?:\"(.+?)\",?)/g

With the regex above, aaa is parsed.

/var strings = \[(?:\"(.+?)\",?)(?:\"(.+?)\",?)/g

With the one above, I get aaa and bbb. So, to avoid repeating the sub-pattern, I used the "+" quantifier as below.

/var strings = \[(?:\"(.+?)\",?)+/g

But I only got eee, so my question is: why did I get only eee when I used the "+" quantifier?

Update 1: Using PHP preg_match_all (doing this to get more attention :-))

$str = <<<STR
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR;

preg_match_all("/var strings = \[(?:\"(.+?)\",?)+/",$str,$matches);
print_r($matches);

Update 2: Why did it match eee? Because of the greediness of (?:\"(.+?)\",?)+. With the greediness removed, /var strings = \[(?:\"(.+?)\",?)+?/ matches aaa. But why is there only one result? Is there a way to achieve this with a single regex?

score 2 · Accepted Answer

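You may prefer this solution, which first looks for the string var strings = [ using the /g modifier. This sets \G to match immediately after the [ for the next regex, which finds all immediately following occurrences of double-quoted strings, each possibly preceded by commas or whitespace.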

my @matches;

if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
  @matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}

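The regex /var strings = \[(?:\"(.+?)\",?)+/g matches only once, despite the /g modifier, because there is no second occurrence of var strings = [. Each match returns the list of values that the capture variables $1, $2, $3 and so on hold when the match completes, and /(?:"(.+?)",?)+/ (there is no need to escape the double quotes) captures several values into $1 in turn, leaving only the final one there. You need to write something like the code above, which captures only a single value into $1 per match.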
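Since the question also asks about PHP, the two-step Perl approach above can be sketched with preg_match and preg_match_all. This is only a sketch under assumptions: the sample text (including the extra "xyz" line) and the names $values, $found and $offset are made up for illustration. It relies on PCRE's \G matching at the start offset passed to preg_match_all.

```php
<?php
// Sample input (assumed); the "xyz" line stands in for unrelated strings.
$str = <<<'STR'
    ...
    var other = ["xyz"];
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
STR;

$values = [];

// Step 1: locate "var strings = [" and record the byte offset just past it.
if (preg_match('/var\s+strings\s*=\s*\[/', $str, $m, PREG_OFFSET_CAPTURE)) {
    $offset = $m[0][1] + strlen($m[0][0]);

    // Step 2: \G pins each match to where the previous one ended, starting
    // at $offset, so scanning stops at the first non-matching character (]).
    preg_match_all('/\G[,\s]*"([^"]+)"/', $str, $found, PREG_PATTERN_ORDER, $offset);
    $values = $found[1];
}

print_r($values);
```

Because every match must begin exactly where the previous one stopped, quoted strings on other lines are never reached.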

score 2 · Accepted Answer

Here is a single-regex solution:

/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g

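\G is a zero-width assertion that matches at the position where the previous match ended (or at the beginning of the string on the first match attempt). So the regex acts like this: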

var\s+strings\s*=\s*\[\s*"([^"]*)"

...on the first attempt, and like this:

,\s*"([^"]*)"

...after that, but each match has to start exactly where the last one left off.

Here is a demo in PHP, but it works in Perl too.
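A minimal way to try the single regex in PHP is shown below. The sample text (with an extra "xyz" line) and the variable names are assumptions for illustration, not the original demo.

```php
<?php
// Sample input (assumed); only the "strings" array should be picked up.
$str = <<<'STR'
    ...
    var other = ["xyz"];
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
STR;

// First alternative: anchors on "var strings = [".
// Second alternative: \G, continues from the end of the previous match,
// and requires a comma there.
$pattern = '/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/';

preg_match_all($pattern, $str, $m);
$values = $m[1];

print_r($values);
```

One caveat: on the very first attempt \G also matches at the start of the subject, so a subject that begins with a comma followed by a quoted string would produce a stray match.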

score 1 · Accepted Answer

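Because the + tells the engine to repeat the group in parentheses, (?:"(.+?)",?), one or more times. The group matches "aaa", then "bbb", and so on, and each repetition overwrites what the inner capture holds. The match ends after "eee", with no further repetition to find, so eee is the only value left in the capture.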

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/var strings = \[(?:"(.+?)",?)+/)->explain();

The regular expression:

(?-imsx:var strings = \[(?:"(.+?)",?)+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  var strings =            'var strings = '
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (?:                      group, but do not capture (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    (                        group and capture to \1:
----------------------------------------------------------------------
      .+?                      any character except \n (1 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
    )                        end of \1
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    ,?                       ',' (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
  )+                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

A simpler example:

my @m = ('abcd' =~ m/(\w)+/g);
print "@m";

This prints only d. The reason is:

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/(\w)+/)->explain();

The regular expression:

(?-imsx:(\w)+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

When you put a quantifier on a capture group, only the last repetition is kept.

Here is a way that works:
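The same behaviour can be checked in PHP. The sketch below contrasts a quantified capture group with a single capture repeated by preg_match_all; the variable names are illustrative only.

```php
<?php
// Quantified capture group: the group repeats four times inside one match,
// but the capture holds only the last repetition.
preg_match('/(\w)+/', 'abcd', $m);
$last = $m[1];

// One capture per match: preg_match_all repeats the whole pattern instead,
// so every character is reported.
preg_match_all('/(\w)/', 'abcd', $all);
$each = $all[1];

echo $last, "\n";               // d
echo implode(' ', $each), "\n"; // a b c d
```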

my $str = <<STR;
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR

my @matches;
$str =~ m/var strings = \[(.+?)\]/; # get the array first
my $jsarray = $1;
@matches = $jsarray =~ m/"(.+?)"/g; # and get the strings from that

print "@matches";

Update: a single-line solution (though not a single regex) would be:

@matches = ($str =~ m/var strings = \[(.+?)\]/)[0] =~ m/"(.+?)"/g;

But this is very hard to read, IMHO.
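For the PHP side of the question, the same two-step idea (grab the array literal first, then pull the quoted strings out of it) might look like this. The sample text and the names $found and $matches are assumptions for illustration.

```php
<?php
// Sample input (assumed); the "xyz" line must not contribute any values.
$str = <<<'STR'
    ...
    var other = ["xyz"];
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
STR;

$matches = [];

// Step 1: capture everything between the brackets of "var strings = [...]".
if (preg_match('/var strings = \[(.+?)\]/', $str, $m)) {
    // Step 2: collect each double-quoted string from that captured text.
    preg_match_all('/"(.+?)"/', $m[1], $found);
    $matches = $found[1];
}

echo implode(' ', $matches), "\n";
```

Splitting the work into two patterns keeps each one short, at the cost of not being a single regex.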
