regex - Weird issue with regex matching in Perl: only alternate attempts match

Question

Consider the following Perl script:

 #!/usr/bin/perl

 my $str = 'not-found=1,total-found=63,ignored=2';

 print "1. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);
 print "2. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);
 print "3. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);
 print "4. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);

 print "Bye!\n";

Running it produces the following output:

1. matched using regex
3. matched using regex
Bye!

The same regex matches once and then fails on the very next attempt. Why does every other attempt to match the same string against the same regex fail in Perl?

Thanks!

score 5 · Accepted Answer

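Here is a longer explanation of why your code doesn't work.

The /g modifier changes the behavior of the regex to "global matching", which matches all occurrences of the pattern in the string. However, how this matching is carried out depends on context. The two (main) contexts in Perl are list context (the plural) and scalar context (the singular).

In list context, a global regex match returns a list of all matched substrings, or, if the pattern contains capture groups, a flat list of all matched captures: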

use feature 'say';       # enables say()
local $_ = "foobaa";     # "my $_" is no longer valid in modern Perl
my $regex = qr/[aeiou]/;

my @matches = /$regex/g; # match all vowels
say "@matches"; # "o o a a"
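The "flat list of all matched captures" case is easiest to see with two capture groups. Below is a minimal sketch using the string from the question; the pattern and the %count hash are my own illustration, not part of the original code. Each match contributes a key and a value, so the flat list can be assigned directly to a hash.

```perl
use strict;
use warnings;

my $str = 'not-found=1,total-found=63,ignored=2';

# In list context, /g returns every capture of every match as one flat list:
# ('not-found', 1, 'total-found', 63, 'ignored', 2)
my %count = $str =~ /([\w-]+)=(\d+)/g;

# Print the pairs in sorted key order.
print "$_=$count{$_}\n" for sort keys %count;
```

This prints ignored=2, not-found=1 and total-found=63, one per line.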

In scalar context, the match appears to simply return a Perl boolean that says whether the regex matched:

my $match = /$regex/g;
say $match; # "1" (on failure: the empty string)

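However, the regex has turned into an iterator. Every time the match is executed, the regex starts at the current position in the string and tries to match from there. If it matches, it returns true and the position moves to the end of the match. If the match fails, then

1. the match returns false, and
2. the current position in the string is reset to the start.

Because the position in the string was reset, the next match will succeed again.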

my $match;
say $match while $match = /$regex/g;
say "The match returned false, or the while loop would have gone on forever";
say "But we can match again" if /$regex/g;
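To see this iterator behavior on the string from the question, here is a small sketch that records the result and pos() after each of four attempts. The loop and the @log array are my own illustration; the original script uses four separate statements, but the position is stored on the string, so the effect is the same.

```perl
use strict;
use warnings;

my $str = 'not-found=1,total-found=63,ignored=2';
my @log;    # hypothetical helper: one entry per match attempt

for my $attempt (1 .. 4) {
    # Scalar-context /g match: continues from pos($str)
    my $matched = ($str =~ m/total-found=(\d+)/g) ? 1 : 0;
    # pos() is undef when no position is set (i.e. after a failed match)
    my $pos = defined pos($str) ? pos($str) : 'undef';
    push @log, "$attempt: matched=$matched pos=$pos";
}

print "$_\n" for @log;
```

Attempts 1 and 3 succeed and leave the position just past "63" (offset 26); attempts 2 and 4 start from there, find nothing, and reset the position. That is exactly the "1." and "3." pattern in the question's output.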

The second effect, the resetting of the position, can be cancelled with the additional /c flag.
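A minimal sketch of the difference, using a made-up three-character string (all variable names here are illustrative only):

```perl
use strict;
use warnings;

my $str = 'aXb';

my $first = ($str =~ /X/g) ? 1 : 0;      # succeeds
my $pos_after_first = pos($str);         # 2, just past the X

my $second = ($str =~ /X/gc) ? 1 : 0;    # fails: there is no X after position 2
my $pos_with_c = pos($str);              # still 2, because /c keeps the position

my $third = ($str =~ /X/g) ? 1 : 0;      # fails again
my $pos_without_c = pos($str);           # undef: without /c the position is reset

printf "%d %d %d\n", $first, $second, $third;
```

With /c, a failed match leaves the position where it was, so the next attempt fails too; without /c, the failure rewinds the string and the attempt after that can succeed again.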

The position in the string can be accessed with the pos function: pos($string) returns the current position, and it can also be set, as in pos($string) = 0.
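As a sketch of how this relates to the question: if you wanted to keep /g for some reason, rewinding the string with pos() before each attempt makes every attempt succeed. The loop and the $hits counter are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

my $str  = 'not-found=1,total-found=63,ignored=2';
my $hits = 0;    # hypothetical counter of successful attempts

for my $attempt (1 .. 4) {
    pos($str) = 0;    # rewind to the start before every attempt
    $hits++ if $str =~ m/total-found=(\d+)/g;
}

print "$hits\n";
```

This prints 4, since each attempt now starts from the beginning of the string.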

A regex can also be anchored at the current position with the \G assertion, much like ^ anchors a regex at the start of the string.
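A small sketch of what the anchoring changes, on a made-up string (names are illustrative): without \G the regex engine scans forward from the current position, whereas with \G the match must begin exactly there.

```perl
use strict;
use warnings;

my $str = 'aab';
pos($str) = 0;

# Anchored: there is no 'b' exactly at position 0, so this fails.
# /c keeps the position at 0 despite the failure.
my $anchored = ($str =~ /\Gb/gc) ? 1 : 0;

# Unanchored: scans forward from position 0 and finds the 'b' at offset 2.
my $unanchored = ($str =~ /b/gc) ? 1 : 0;
my $pos_after  = pos($str);    # 3, just past the 'b'

print "$anchored $unanchored $pos_after\n";
```

This prints "0 1 3".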

This m//gc style of matching makes it easy to write a tokenizer:

my @tokens;
local $_ = "1, abc, 2 ";   # "my $_" is no longer valid in modern Perl
TOKEN: while(pos($_) < length($_)) {
  /\G\s+/gc and next; # skip whitespace
  # if one of the following matches fails, the next token is tried
  if    (/\G(\d+)/gc) { push @tokens, [NUM => $1]}
  elsif (/\G,/gc    ) { push @tokens, ['COMMA'  ]}
  elsif (/\G(\w+)/gc) { push @tokens, [STR => $1]}
  else { last TOKEN } # break the loop only if nothing matched at this position.
}
say "[@$_]" for @tokens;

Output:

[NUM 1]
[COMMA]
[STR abc]
[COMMA]
[NUM 2]

score 3 · Accepted Answer

Get rid of the m and the g on your regex; they are not doing what you want.

print "1. matched using regex\n" if ($str =~ /total-found=(\d+)/);
print "2. matched using regex\n" if ($str =~ /total-found=(\d+)/);
print "3. matched using regex\n" if ($str =~ /total-found=(\d+)/);
print "4. matched using regex\n" if ($str =~ /total-found=(\d+)/);

Specifically, m is optional in this context: m/foo/ is exactly the same as /foo/. The real problem is g, which does a lot of things you don't want in this context. See perlretut for the details.
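As a quick check, here is a sketch that runs the match without /g four times and collects the captured number each time. The loop and the @totals array are my own additions for illustration.

```perl
use strict;
use warnings;

my $str = 'not-found=1,total-found=63,ignored=2';
my @totals;    # hypothetical: the captured value from each attempt

for my $attempt (1 .. 4) {
    # Without /g, every match starts from the beginning of the string.
    if ($str =~ /total-found=(\d+)/) {
        push @totals, $1;
    }
}

print "@totals\n";
```

This prints "63 63 63 63": without /g there is no position to carry over between attempts, so all four succeed.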
