regex - How can I find strings with mixed upper and lower case in Perl?

Question

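I am trying to filter thousands of files, looking for those that contain string constants with mixed case. Such strings may be embedded in whitespace, but may not contain whitespace themselves. So the following (which contain UC characters) are matches: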

"  AString "   // leading and trailing spaces together allowed
"AString "     // trailing spaces allowed
"  AString"    // leading spaces allowed
"newString03"  // numeric chars allowed
"!stringBIG?"  // non-alphanumeric chars allowed
"R"            // Single UC is a match

But these are not:

"A String" // not a match because it contains an embedded space
"Foo bar baz" // does not match due to multiple whitespace interruptions
"a_string" // not a match because there are no UC chars

I also want to match lines that contain both kinds of pattern:

"ABigString", "a sentence fragment" // need to catch so I find the first case...

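I want to use Perl regexps, preferably driven by the ack command-line tool. Obviously, \w and \W are not going to work. It seems that \S should match the non-space characters. I can't seem to figure out how to embed the requirement of "at least one uppercase character per string"...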

ack --match '\"\s*\S+\s*\"'

is the closest I have got. I need to replace \S+ with something that captures the requirement "at least one uppercase (ASCII) character, anywhere in the non-whitespace string".

This is straightforward to program in C/C++ (and yes, in Perl too, procedurally, without resorting to a regexp).
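For comparison, the procedural route mentioned above might look like the following minimal sketch. The helper name has_mixed_case_literal is hypothetical, the code still uses a few small regexps for the individual steps rather than one combined pattern, and it assumes that string constants never contain escaped quotes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns 1 if $line
# contains at least one double-quoted string that has an ASCII capital
# and no whitespace between its first and last non-space characters.
sub has_mixed_case_literal {
    my ($line) = @_;
    # Visit each "..." body in turn. Assumption: no escaped quotes.
    while ($line =~ /"([^"]*)"/g) {
        my $body = $1;
        $body =~ s/^\s+|\s+$//g;       # leading/trailing spaces are allowed
        next if $body =~ /\s/;         # embedded whitespace disqualifies
        return 1 if $body =~ /[A-Z]/;  # at least one ASCII capital
    }
    return 0;
}

for my $line (q<"  AString ">, q<"A String">, q<"ABigString", "a sentence fragment">) {
    print has_mixed_case_literal($line) ? "match:    " : "no match: ", $line, "\n";
}
```

Splitting the check into steps makes each requirement easy to read, but it cannot be handed to ack as a single pattern, which is what the question is really after.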

score 7 · Accepted Answer

The following pattern passes all your tests:

qr/
  "      # opening double quote

  (?!    # filter out strings with internal spaces
     [^"]*   # zero or more non-quotes
     [^"\s]  # neither a quote nor whitespace
     \s+     # internal whitespace
     [^"\s]  # another non-quote, non-whitespace character
  )

  [^"]*  # zero or more non-quote characters
  [A-Z]  # at least one uppercase letter
  [^"]*  # followed by zero or more non-quotes
  "      # and finally the trailing quote
/x

Feeding this test program (which uses the above pattern without /x, and therefore without whitespace or comments) as input to ack-grep (as ack is called on Ubuntu)

#! /usr/bin/perl

my @tests = (
  [ q<"  AString ">   => 1 ],
  [ q<"AString ">     => 1 ],
  [ q<"  AString">    => 1 ],
  [ q<"newString03">  => 1 ],
  [ q<"!stringBIG?">  => 1 ],
  [ q<"R">            => 1 ],
  [ q<"A String">     => 0 ],
  [ q<"a_string">     => 0 ],
  [ q<"ABigString", "a sentence fragment"> => 1 ],
  [ q<"  a String  "> => 0 ],
  [ q<"Foo bar baz">  => 0 ],
);

my $pattern = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;
for (@tests) {
  my($str,$expectMatch) = @$_;
  my $matched = $str =~ /$pattern/;
  print +($matched xor $expectMatch) ? "FAIL" : "PASS",
        ": $str\n";
}

produces the following output:

$ ack-grep '"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"' try
  [ q<"  AString ">   => 1 ],
  [ q<"AString ">     => 1 ],
  [ q<"  AString">    => 1 ],
  [ q<"newString03">  => 1 ],
  [ q<"!stringBIG?">  => 1 ],
  [ q<"R">            => 1 ],
  [ q<"ABigString", "a sentence fragment"> => 1 ],
my $pattern = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;
  print +($matched xor $expectMatch) ? "FAIL" : "PASS",

With the C shell and its derivatives, you have to escape the bang:

% ack-grep '"(?\![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"' ...

I wish I could preserve the highlighted matches, but that does not seem to be allowed.

Note that escaped double quotes (\") will severely confuse this pattern.
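To make that caveat concrete, here is a small sketch. The sample line and the helper has_match_escape_aware are made up for illustration. It first runs the pattern above on a string with escaped quotes, then tries one possible workaround: pull out each literal with an escape-aware pattern and check its body separately. The workaround assumes C-style backslash escapes and treats escape sequences such as \t as ordinary characters.

```perl
use strict;
use warnings;

# The pattern from this answer, unchanged.
my $simple = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;

# Assumption: a literal is a quote, then any run of ordinary characters
# or backslash escapes, then the closing quote.
my $literal = qr/"((?:[^"\\]|\\.)*)"/;

# Hypothetical helper: same rules as the question, but escape-aware.
sub has_match_escape_aware {
    my ($line) = @_;
    while ($line =~ /$literal/g) {
        my $body = $1;
        $body =~ s/^\s+|\s+$//g;       # surrounding spaces are allowed
        next if $body =~ /\s/;         # embedded whitespace disqualifies
        return 1 if $body =~ /[A-Z]/;  # at least one ASCII capital
    }
    return 0;
}

# One literal whose body contains spaces, so it should NOT match.
my $line = q<"a \"B\" c">;
print "simple pattern: ", ($line =~ $simple ? "match" : "no match"), "\n";
print "escape-aware:   ", (has_match_escape_aware($line) ? "match" : "no match"), "\n";
```

The simple pattern reports a match here because it starts at the first escaped quote and sees "B\" as a complete string, while the escape-aware version treats the whole line as one literal with embedded spaces and rejects it.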

score 0 · Accepted Answer

You can add the requirement with a character class, like this:

ack --match "\"\s*\S+[A-Z]\S+\s*\""

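I'm assuming that ack matches one line at a time. The \S+\s*\" part can match several closing quotes in a row: it will match the whole of "alfa"" rather than just "alfa". Note also that, as written, the pattern needs at least one non-whitespace character on each side of the capital letter, so a single-letter string such as "R" will not match; replacing each \S+ with \S* relaxes that.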
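A quick way to see how this pattern behaves is to run it from Perl on a few samples. This is only an illustrative sketch: the sample "alFa"" is a made-up variant of the "alfa"" example with a capital added so that the [A-Z] requirement is met.

```perl
use strict;
use warnings;

# The pattern passed to ack above, written as a Perl regexp.
my $pattern = qr/"\s*\S+[A-Z]\S+\s*"/;

for my $str (q<"AString">, q<"R">, q<"A String">, q<"alFa"">) {
    if ($str =~ $pattern) {
        # $& holds the text that was actually matched.
        print "$str matches as: $&\n";
    } else {
        print "$str does not match\n";
    }
}
```

Because \S also matches a double quote, the last sample is matched in full, doubled closing quote included.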
