perl - Perl one-liner to extract a multi-line pattern

Question

I have a file containing patterns like the following, which may or may not span multiple lines:

 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {

What I tried:

perl -nle 'print while m/^\s*(\w+)\s+(\w+?)\s*(([\w-0-9,*\s]))\s{/gm'

I don't know what the flags mean here; I just wrote a regex for the pattern and dropped it into the pattern space. It matches fine when the pattern is on a single line, like this:

abcd25 ef_gh ( fg*_h hj_b* hj ) {

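But it fails in the multi-line case!

I started with Perl yesterday, and I find the syntax very confusing. So, as one of our fellow SO mates suggested, I wrote a regex and inserted it into the code he provided.

I hope a Perl monk can help me with this case. Alternative solutions are welcome.

Input file: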

 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {

 abcd25
 ef_gh
 fg*_h
 hj_b*
 hj ) {

 jhijdsiokdù ()lmolmlxjk;
 abcd25 ef_gh ( fg*_h hj_b* hj ) {

Expected output:

 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {
 abcd25 ef_gh ( fg*_h hj_b* hj ) {

The input file can contain several blocks that match the start and end of the required pattern. Thanks in advance for your replies.

score 9 · Accepted Answer

Your regex does not match even the single-line case. What do you think the double parentheses do?

You probably wanted

m/^\s*(\w+)\s+(\w+?)\s*\([\w0-9,*\s]+\)\s{/gm

Update: The specification has changed. The regex (mostly) has not, but the code needs a small change:

perl -0777 -nle 'print "$1\n" while m/^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{)/gm'

Another update:

Explanation:

The switches are described in perlrun: -0 (zero), -n, -l, -e

The regex can be explained automatically by YAPE::Regex::Explain.

perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{)/)->explain'
The regular expression:

(?-imsx:^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \w+?                     word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the least amount
                             possible))
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \(                       '('
----------------------------------------------------------------------
    [\w0-9,*\s]+             any character of: word characters (a-z,
                             A-Z, 0-9, _), '0' to '9', ',', '*',
                             whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \)                       ')'
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    {                        '{'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

The /gm switches are explained in perlre.

score 9 · Accepted Answer

Use the flip-flop operator for one-liners

Perl makes this really easy with the flip-flop operator, which lets you print all the lines between two regular expressions. For example:

$ perl -ne 'print if /^abcd25/ ... /\bhj \) {/' /tmp/foo
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {

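However, a simple one-liner like this cannot reject specific matches that fall between the delimiting patterns: every block that starts and ends with the delimiters is printed, whatever it contains. Filtering on the content requires a more complex approach.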

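More complicated comparisons benefit from conditional branching

One-liners aren't always the best choice, and regular expressions can get out of hand quickly once they become too complex. In such situations, you are better off writing an actual program that can use conditional branching than relying on an overly clever regular expression.

One way to do this is to build up your match with a simple pattern, and then reject any match that doesn't also match some other simple pattern. For example: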

#!/usr/bin/perl -nw

# Use flip-flop operator to select matches.
# The brace is escaped because newer Perl releases may warn about or
# reject an unescaped literal "{" in a pattern.
if (/^abcd25/ ... /\bhj \) \{/) {
    push @string, $_;
}

# Reject multi-line patterns that don't include a particular expression
# between flip-flop delimiters. For example, "( fg" will match, while
# "^fg" won't.
if (/\bhj \) \{/) {
    $string = join("", @string);
    undef @string;
    push(@matches, $string) if $string =~ /\( fg/;
}

# Print every accepted block once all input has been read.
END { print @matches }

When run against the OP's updated corpus, this correctly yields:

abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
abcd25 ef_gh ( fg*_h hj_b* hj ) {
