regex - Replace the first character before a match

Question

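For each line, I need to add a semicolon one character before the first match of an alphanumeric character, but only for the alphanumeric character that comes after the first occurrence of a semicolon.

Example:

Input: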

00000001;Root;;
00000002;  Documents;;
00000003;    oracle-advanced_plsql.zip;file;
00000004;  Public;;
00000005;  backup;;
00000006;    20110323-JM-F.7z.001;file;
00000007;    20110426-JM-F.7z.001;file;
00000008;    20110603-JM-F.7z.001;file;
00000009;    20110701-JM-F-via-summer_school;;
00000010;      20110701-JM-F-via-summer_school.7z.001;file;

Desired output:

00000001;;Root;;
00000002;  ;Documents;;
00000003;    ;oracle-advanced_plsql.zip;file;
00000004;  ;Public;;
00000005;  ;backup;;
00000006;    ;20110323-JM-F.7z.001;file;
00000007;    ;20110426-JM-F.7z.001;file;
00000008;    ;20110603-JM-F.7z.001;file;
00000009;    ;20110701-JM-F-via-summer_school;;
00000010;      ;20110701-JM-F-via-summer_school.7z.001;file;

Could someone help me write a Perl regex for this? I need it inside a program, not as a one-liner.

score 1 · Accepted Answer

First of all, here is a program that appears to meet your requirements:

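#!/usr/bin/perl -w
# Read each input line, insert a ';' before the first word character
# that follows the first semicolon, and print the result.
while (<>) {
  s/^(.*?;.*?)(\w)/$1;$2/;
  print $_;
}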

Save it to a file named "program.pl", make it executable with "chmod u+x program.pl", and run it on your input data like this:

./program.pl input-data.txt

Here is an explanation of the regular expression:

s/        # start search-and-replace regexp
  ^       # start at the beginning of this line
  (       # save the matched characters until ')' in $1
    .*?;  # go forward until finding the first semicolon
    .*?   # go forward until finding... (to be continued below)
  )
  (       # save the matched characters until ')' in $2
    \w    # ... the next alphanumeric character.
  )
/         # continue with the replace part
  $1;$2   # write all characters found above, but insert a ; before $2
/         # finish the search-and-replace regexp.
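The annotated pattern above is laid out for reading, not for running. As a minimal sketch, the same pattern can be written with Perl's /x modifier so that the comments live inside working code. The helper name insert_semicolon is hypothetical and only used for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: insert a semicolon before the first word character
# that follows the first semicolon on the line.
sub insert_semicolon {
    my ($line) = @_;
    $line =~ s/
        ^          # start at the beginning of the line
        ( .*? ;    # group 1: everything up to and including the first semicolon,
          .*? )    #          plus whatever precedes the next word character
        ( \w )     # group 2: that word character
    /$1;$2/x;      # the x modifier applies to the pattern only, not the replacement
    return $line;
}

print insert_semicolon("00000001;Root;;"), "\n";
print insert_semicolon("00000002;  Documents;;"), "\n";
```

The comments inside the pattern avoid the characters $, @ and /, because Perl still interpolates variables and looks for the closing delimiter inside /x comments.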

Based on your sample input, I would use a more specific regular expression:

s/^(\d*; *)(\w)/$1;$2/;

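This expression starts at the beginning of the line and skips the digits (\d*), then the first semicolon and any spaces that follow it. It inserts a semicolon before the next word character. Because only spaces are allowed between the first semicolon and that word character, lines that do not have this layout are left unchanged.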
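To make the difference between the two expressions concrete, here is a small sketch that runs both on one line from the sample input and on one made-up line. The line containing "(tmp)" is hypothetical and does not come from the question, and the helper names insert_general and insert_specific are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# General pattern: skips any characters between the first ';' and the next word character.
sub insert_general {
    my ($line) = @_;
    $line =~ s/^(.*?;.*?)(\w)/$1;$2/;
    return $line;
}

# Specific pattern: allows only digits, one ';' and spaces before the word character.
sub insert_specific {
    my ($line) = @_;
    $line =~ s/^(\d*; *)(\w)/$1;$2/;
    return $line;
}

# The second line is a hypothetical example, not part of the sample input.
for my $line ("00000005;  backup;;", "00000011;  (tmp);;") {
    print "input:    $line\n";
    print "general:  ", insert_general($line),  "\n";
    print "specific: ", insert_specific($line), "\n";
}
```

On the sample line both versions give the same result. On the hypothetical line the general pattern skips over the parenthesis and inserts the semicolon inside it, while the specific pattern does not match and returns the line as it was.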

Take whichever suits your needs best!

score 0 · Accepted Answer

First of all, thank you for the really great answers.

Actually, my code snippet looks like this:

 our $seperator=";"; # at the beginning of the file
 #...
 sub insert {
    my ( $seperator, $line, @all_lines, $count, @all_out );
    $count     = 0;
    @all_lines = read_file($filename);

    foreach $line (@all_lines) {
        $count = sprintf( "%08d", $count );
        chomp $line;
        $line =~ s/\:/$seperator/;                          # works
        $line =~ s/\ file/file/;                            # works

        #$line=~s/;\s*\K(?=\S)/;/;                          # doesn't work
        $line =~ s/^(.*?$seperator.*?)(\w)/$1$seperator$2/; # doesn't work
        say $count . $seperator . $line . $seperator; 

        $count++; # btw, is there maybe a hidden index variable in a foreach loop that I could use instead of a new variable?
        push( @all_out, $count . $seperator . $line . $seperator . "\n" );
    }

    write_file( $csvfile, @all_out ); # using File::Slurp
}

As you can see at the top of the foreach loop, I already perform a few small substitutions to produce the input I showed.
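On the side question in the code comment: as far as I know, a Perl foreach loop does not expose a hidden index variable. A common alternative is to loop over the indices and look up each element. The following is only a sketch; the helper name number_lines and the stand-in data are hypothetical, and the numbering starts at 1 to match the sample input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: prefix each line with an 8-digit, 1-based counter.
sub number_lines {
    my @lines = @_;
    my @out;
    # $#lines is the index of the last element, so this visits every index.
    for my $i (0 .. $#lines) {
        push @out, sprintf("%08d;%s", $i + 1, $lines[$i]);
    }
    return @out;
}

# Stand-in data, not the real file contents.
print "$_\n" for number_lines("Root:", "  Documents:");
```

This removes the separate counter variable, at the cost of indexing into the array on each pass.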

I would like to know why the regular expressions suggested by TLP and Yaakov don't work in my code. They do work in general, but only when written as in the example Yaakov gave:

while(<>) {                                                           
  s/^(.*?;.*?)(\w)/$1;$2/;                                            
  print $_;                                                           
}
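One possible explanation, judging only from the snippet as posted: the substitution runs on $line before the counter and its semicolon are prepended (that happens later, in the say and push calls), so the "first semicolon" the pattern anchors on is not yet the one after the number. In addition, the my ( $seperator, ... ) declaration inside the sub creates a new, undefined variable that hides the file-level our $seperator. The sketch below builds the full output line first and substitutes afterwards. The helper name format_line and the raw input format (indentation, a name, then ":" and optionally " file") are assumptions, not something the post confirms.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper; assumes raw lines such as "  Documents:" or "    name.zip: file".
sub format_line {
    my ($count, $line) = @_;
    my $separator = ";";            # defined here, so nothing shadows it

    $line =~ s/:/$separator/;       # same preparatory substitutions as in the post
    $line =~ s/ file/file/;

    # Prepend the counter first, so the pattern's first semicolon
    # is the one right after the number.
    my $out = sprintf("%08d", $count) . $separator . $line . $separator;
    $out =~ s/^(.*?\Q$separator\E.*?)(\w)/$1$separator$2/;
    return $out;
}

print format_line(2, "  Documents:"), "\n";
print format_line(3, "    oracle-advanced_plsql.zip: file"), "\n";
```

The \Q...\E around the separator is not needed for ";" but keeps the pattern safe if the separator is ever changed to a character with a special meaning in regular expressions.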
