regex - replace the 'value' of the 'key' when 'key' is found in file

Question

I'm having trouble replacing certain strings in one file with values looked up in another file.

test1.txt

porsche  430+turbo blue
ferrari  520+supercharged red
buggati  1001+supersport black
fiat     turbo white

Some of the strings in test1.txt have values assigned to them in test2.txt:

test2.txt
turbo=30
supercharged=50 
supersport=100

I want to substitute the values from test2.txt into test1.txt wherever the corresponding string appears.

output.txt
    porsche  460 blue
    ferrari  570 red
    buggati  1101 black
    fiat     30 white

For example, turbo from test2.txt is looked up in test1.txt, replaced by its value, and the resulting expression is summed.

I did a lot of googling and didn't find a way to proceed with this. Can anyone please help me out? Thanks in advance.

score 2 · Accepted Answer

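amon's solution is wonderfully elegant. +1 (I don't have enough reputation to leave this as a comment on his answer.)

If you find you don't need the extensibility of an operator dispatch table, here is a considerably simpler alternative.

(Edit 2012/08/29: incorporated amon's dispatch table to handle the +, -, *, /, ^ operators)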

use strict;
use warnings;
use English qw( -no_match_vars );

setup_for_testing( );

# Use the low-precedence 'or' here: with '||' the die attaches to the filename
# string (which is always true), so a failed open would go unnoticed.
open my $source_file, '<', 'text1.txt' or die "Couldn't open source file: $OS_ERROR";
open my $key_file,    '<', 'text2.txt' or die "Couldn't open key file: $OS_ERROR";

# Open the output file for writing ('>' truncates any existing content)
open my $output_file, '>', 'output.txt' or die "Couldn't open output file: $OS_ERROR";

# Create translation map from text2.txt
my %translation_map = translation_map( $key_file );

# Process text1.txt and print to output.txt
while ( my $source_line = <$source_file> ) {
    my $transformed_line = transform( $source_line, \%translation_map );
    print {$output_file} $transformed_line
        or die "Couldn't print to output file: $OS_ERROR";
}

# Tidy up
close $source_file or die "Couldn't close source file: $OS_ERROR";
close $key_file    or die "Couldn't close key file: $OS_ERROR";
close $output_file or die "Couldn't close output file: $OS_ERROR";

###################
sub setup_for_testing {
   open my $textfile1, '>', 'text1.txt' or die "Couldn't open source file: $OS_ERROR";
   open my $textfile2, '>', 'text2.txt' or die "Couldn't open key file: $OS_ERROR";

   my $source_text =<<'END_TEXT';
porsche  430+turbo blue
ferrari  520+supercharged red
buggati  1001+supersport black
fiat     turbo white
END_TEXT

   my $key_file_text =<<'END_TEXT';
turbo=30
supercharged=50
supersport=100
END_TEXT

   print {$textfile1} $source_text   or die "Couldn't print to text1.txt: $OS_ERROR";
   print {$textfile2} $key_file_text or die "Couldn't print to text2.txt: $OS_ERROR";

   close $textfile1 or die "Couldn't close source file: $OS_ERROR";
   close $textfile2 or die "Couldn't close key file: $OS_ERROR";

   return; # intentional void return
}

sub translation_map {
    my $key_file = shift;

    my %translation_map;
    while ( my $key_mapping = <$key_file> ) {
        chomp $key_mapping;

        # The regex /x option allows whitespace in the regular expression for readability
        my ( $key, $value ) = split / \s* = \s* /x, $key_mapping;
        $translation_map{ $key } = $value;
    }

    return %translation_map;
}

sub transform {
    my $source_line = shift @_;
    my %value_for   = %{ shift @_ };

    my $transformed_line = $source_line;

    foreach my $key ( keys %value_for ) {
        # Split the line around <digits><operator><key>. If the key is not
        # preceded by a number, the match fails and every capture is undef.
        my $value = $value_for{ $key };
        my ( $before_expression, $lvalue, $operator, $rvalue_key, $after_expression ) =
            ( $transformed_line =~ m/ \A
                                      ( .*? )
                                      ( \d+ ) ([-+*\/^]?) ( \Q$key\E )
                                      ( .* )
                                      \Z
                                    /x );

        if ( $operator  ) {
            my $rvalue = $value_for{ $rvalue_key };

            # Using the dispatch table from amon's answer
            my $value_of_expression = {
              '+' => sub {$_[0] +  $_[1]},
              '-' => sub {$_[0] -  $_[1]},
              '*' => sub {$_[0] *  $_[1]},
              '/' => sub {$_[0] /  $_[1]},
              '^' => sub {$_[0] ** $_[1]},
            }->{$operator}->($lvalue, $rvalue);

            $transformed_line =
                $before_expression . $value_of_expression . $after_expression . "\n";
        } else {
            $transformed_line =~ s/\Q$key\E/$value/;
        }
    }

    return $transformed_line;
}

This script creates the test files text1.txt and text2.txt as specified in the question, performs the transformation, and writes the result to output.txt.

> ls
stackoverflow-12169648_replace_value_of_key.pl

> perl stackoverflow-12169648_replace_value_of_key.pl 

> ls
output.txt                  text1.txt
stackoverflow-12169648_replace_value_of_key.pl  text2.txt

> more text1.txt 
porsche  430+turbo blue
ferrari  520+supercharged red
buggati  1001+supersport black
fiat     turbo white

> more text2.txt 
turbo=30
supercharged=50
supersport=100

> more output.txt 
porsche  460 blue
ferrari  570 red
buggati  1101 black
fiat     30 white

I hope this helps.

David

score 1 · Accepted Answer

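I will assume that the first file (called A) has three columns separated by one or more whitespace characters. The second column may contain an arithmetic expression made up of decimal numbers and variables separated by basic (infix) operators. The variable values are fixed in another file, referred to as B from here on.

Preparing the variables is easy: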

my %variables = map {chomp; split /=/, $_, 2} do {
  open my $file, "<", $filename_B or die;
  <$file>;
};

Parsing the other file is harder. Assuming it has been opened on the filehandle $fileA, we loop over the lines and split each line into three fields:

while (defined(my $line = <$fileA>)) {
   chomp $line;
   my ($model, $expression, $color) = split /\s+/, $line, 3;
   my $value = parseExpression($expression);
   print "\t$model $value $color\n"; # use printf to prettyprint if needed
}

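We then print the value of the expression together with the other data, assuming the output goes to STDOUT.

The sub parseExpression splits the expression string at an operator and substitutes variable names. The operations are then executed strictly right-associatively. This makes parsing easier, but it is not necessarily natural: 3*4+1 evaluates to 15. To be able to resolve multiple operations I use recursion, because I like recursion better than iteration.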

sub parseExpression {
  my ($string) = @_;
  my ($part, $operator, $rest) = ($string =~ /(\w+)([-+*\/^]?)(.*$)/g);
  if (not $operator) {
    # $part is the whole expression
    my $value = exists $variables{$part} ? $variables{$part} : $part;
    die if $value =~ /[a-z]/i; # The variable name was not substituted
    return $value;
  } else {
    my $rval = parseExpression($rest);
    my $lval = parseExpression($part); # you don't need this
                                       # if there are no variables on the left
    my $value = {
      '+' => sub {$_[0] +  $_[1]},
      '-' => sub {$_[0] -  $_[1]},
      '*' => sub {$_[0] *  $_[1]},
      '/' => sub {$_[0] /  $_[1]},
      '^' => sub {$_[0] ** $_[1]},
    }->{$operator}->($lval, $rval);
    return $value;
  }
}

A cute little dispatch table is used to perform the right calculation for each operator. You can extend the operator regex and the table at any time to support additional operators.
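
As a rough sketch of such an extension (not part of the original answer), the table below adds a modulus operator and derives the regex character class from the table's keys, so the two cannot drift apart. The `%` entry, the `%op_for` hash and the helper name `apply_operator` are assumptions made for illustration, and the helper only handles a single `number operator number` expression.

```perl
use strict;
use warnings;

# amon's dispatch table plus one hypothetical extra operator: '%' (modulus).
my %op_for = (
    '+' => sub { $_[0] +  $_[1] },
    '-' => sub { $_[0] -  $_[1] },
    '*' => sub { $_[0] *  $_[1] },
    '/' => sub { $_[0] /  $_[1] },
    '^' => sub { $_[0] ** $_[1] },
    '%' => sub { $_[0] %  $_[1] },
);

# Build the character class from the table keys; quotemeta escapes each
# operator so that '-' and '^' are taken literally inside [...].
my $op_class = join '', map { quotemeta } sort keys %op_for;

# apply_operator is an illustrative helper name.
sub apply_operator {
    my ($expression) = @_;
    my ( $lval, $operator, $rval ) =
        $expression =~ /\A(\d+)([$op_class])(\d+)\z/
        or die "Cannot parse '$expression'";
    return $op_for{$operator}->( $lval, $rval );
}

print apply_operator('17%5'), "\n";    # prints 2
```

Adding another operator then only requires one new line in the table.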

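Note that parseExpression as written allows numbers as variable names. That may not be what you want, but it makes life easier.

There may be interesting problems with undefined values cropping up at random, but this code should give a pointer in the right direction. (If you only allow one operation in the second column, you can remove the recursion.)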
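
A minimal sketch of that simplification, assuming at most one operator per expression. The names `resolve` and `evaluate_single` are made up for this example, and `%variables` is filled with the sample values from the question instead of being read from file B.

```perl
use strict;
use warnings;

# Sample values as they would be read from file B (test2.txt in the question).
my %variables = ( turbo => 30, supercharged => 50, supersport => 100 );

my %op_for = (
    '+' => sub { $_[0] +  $_[1] },
    '-' => sub { $_[0] -  $_[1] },
    '*' => sub { $_[0] *  $_[1] },
    '/' => sub { $_[0] /  $_[1] },
    '^' => sub { $_[0] ** $_[1] },
);

# Replace a variable name by its value; plain numbers pass through unchanged.
sub resolve {
    my ($term) = @_;
    my $value = exists $variables{$term} ? $variables{$term} : $term;
    die "Unknown variable '$term'" if $value !~ /\A\d+\z/;
    return $value;
}

# Evaluate "term" or "term operator term" without any recursion.
sub evaluate_single {
    my ($expression) = @_;
    my ( $left, $operator, $right ) =
        $expression =~ m{\A(\w+)(?:([-+*/^])(\w+))?\z}
        or die "Cannot parse '$expression'";
    return resolve($left) unless defined $operator;
    return $op_for{$operator}->( resolve($left), resolve($right) );
}

# Prints 460, 1101 and 30, one per line.
print evaluate_single($_), "\n" for qw( 430+turbo 1001+supersport turbo );
```

Unlike parseExpression, this version dies on anything with more than one operator instead of silently evaluating it right to left.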

score 1 · Accepted Answer

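This is done quite simply by performing the substitution in two steps. First, find all keywords that appear in the hash %values, which is derived from the file test2.txt. Then look for multiple decimal numbers joined by arithmetic operators and evaluate the expression to form the replacement.

The regex for finding the hash keys is built dynamically by using join to connect them with the regex alternation operator |.

The second regex looks for

expression ::= digits, operator, digits, { operator, digits }

allowing whitespace between the terms.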

use strict;
use warnings;

my %values = do {
  open my $fh, '<', 'test2.txt' or die $!;
  local $/;
  <$fh> =~ /\w+/g;
};

my $regex = join '|', keys %values;

open my $fh, '<', 'test1.txt' or die $!;

while (<$fh>) {
  s/\b($regex)\b/$values{$1}/g;
  s|([0-9]+(\s*[-+*/]\s*[0-9]+)+)|$1|eeg;
  print;
}

Output

porsche  460 blue
ferrari  570 red
buggati  1101 black
fiat     30 white
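
One possible hardening of the dynamically built regex, offered as a sketch and not as part of the answer above: each key is passed through quotemeta so that regex metacharacters in a key are matched literally, and the keys are sorted longest-first as a defensive measure in case the \b anchors are ever dropped. The helper names `build_key_regex` and `expand_line` are assumptions, and `%values` holds the sample data instead of reading test2.txt. The second substitution is the one from the answer, unchanged.

```perl
use strict;
use warnings;

# Sample data standing in for the contents of test2.txt.
my %values = ( turbo => 30, supercharged => 50, supersport => 100 );

# Join the escaped keys with '|', longest key first.
sub build_key_regex {
    my ($values) = @_;
    my $alternation = join '|',
        map  { quotemeta }
        sort { length($b) <=> length($a) or $a cmp $b }
        keys %$values;
    return qr/\b($alternation)\b/;
}

sub expand_line {
    my ( $line, $values ) = @_;
    return $line unless %$values;    # nothing to substitute

    my $key_regex = build_key_regex($values);

    # Step 1: replace every keyword by its value.
    $line =~ s/$key_regex/$values->{$1}/g;

    # Step 2: evaluate digit/operator runs, as in the answer above.
    $line =~ s|([0-9]+(\s*[-+*/]\s*[0-9]+)+)|$1|eeg;

    return $line;
}

print expand_line( $_, \%values ), "\n"
    for 'porsche  430+turbo blue', 'fiat     turbo white';
```

Note that \b only behaves as expected when each key starts and ends with a word character, which holds for the keys in the question.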

score 0 · Accepted Answer

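I don't code in Perl at all, but I thought I could offer some general hints...

First, to keep this simple, you need to commit to a somewhat restricted pattern format. Otherwise, as suggested in the comments, this turns into a language parser. So you may have to state that a line in test1.txt may or may not contain the pattern [numbers][operator][characters].

I would therefore suggest parsing test2.txt into an associative array (hash), which gives you something like this: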

{
    "turbo"        =>  30,
    "supercharged" =>  50,
    "supersport"   => 100,
}

Then, for each line of test1.txt, you can run a regex match with a pattern like:
\b((\d+)([+])(\w+))\b

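From this pattern you can pick out the + operator and the key. If you get them, you can look up the key in the hash and perform the evaluation against it: parsed number, operator, hash[key] => value

If you only want to handle +, you can simply convert the number to an int and add. Otherwise, to support multiple operator patterns, you would need to either handle them explicitly or safely evaluate the string.

You can then do a regex replacement on that first primary capture group.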
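
In Perl, the steps described above might look roughly like the following sketch, restricted to the + operator. The hash name `%value_for` and the sub name `add_keyword_values` are illustrative, and the hash holds the sample data directly instead of being parsed from test2.txt. Lines with a bare keyword and no number in front (such as the fiat line) are left untouched by this sketch.

```perl
use strict;
use warnings;

# The hash that parsing test2.txt would produce.
my %value_for = ( turbo => 30, supercharged => 50, supersport => 100 );

sub add_keyword_values {
    my ($line) = @_;

    # $1 = whole match, $2 = number, $3 = operator, $4 = key.
    # Known keys are replaced by the sum; unknown keys keep the original text.
    $line =~ s{\b((\d+)([+])(\w+))\b}
              { exists $value_for{$4} ? $2 + $value_for{$4} : $1 }ge;

    return $line;
}

print add_keyword_values($_), "\n"
    for 'porsche  430+turbo blue', 'ferrari  520+supercharged red';
```

Because the replacement is computed with plain addition, no string from the input file is ever evaluated as code.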
