shell - How to diff parts of lines?

Question

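I have two files that I would like to diff. The lines contain timestamps and possibly other things that I want the matching algorithm to ignore, but I still want those items to be output if the matching algorithm finds a difference in the rest of the text. For example: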

1c1
<    [junit4] 2013-01-11 04:43:57,392 INFO  com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)
---
>    [junit4] 2013-01-11 22:16:07,398 INFO  com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)

should not be emitted, but:

1c1
<    [junit4] 2013-01-11 04:43:57,392 INFO  com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)
---
>    [junit4] 2013-01-11 22:16:07,398 INFO  com.example.MyClass:456 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)

should be emitted (because the line number is different). Note that the timestamp is still emitted.

How can this be done?

score 3 · Accepted Answer

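I have wished for this feature a number of times myself, and since it popped up again here, I decided to google a bit and found Perl's Algorithm::Diff. It can be fed a hashing function (they call it a "key generation function") that "should return a string that uniquely identifies a given element", and the algorithm uses that string for the comparison (instead of the actual content you feed it).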
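To make the idea of a key function concrete, here is a minimal shell sketch applied to the log format from the question. It is not part of Algorithm::Diff: the names `line_key` and `same_key`, the timestamp pattern, and the shortened sample messages are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical key function for the log format in the question: remove the
# "YYYY-MM-DD HH:MM:SS,mmm " timestamp so only the rest of the line is compared.
line_key() {
    printf '%s\n' "$1" |
        sed 's/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\},[0-9]\{3\} //'
}

# Succeeds when two lines map to the same key.
same_key() {
    [ "$(line_key "$1")" = "$(line_key "$2")" ]
}

# Shortened sample lines modelled on the question (message text abbreviated).
a='[junit4] 2013-01-11 04:43:57,392 INFO  com.example.MyClass:123 [main] Config file not found'
b='[junit4] 2013-01-11 22:16:07,398 INFO  com.example.MyClass:123 [main] Config file not found'
c='[junit4] 2013-01-11 22:16:07,398 INFO  com.example.MyClass:456 [main] Config file not found'

same_key "$a" "$b" && echo "a/b: same key, no difference reported"
same_key "$a" "$c" || echo "a/c: keys differ, both original lines would be printed"
```

Lines a and b differ only in the timestamp, so they share a key. Line c has a different source line number (456), so its key differs and a key-based diff would report it.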

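Basically, all you need to do is add a sub that uses some regexp magic to filter the unwanted parts out of your string, and pass a reference to that sub as an extra parameter in the call to diff() (see my CHANGE 1 and CHANGE 2 comments in the snippet below).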

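If you need normal (or unified) diff output, check the elaborate diffnew.pl example that ships with the module and make the necessary changes in that file. For demonstration purposes, I'll use the simple diff.pl that also ships with it, because it is short and I can post it here in full.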

mydiff.pl

#!/usr/bin/perl

# based on diff.pl that ships with Algorithm::Diff
# demonstrates the use of a key generation function

# the original diff.pl is:
# Copyright 1998 M-J. Dominus. (mjd-perl-diff@plover.com)
# This program is free software; you can redistribute it and/or modify it
# under the same terms as Perl itself.

use Algorithm::Diff qw(diff);

die("Usage: $0 file1 file2") unless @ARGV == 2;

my ($file1, $file2) = @ARGV;

-f $file1 or die("$file1: not a regular file");
-f $file2 or die("$file2: not a regular file");
-T $file1 or die("$file1: binary file");
-T $file2 or die("$file2: binary file");

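# three-argument open with lexical handles, so odd file names are not
# misread as open modes
open(my $fh1, '<', $file1) or die("Couldn't open $file1: $!");
open(my $fh2, '<', $file2) or die("Couldn't open $file2: $!");
chomp(my @f1 = <$fh1>);
close $fh1;
chomp(my @f2 = <$fh2>);
close $fh2;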

# CHANGE 1
# $diffs = diff(\@f1, \@f2);
$diffs = diff(\@f1, \@f2, \&keyfunc);

exit 0 unless @$diffs;

foreach $chunk (@$diffs)
{
        foreach $line (@$chunk)
        {
                my ($sign, $lineno, $text) = @$line;
                printf "%4d$sign %s\n", $lineno+1, $text;
        }
}
exit 1;

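# CHANGE 2 {
# key generation function: returns the line with a leading HH:MM
# timestamp removed, so only the remaining text is compared
# (a lexical "my $_" is not available in current perls, hence $line)
sub keyfunc
{
        my $line = shift;
        $line =~ s/^\d{2}:\d{2}\s+//;
        return $line;
}
# }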

one.txt

12:15 one two three
13:21 three four five

two.txt

10:01 one two three
14:38 seven six eight

Example run

$ ./mydiff.pl one.txt two.txt
   2- 13:21 three four five
   2+ 14:38 seven six eight

Example run 2

And here is one with normal diff output, based on diffnew.pl:

$ ./my_diffnew.pl one.txt two.txt
2c2
< 13:21 three four five
---
> 14:38 seven six eight

As you can see, the first line of either file is ignored: the two lines differ only in their timestamps, and the hashing function removes those for the comparison.

Voilà, you just rolled your own content-aware diff!
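If Perl is not at hand, roughly the same effect can be sketched with standard shell tools: diff the timestamp-free keys, then use the line numbers in the hunk headers to print the original lines. This is only a sketch under stated assumptions. The names `strip_ts` and `keyed_diff` are made up, the timestamp is assumed to be a leading HH:MM as in one.txt and two.txt, and diff is assumed to produce its default ("normal") output format.

```shell
#!/bin/sh
# Sketch: diff two files on timestamp-free keys, then print the ORIGINAL lines.
# Assumes a leading "HH:MM " timestamp, as in one.txt and two.txt.

# Write the comparison keys for a file (timestamp removed) to stdout.
strip_ts() {
    sed 's/^[0-9][0-9]:[0-9][0-9][[:space:]][[:space:]]*//' "$1"
}

keyed_diff() {
    old=$1
    new=$2
    keys_old=$(mktemp)
    keys_new=$(mktemp)
    strip_ts "$old" > "$keys_old"
    strip_ts "$new" > "$keys_new"

    # diff exits with 1 when the inputs differ; only 2 signals real trouble.
    { diff "$keys_old" "$keys_new" || [ $? -eq 1 ]; } |
    awk -v old="$old" -v new="$new" '
        BEGIN {
            # Load the original lines so they can be looked up by number.
            while ((getline line < old) > 0) a[++na] = line
            while ((getline line < new) > 0) b[++nb] = line
        }
        # Hunk headers look like 2c2, 1,3d0 or 0a1,2; key lines are skipped.
        /^[0-9]/ {
            print
            match($0, /[acd]/)
            op = substr($0, RSTART, 1)
            nl = split(substr($0, 1, RSTART - 1), l, ",")
            nr = split(substr($0, RSTART + 1), r, ",")
            if (op != "a") for (i = l[1] + 0; i <= l[nl] + 0; i++) print "< " a[i]
            if (op == "c") print "---"
            if (op != "d") for (i = r[1] + 0; i <= r[nr] + 0; i++) print "> " b[i]
        }'

    rm -f "$keys_old" "$keys_new"
}
```

Called as `keyed_diff one.txt two.txt`, this prints a single `2c2` hunk containing the two original second lines with their timestamps; the first lines are treated as equal because their keys match.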

score 0 · Accepted Answer

Assuming your files are "a.txt" and "b.txt", you can do this with diff + cut:

diff <(cut -d" " -f4-99 a.txt) <(cut -d" " -f4-99 b.txt)

Each cut drops the first 3 fields (the date and the items that go with it) and keeps only the rest of the line (fields 4 through 99). cut should also work with an open-ended range:

cut -d" " -f4- a.txt

but that did not work for me, so I used -f4-99 instead. In short: apply cut to both inputs to drop the date fields, then run diff on the results.
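As a quick check of which fields cut actually keeps, the sketch below runs it on a shortened line modelled on the question. The helper name `strip_prefix` and the abbreviated message are assumptions for illustration. One thing to watch: with `-d" "`, every leading space starts an empty field, so on indented lines like the ones in the question the field numbers shift unless the indentation is trimmed first.

```shell
#!/bin/sh
# Hypothetical helper: drop leading spaces, then the first three
# space-separated fields ("[junit4]", date, time), keeping the rest.
strip_prefix() {
    sed 's/^ *//' | cut -d' ' -f4-
}

# Shortened sample line with three leading spaces (message abbreviated).
line='   [junit4] 2013-01-11 04:43:57,392 INFO  com.example.MyClass:123 [main] Config file not found'

# Without trimming, fields 1-3 are empty, field 4 is "[junit4]",
# and the timestamp survives.
printf '%s\n' "$line" | cut -d' ' -f4-

# With trimming, fields 1-3 are the tag, date and time, so they are dropped.
printf '%s\n' "$line" | strip_prefix
```

In bash this could be combined as `diff <(strip_prefix < a.txt) <(strip_prefix < b.txt)`. Note that diff then sees and prints only the stripped lines, so the timestamps do not appear in its output.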
