perl - File comparison with multiple columns

Question

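I am cleaning up a directory to find files that are no longer used in our testing environment. I have a text file listing all the file names, sorted alphabetically, and a second file that I want to compare it against.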

Here is how the first file is set up:

test1.pl
test2.pl
test3.pl

It is a plain text file with one script name per line, covering every script in the directory I want to clean up based on the other file below.

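The file I want to compare against is a tab-separated file that lists the scripts each server runs as tests, so it obviously contains many duplicates. I want to strip the test script names out of this file, write them to another file, run sort and uniq on it, and then diff the result against the list above to see which test scripts are not being used.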

The file is set up like this:

server: : test1.pl test2.pl test3.pl test4.sh test5.sh

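Some lines have fewer entries and some have more. My first impulse was to write a Perl script that splits each line and pushes the values onto a list if they are not already there, but that seems wholly inefficient. I am not very experienced with awk, but I figured there is more than one way to do it. Any other ideas for comparing these files?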

score 2 · Accepted Answer

This uses awk to rearrange the file names in the second file so that there is one per line, then runs diff on that output against the first file.

diff file1 <(awk '{ for (i=3; i<=NF; i++) print $i }' file2 | sort -u)
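If only the unused names are wanted rather than a full diff, the same awk extraction can be fed to comm instead. The following is a minimal sketch and not part of the answer above: the function name unused_scripts is made up for illustration, it assumes the script names start at the third whitespace-separated field as in the sample row, and it assumes mktemp is available. It avoids process substitution so it should also run under a plain sh.

```shell
#!/bin/sh
# unused_scripts LIST SERVERS  (hypothetical helper name)
# Prints names that appear in LIST but in no row of SERVERS.
unused_scripts() {
    list=$1
    servers=$2
    tmp=$(mktemp) || return 1
    # Fields 1 and 2 are the "server:" and ":" columns; the rest are script names.
    awk '{ for (i = 3; i <= NF; i++) print $i }' "$servers" | LC_ALL=C sort -u > "$tmp"
    # comm -23 keeps lines found only in the first input.
    # Both inputs must be sorted the same way, hence LC_ALL=C throughout.
    LC_ALL=C sort -u "$list" | LC_ALL=C comm -23 - "$tmp"
    rm -f "$tmp"
}
```

Called as unused_scripts file1 file2, it should print one unused name per line, and nothing at all when every script is referenced by some server.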

score 2 · Accepted Answer

A Perl solution that builds a %needed hash of the files used by the servers and then checks it against the file containing all the file names.

#!/usr/bin/perl
use strict;
use warnings;
use Inline::Files;

my %needed;
while (<SERVTEST>) {
    chomp;
    my (undef, @files) = split /\t/;
    @needed{ @files } = (1) x @files;
}

while (<TESTFILES>) {
    chomp;
    if (not $needed{$_}) {
        print "Not needed: $_\n";   
    }
}

__TESTFILES__
test1.pl
test2.pl
test3.pl
test4.pl
test5.pl
__SERVTEST__
server1::   test1.pl    test3.pl
server2::   test2.pl    test3.pl
__END__
*** prints

C:\Old_Data\perlp>perl t7.pl
Not needed: test4.pl
Not needed: test5.pl

score 1 · Accepted Answer

A quick and dirty script to do the job. If it looks good, switch to open for reading the files, with proper error checking.

use strict;
use warnings;
my @server_lines = `cat server_file`;chomp(@server_lines);
my @test_file_lines = `cat test_file_lines`;chomp(@test_file_lines);
foreach my $server_line (@server_lines){
   $server_line =~ s!server: : !!is;
   my @files_to_check = split(/\s+/is, $server_line);
   foreach my $file_to_check (@files_to_check){
      # Compare as plain strings: in a regex, the "." in "test1.pl" would match any character.
      my @found = grep { $_ eq $file_to_check } @test_file_lines;
      if (scalar(@found)==0){
        print "$file_to_check is not found in $server_line\n";
      }
   }

}

score 1 · Accepted Answer

If I understand your need correctly, you have a file containing a list of tests (testfiles.txt):

test1.pl
test2.pl
test3.pl
test4.pl
test5.pl

And a file containing a list of servers along with the files each of them tests (serverlist.txt):

server1:        :       test1.pl        test3.pl
server2:        :       test2.pl        test3.pl

(Here I have assumed that all the whitespace is tabs.)

If you convert the second file into a list of tested files, you can then use diff to compare it with your original file.

cut -d: -f3 serverlist.txt | sed -e 's/^\t//g' | tr '\t' '\n' | sort -u > tested_files.txt

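The cut removes the server name and the ':' columns, the sed removes the leading tab left behind, and tr converts the remaining tabs into newlines. sort -u then sorts the names and removes duplicates. The result is written to tested_files.txt.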

After that, all you do is diff testfiles.txt tested_files.txt.
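In diff's default output format, lines prefixed with "< " exist only in the first file, so for this task they are the candidates for cleanup. As a rough sketch (the helper name only_in_first is an assumption for illustration, not something from the answer), those lines can be pulled out like this:

```shell
#!/bin/sh
# only_in_first A B  (hypothetical helper name)
# Prints the lines that diff reports as present only in A,
# i.e. the "< " lines of its default output, with the marker removed.
only_in_first() {
    diff "$1" "$2" | sed -n 's/^< //p'
}
```

For example, only_in_first testfiles.txt tested_files.txt would list the tests that no server runs. Names that appear only in tested_files.txt show up as "> " lines in the diff and are ignored here.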

score 0 · Accepted Answer

It is hard to tell since you didn't post the expected output, but is this what you are looking for?

$ cat file1
test1.pl
test2.pl
test3.pl
$
$ cat file2
server: : test1.pl test2.pl test3.pl test4.sh test5.sh
$
$ gawk -v RS='[[:space:]]+' 'NR==FNR{f[$0]++;next} FNR>2 && !f[$0]' file1 file2
test4.sh
test5.sh
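The command above prints names that occur in file2 but not in file1. If the opposite direction is what the cleanup needs (scripts in the list that no server references), a lookup table in plain awk may be enough. This is only a sketch under assumptions: the function name unused_by_lookup is invented here, script names are assumed to start at field 3 of each server row, and the server file is assumed to be non-empty, since the NR == FNR idiom misfires when the first file has no lines.

```shell
#!/bin/sh
# unused_by_lookup SERVERS LIST  (hypothetical helper name)
# Prints names from LIST that never appear in a row of SERVERS.
unused_by_lookup() {
    awk '
        # First file (SERVERS): remember every name from field 3 onward.
        NR == FNR { for (i = 3; i <= NF; i++) used[$i] = 1; next }
        # Second file (LIST): print non-blank names that were never remembered.
        NF && !($1 in used) { print $1 }
    ' "$1" "$2"
}
```

With the sample data from the question, unused_by_lookup file2 file1 should print nothing, because test1.pl through test3.pl are all run by the server. No sorting is required, and the output keeps the order of the list file.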
