unix - Using awk to take patterns from a file, compare them against a column of another file, and print matching lines

Question

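I'm basically looking to combine the power of

grep -f

with

awk '{ if($2=="this is where I'd like to input a file of fixed string patterns") print $0}'

That is, I'd like to search a specific column of a file (file 1) using an input file of patterns (file 2). When a match is found, simply write the line out:

> outputfile.txt

From a previous post, this awk line comes really close:

awk 'NR==FNR{a[$0]=1;next} {n=0;for(i in a){if($0~i){n=1}}} n' file1 file2

(from the post "Using ack or awk or a better way than grep to obtain patterns in one file from another?")

However, it does not search a specific column of file 1. I'm open to other tools as well.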

score 4 · Accepted Answer

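The example you found is actually very close to what you want. The only difference is that you don't want to match against the whole line ($0).

Change it as follows:

awk 'NR==FNR { pats[$0]=1; next } { for(p in pats) if($2 ~ p) { print $0; break } }' patterns file

If you only need fixed-string matching, use the index() function instead: replace $2 ~ p with index($2, p).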
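The difference between the regex form and the index() form is easiest to see with a pattern that contains a regex metacharacter. The following is a minimal sketch, assuming made-up sample data; the scratch directory and the contents of `patterns` and `file` are illustrative only and not part of the original question.

```shell
# Hypothetical sample data in a scratch directory (contents are made up).
tmp=$(mktemp -d)
printf '%s\n' 'a.c' > "$tmp/patterns"
printf '%s\n' 'r1 abc x' 'r2 a.c y' 'r3 zzz z' > "$tmp/file"

# Regex match: "." matches any character, so both "abc" and "a.c" match.
regex_out=$(awk 'NR==FNR { pats[$0]=1; next }
    { for (p in pats) if ($2 ~ p) { print $0; break } }' "$tmp/patterns" "$tmp/file")

# Fixed-string match: index() is non-zero only when p occurs literally in $2.
fixed_out=$(awk 'NR==FNR { pats[$0]=1; next }
    { for (p in pats) if (index($2, p)) { print $0; break } }' "$tmp/patterns" "$tmp/file")

printf 'regex:\n%s\nfixed:\n%s\n' "$regex_out" "$fixed_out"
rm -rf "$tmp"
```

With this data the regex form prints the r1 and r2 lines, while the index() form prints only the r2 line. Note that index() still matches substrings, so a pattern "a.c" would also match a field such as "xa.cy".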

You can also pass the column number to awk as a variable, for example:

awk -v col="$col" 'NR==FNR { pats[$0]=1; next } { for(p in pats) if($col ~ p) { print $0; break } }' patterns file

Edit - matching the whole field

This can be achieved with the == operator:

awk -v col="$col" 'NR==FNR { pats[$0]=1; next } { for(p in pats) if($col == p) { print $0; break } }' patterns file
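To illustrate the whole-field comparison, here is a small sketch with made-up data, assuming the column of interest is the third one. The file contents and the value of `col` are hypothetical; the shell variable is quoted when handed to awk so that an empty or unusual value cannot split into extra arguments.

```shell
# Hypothetical sample data (contents are made up).
tmp=$(mktemp -d)
printf '%s\n' 'foo' 'bar' > "$tmp/patterns"
printf '%s\n' 'foo 1 foobar' 'x 2 bar' 'y 3 baz' > "$tmp/file"

col=3   # column to test; chosen for this example only

# == compares the whole field, so "foobar" does not match the pattern "foo".
exact_out=$(awk -v col="$col" 'NR==FNR { pats[$0]=1; next }
    { for (p in pats) if ($col == p) { print $0; break } }' "$tmp/patterns" "$tmp/file")

printf '%s\n' "$exact_out"
rm -rf "$tmp"
```

Only the line whose third field is exactly "bar" is printed. The first line is skipped even though "foo" appears in its first field and inside "foobar".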

score 3 · Accepted Answer

Using awk:

awk 'BEGIN { while((getline l < "patterns.txt") > 0) PATS[l] } $2 in PATS' file2

Here file2 is the file you are searching, and patterns.txt is a file with one exact pattern per line. The {print} action is implicit and has been omitted; you can add an explicit action and do anything you like there. The > 0 test stops the loop at end of file and also if patterns.txt cannot be read.

The condition $2 in PATS is true if the second column is exactly one of the patterns.
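As a worked example of the exact lookup, the sketch below runs the same idea on made-up data. The `patfile` variable is an assumption added here so the pattern path is not hard-coded; it is not part of the command above.

```shell
# Hypothetical sample data (contents are made up).
tmp=$(mktemp -d)
printf '%s\n' 'apple' 'pear' > "$tmp/patterns.txt"
printf '%s\n' '1 apple red' '2 pineapple yellow' '3 pear green' > "$tmp/file2"

# Load every pattern line as an array key, then test field 2 with "in".
# The "> 0" test ends the loop at EOF and also if the file is unreadable.
lookup_out=$(awk -v patfile="$tmp/patterns.txt" '
    BEGIN { while ((getline l < patfile) > 0) PATS[l]; close(patfile) }
    $2 in PATS' "$tmp/file2")

printf '%s\n' "$lookup_out"
rm -rf "$tmp"
```

The "pineapple" line is not printed, because `in` tests for an exact key rather than a substring. The lookup is a single hash probe per input line, so no loop over the patterns is needed.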

If the lines of patterns.txt are to be treated as regexps, replace the condition with

{ ok=0; for (p in PATS) if ($2 ~ p) ok=1 } ok

So, for example, to test $2 against all the regexps in patterns.txt and print the third column if the second column matched (the action must start on the same line as the ok pattern, otherwise awk treats it as a separate rule that runs for every line):

awk 'BEGIN { while((getline l < "patterns.txt") > 0) PATS[l] }
     { ok=0; for (p in PATS) if ($2 ~ p) ok=1 }
     ok { print $3 }' < file2
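Here is a small sketch of the regex variant on made-up data, written with the print action on the same line as the `ok` pattern, since awk only attaches an action to a pattern when the opening brace is on the pattern's line. The `patfile` variable and the sample contents are assumptions for illustration.

```shell
# Hypothetical sample data (contents are made up).
tmp=$(mktemp -d)
printf '%s\n' '^ab' 'z$' > "$tmp/patterns.txt"
printf '%s\n' 'r1 abc first' 'r2 cab second' 'r3 fizz third' > "$tmp/file2"

# For each line, set ok when any pattern matches field 2 as a regex,
# then print field 3 only for lines where ok is set.
third_out=$(awk -v patfile="$tmp/patterns.txt" '
    BEGIN { while ((getline l < patfile) > 0) PATS[l] }
    { ok = 0; for (p in PATS) if ($2 ~ p) { ok = 1; break } }
    ok { print $3 }' "$tmp/file2")

printf '%s\n' "$third_out"
rm -rf "$tmp"
```

"abc" matches `^ab` and "fizz" matches `z$`, so the third fields of those two lines are printed. "cab" matches neither anchored pattern.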

And here's a version in perl. It is similar to the awk version, except that it extracts the fields with a regexp instead of relying on automatic field splitting.

perl -ne 'BEGIN{open $pf, "<patterns.txt"; %P=map{chomp;$_=>1}<$pf>} 
   /^\s*([^\s]+)\s+([^\s]+).*$/ and exists $P{$2} and print' < file2

Taking that apart:

BEGIN{
  open $pf, "<patterns.txt"; 
  %P = map {chomp;$_=>1} <$pf>;
}

Reads your patterns file into a hash %P for fast lookup.

/^\s*([^\s]+)\s+([^\s]+).*$/ and  # extract your fields into $1, $2, etc
exists $P{$2} and                 # See if your field is in the patterns hash
print;                            # just print the line (you could also 
                                  # print anything else; print "$1\n"; etc)

It gets slightly shorter if your input file is tab-separated (and when you know that there's exactly one tab between fields). Here's an example that matches the patterns against the 5th column:

 perl -F"\t" -ane '
    BEGIN{open $pf, "<patterns.txt"; %P=map{chomp;$_=>1}<$pf>} 
    exists $P{$F[4]} and print ' file2

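This is thanks to perl's -F option, which (together with -a) tells perl to auto-split each line into the @F array using the given separator (\t in this case). Note that since arrays in perl start from 0, $F[4] is the 5th field. Without -l the last field on each line keeps its trailing newline, so add -l if the column you test is the last one.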

score 0 · Accepted Answer

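I'm not quite sure what part the column distinction plays in this scenario. Are you processing some kind of csv file? Do you need to handle column delimiters in the regex list file? If the file has no distinct columns separated by a specific delimiter, you can simply use grep:

grep -o -f file2 file1

If columns do matter, something like this:

grep -o "[^,]*" file1 | grep -f file2

where , is the delimiter.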
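To see what the grep pipeline actually emits, here is a sketch on made-up comma-separated data. It also shows, for contrast, an awk command that prints whole lines for one specific column. That second command is an assumption assembled from the techniques in the other answers (-F to set the delimiter, `in` for exact lookup) and is not part of this answer.

```shell
# Hypothetical sample data (contents are made up).
tmp=$(mktemp -d)
printf '%s\n' 'id1,apple,red' 'id2,kiwi,green' > "$tmp/file1"   # data
printf '%s\n' 'apple' 'green' > "$tmp/file2"                    # patterns

# The pipeline splits every line into its fields, then filters the fields.
# It prints matching fields (from any column), not the original lines.
fields_out=$(grep -o "[^,]*" "$tmp/file1" | grep -f "$tmp/file2")

# Contrast: whole lines whose 2nd comma-separated field is exactly a pattern.
lines_out=$(awk -F, 'NR==FNR { pats[$0]; next } $2 in pats' "$tmp/file2" "$tmp/file1")

printf 'fields:\n%s\nlines:\n%s\n' "$fields_out" "$lines_out"
rm -rf "$tmp"
```

With this data the pipeline prints the fields "apple" and "green", while the awk command prints only the full id1 line, because "green" sits in the third column rather than the second.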
