This is using awk:
awk 'BEGIN { while ((getline l < "patterns.txt") > 0) PATS[l] } $2 in PATS' file2
Where file2 is the file you are searching, and patterns.txt is a file with one exact pattern per line. The implicit {print} has been omitted, but you can add it and do anything you like there.
The condition $2 in PATS will be true if the second column is exactly one of the patterns.
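To see the exact-match behaviour on concrete data, here is a small sketch. The contents of patterns.txt and file2 are invented for illustration, and both files are written to a scratch directory. The getline call is compared against 0 so that a missing patterns.txt cannot make the loop spin forever.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' 'foo' 'bar' > patterns.txt
printf '%s\n' 'a foo 1' 'b foobar 2' 'c bar 3' 'd baz 4' > file2

# Keep only the lines whose 2nd column is exactly one of the patterns.
out=$(awk 'BEGIN { while ((getline l < "patterns.txt") > 0) PATS[l] } $2 in PATS' file2)
printf '%s\n' "$out"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

With this data the "foobar" line should be dropped, because "in" tests for an exact key and not for a substring.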
If the lines of patterns.txt are to be treated as regexps, replace the $2 in PATS condition with
{ ok = 0; for (p in PATS) if ($2 ~ p) { ok = 1; break } } ok
So, for example, to test $2 against all the regexps in patterns.txt, and print the third column if the second column matched:
awk 'BEGIN { while ((getline l < "patterns.txt") > 0) PATS[l] }
{ ok = 0; for (p in PATS) if ($2 ~ p) { ok = 1; break } }
ok { print $3 }' file2
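A minimal sketch of the regexp variant on invented data follows. The two patterns (^fo+$ and ar$) and the rows of file2 are assumptions chosen for illustration. The ok pattern and its { print $3 } action sit on the same line, since awk treats a pattern and an action on separate lines as two independent rules.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' '^fo+$' 'ar$' > patterns.txt
printf '%s\n' 'a foo X' 'b foobar Y' 'c bar Z' 'd baz W' > file2

# Print the 3rd column of every line whose 2nd column matches any regexp.
out=$(awk 'BEGIN { while ((getline l < "patterns.txt") > 0) PATS[l] }
{ ok = 0; for (p in PATS) if ($2 ~ p) { ok = 1; break } }
ok { print $3 }' file2)
printf '%s\n' "$out"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Here "foobar" is accepted because it matches ar$, while "baz" matches neither pattern and is skipped.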
And here's a version in perl. It is similar to the awk version, except that it uses a regexp to pull out the columns instead of relying on automatic field splitting.
perl -ne 'BEGIN{open $pf, "<patterns.txt"; %P=map{chomp;$_=>1}<$pf>}
/^\s*([^\s]+)\s+([^\s]+).*$/ and exists $P{$2} and print' < file2
Taking that apart:
BEGIN{
open $pf, "<patterns.txt";
%P = map {chomp;$_=>1} <$pf>;
}
This reads your patterns file into a hash, %P, for fast lookup.
/^\s*([^\s]+)\s+([^\s]+).*$/ and # extract your fields into $1, $2, etc
exists $P{$2} and # See if your field is in the patterns hash
print; # just print the line (you could also
# print anything else; print "$1\n"; etc)
It gets slightly shorter if your input file is tab-separated (and when you know that
there's exactly one tab between fields). Here's an example that matches the patterns
against the 5th column:
perl -F"\t" -ane '
BEGIN{open $pf, "<patterns.txt"; %P=map{chomp;$_=>1}<$pf>}
exists $P{$F[4]} and print ' file2
This is thanks to perl's -a option, which auto-splits each line into the array @F, and the -F option, which sets the separator to split on (\t in this case).
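Note that since arrays in perl start from 0, $F[4] is the 5th field. Also note that with an explicit -F separator the last field keeps its trailing newline, so add -l (perl -F"\t" -lane ...) if the column you are testing is the last one on the line.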