regex - How to find lines containing the integers specified in a file?

Question

I have a file, dict, that contains one integer per line:

123
456

I want to find the lines in file that contain exactly the integers in dict.

If I use

$ grep -w -f dict file

I get false matches such as:

12345  foo
23456  bar

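These are false because 12345 != 123 and 23456 != 456. The problem is that the -w option also treats digits as word characters. The -x option doesn't work either, because lines in file can contain other text. What is the best way to do this? It would be great if the solution could provide progress monitoring and good performance when dict and file are large.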
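To make "exactly" concrete: a false match like the ones above appears whenever a dict entry is compared as a substring of the line, whereas the goal is to compare it against whole fields. A minimal Python sketch of the difference, assuming fields are separated by whitespace; the helper names and the line "123  baz" are hypothetical.

```python
def substring_match(line: str, number: str) -> bool:
    # Naive check: true whenever the digits appear anywhere, e.g. "123" inside "12345"
    return number in line

def exact_field_match(line: str, number: str) -> bool:
    # Exact check: true only when some whitespace-separated field equals the number
    return number in line.split()

for line, number in [("12345  foo", "123"), ("23456  bar", "456"), ("123  baz", "123")]:
    print(repr(line), number, substring_match(line, number), exact_field_match(line, number))
```

Only the last line satisfies both checks; the first two pass the substring test but fail the field test.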

score 2 · Accepted Answer

Add word boundaries to the entries in dict, as below:

\<123\>
\<456\>

Then you don't need the -w option. Just use:

grep -f dict file
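The same idea expressed with Python's re module, as a sketch: \b plays roughly the role of GNU grep's \< and \>, and because digits count as word characters, \b123\b cannot match inside 12345. Joining all dict entries (escaped with re.escape) into one alternation lets a single pass test every line. The function names are hypothetical. One caveat of any word-boundary approach: a boundary also sits between a digit and punctuation, so a field such as 123.5 would still match 123.

```python
import re
from typing import Iterable, List

def build_pattern(numbers: Iterable[str]) -> "re.Pattern[str]":
    # Escape each non-blank entry and wrap the whole alternation in word boundaries
    entries = [re.escape(n.strip()) for n in numbers if n.strip()]
    if not entries:
        return re.compile(r"(?!)")  # empty dict: a pattern that never matches
    return re.compile(r"\b(?:" + "|".join(entries) + r")\b")

def grep_words(numbers: Iterable[str], lines: Iterable[str]) -> List[str]:
    # Keep the lines in which at least one entry appears as a whole word
    pattern = build_pattern(numbers)
    return [line for line in lines if pattern.search(line)]

lines = ["12345  foo", "23456  bar", "123  baz", "qux 456"]
print(grep_words(["123", "456"], lines))
```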

score 1 · Accepted Answer

This can be done fairly easily with a Python script. Here is an example:

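import sys

# Load the integers from dict (argv[1]) into a set, ignoring blank lines
with open(sys.argv[1]) as df:
    numbers = {line.strip() for line in df if line.strip()}

# Print each line of file (argv[2]) whose first field is one of those integers
with open(sys.argv[2]) as inf:
    for s in inf:
        fields = s.split()
        if fields and fields[0] in numbers:
            sys.stdout.write(s)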

Error checking and recovery are left as an exercise for the reader.
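Since the question also asks for progress monitoring, here is a sketch of the same approach extended to report progress on stderr. Assumptions: both files are opened in binary mode so the number of bytes consumed can be turned into a percentage; filter_with_progress, load_numbers, and report_every are hypothetical names; as in the script above, only the first field of each line is compared. Set membership keeps each lookup cheap no matter how large dict is.

```python
import os
import sys
from typing import BinaryIO, Set, TextIO

def load_numbers(dict_path: str) -> Set[bytes]:
    # One integer per line; surrounding whitespace and blank lines are ignored
    with open(dict_path, "rb") as df:
        return {line.strip() for line in df if line.strip()}

def filter_with_progress(dict_path: str, file_path: str, out: BinaryIO,
                         err: TextIO = sys.stderr, report_every: int = 1_000_000) -> int:
    """Copy lines of file_path whose first field is in dict to out; report progress on err."""
    numbers = load_numbers(dict_path)
    total = os.path.getsize(file_path) or 1  # avoid dividing by zero on an empty file
    done = matched = 0
    with open(file_path, "rb") as inf:
        for lineno, line in enumerate(inf, start=1):
            done += len(line)  # bytes consumed so far, for the percentage
            fields = line.split()
            if fields and fields[0] in numbers:
                out.write(line)
                matched += 1
            if lineno % report_every == 0:
                err.write(f"\r{100 * done / total:5.1f}% ({lineno} lines)")
                err.flush()
    err.write(f"\rdone: {matched} matching lines\n")
    return matched

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python script.py dict file > matches
    filter_with_progress(sys.argv[1], sys.argv[2], sys.stdout.buffer)
```

Matches go to stdout and progress to stderr, so redirecting the results to a file still leaves the progress line visible in the terminal.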

score 1 · Accepted Answer

A fairly general approach using awk:

awk 'FNR==NR { array[$1]++; next } { for (i=1; i<=NF; i++) if ($i in array) { print $0; next } }' dict file

Explanation:

FNR==NR { }  ## FNR is number of records relative to the current input file. 
             ## NR is the total number of records.
             ## So this statement simply means `while we're reading the 1st file
             ## called dict; do ...`

array[$1]++; ## Add the first column ($1) to an array called `array`.
             ## I could use $0 (the whole line) here, but since you have said
             ## that there will only be one integer per line, I decided to use
             ## $1 (it strips leading and trailing whitespace, if any)

next         ## process the next line in `dict`

for (i=1; i<=NF; i++)  ## loop through each column in `file`

if ($i in array)       ## if one of these columns can be found in the array

print $0               ## print the whole line out
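For readers more at home in Python, here is a sketch of the same two-pass logic: the first pass plays the role of FNR==NR (collect $1 of every dict line into a set), and the second tests every field of each line, like for (i=1; i<=NF; i++). The function name is hypothetical, and str.split() is assumed to approximate awk's default whitespace field splitting.

```python
from typing import Iterable, Iterator

def awk_style_filter(dict_lines: Iterable[str], file_lines: Iterable[str]) -> Iterator[str]:
    # First pass (FNR==NR): collect the first field of each dict line, like array[$1]++
    array = set()
    for line in dict_lines:
        fields = line.split()
        if fields:
            array.add(fields[0])
    # Second pass: test every field ($1 .. $NF) of each line against the set
    for line in file_lines:
        if any(field in array for field in line.split()):
            yield line  # any() stops at the first hit, so each line is yielded once

print(list(awk_style_filter(["123", "456"], ["12345  foo", "foo 123 bar", "456 456"])))
```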

To process multiple files using a bash loop:

## This will process files such as file, file1, file2, file3 ...
## and create output files such as file.out, file1.out, file2.out, file3.out ...

for j in file*; do awk -v FILE="$j.out" 'FNR==NR { array[$1]++; next } { for (i=1; i<=NF; i++) if ($i in array) { print $0 > FILE; next } }' dict "$j"; done
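The same per-file loop as a Python sketch, using glob to expand the file* pattern and writing each input's matches to <name>.out. One design choice worth noting: on a second run, the pattern file* also matches the .out files produced by the first run (the same holds for the shell glob in the loop above), so this sketch skips names ending in .out. The function names and that naming rule are assumptions.

```python
import glob
from typing import List, Set

def load_dict(dict_path: str) -> Set[str]:
    # One integer per line; first field only, blank lines ignored
    with open(dict_path) as df:
        return {line.split()[0] for line in df if line.split()}

def process_files(dict_path: str, pattern: str = "file*") -> List[str]:
    """For each input matching pattern, write its matching lines to <name>.out."""
    numbers = load_dict(dict_path)
    written = []
    # sorted() gives a stable order; earlier outputs that also match the pattern are skipped
    for path in sorted(p for p in glob.glob(pattern) if not p.endswith(".out")):
        with open(path) as inf, open(path + ".out", "w") as outf:
            for line in inf:
                if any(field in numbers for field in line.split()):
                    outf.write(line)
        written.append(path + ".out")
    return written
```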

If you're interested in using tee with multiple files, you could try something like this:

for j in file*; do awk -v FILE="$j.out" 'FNR==NR { array[$1]++; next } { for (i=1; i<=NF; i++) if ($i in array) { print $0 > FILE; print FILENAME, $0; next } }' dict "$j"; done 2>&1 | tee output

This displays the name of the file being processed together with each matching record found, and writes a "log" to a file called output.
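The tee part can be sketched in Python as an object that forwards every write to several streams, for example the terminal and a log file. Tee and log_matches are hypothetical names, and the "filename record" layout mirrors awk's print FILENAME, $0 (fields joined by a single space).

```python
from typing import Iterable, Set, TextIO

class Tee:
    """Minimal tee: every write goes to all the given text streams."""

    def __init__(self, *streams: TextIO) -> None:
        self.streams = streams

    def write(self, text: str) -> None:
        for stream in self.streams:
            stream.write(text)

    def flush(self) -> None:
        for stream in self.streams:
            stream.flush()

def log_matches(filename: str, lines: Iterable[str], numbers: Set[str], log: Tee) -> None:
    # Mirror awk's `print FILENAME, $0`: file name, a space, then the record
    for line in lines:
        if any(field in numbers for field in line.split()):
            record = line.rstrip("\n")
            log.write(f"{filename} {record}\n")
```

In a script you would open the log with open("output", "w") and pass Tee(sys.stdout, logfile), so each match is shown on screen and recorded in output at the same time.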
