search - Recursive search for multiple occurrences of multiple strings

Question

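This question is an extension of my previous question. My search requirements are as follows:

The strings to search for are stored in a file, values.txt (the input file), which contains, for example, the following: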

string1  1
string2  3
string3  5

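Here the first column (string1, string2, string3) gives the strings to search for, and the second column gives the number of occurrences to find for each.
The search also has to be run recursively over files with specific extensions (.out, .txt, etc.).
The results should be written to a file in which the matching lines appear together with the name and path of each file searched.

For example, a typical output would look like this (when searching recursively in files with the .out extension):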

<path_of_searched_file1/fileName1.out>
The full line containing the <first> instance of <string1>
The full line containing the <first> instance of <string2>
The full line containing the <second> instance of <string2>
The full line containing the <third> instance of <string2>
The full line containing the <first> instance of <string3>
The full line containing the <second> instance of <string3>
The full line containing the <third> instance of <string3>
The full line containing the <fourth> instance of <string3>
The full line containing the <fifth> instance of <string3>


<path_of_searched_file2/fileName2.out>
The full line containing the <first> instance of <string1>
The full line containing the <first> instance of <string2>
The full line containing the <second> instance of <string2>
The full line containing the <third> instance of <string2>
The full line containing the <first> instance of <string3>
The full line containing the <second> instance of <string3>
The full line containing the <third> instance of <string3>
The full line containing the <fourth> instance of <string3>
The full line containing the <fifth> instance of <string3>


and so on

Is awk the best way to solve this search problem? If so, could someone help me modify the awk code provided in that previous question to meet my current search requirements?

score 1 · Accepted Answer

Here's one way using awk. YMMV. Run like:

awk -f ./script.awk values.file $(find . -type f -regex ".*\.\(txt\|doc\|etc\)$")

Contents of script.awk:

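# Pass 1 (values.file): map each search string to its allowed number of hits.
FNR==NR {
    a[$1]=$2;
    next
}

# At the first line of every searched file, reset the counters from a.
FNR==1 {
    for (i in a) {
        b[i]=a[i]
    }
}

# Print a line while its matching string still has hits left.
{
    for (j in b) {
        if ($0 ~ j && b[j]-- > 0) {
            # Parenthesized so every awk parses the file name the same way.
            print > (FILENAME ".out")
        }
    }
}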

Alternatively, here's the one-liner:

awk 'FNR==NR { a[$1]=$2; next } FNR==1 { for (i in a) b[i]=a[i] } { for (j in b) if ($0 ~ j && b[j]-- > 0) print > (FILENAME ".out") }' values.file $(find . -type f -regex ".*\.\(txt\|doc\)$")
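
One caveat with both invocations: the unquoted $(find ...) is split on whitespace by the shell, so a path containing a space would reach awk as two bogus file names. A minimal sketch of a workaround, assuming a find that supports the POSIX `-exec ... {} +` form, is to let find pass the paths to awk itself. The demo directory and sample file names below are hypothetical, and `-name` tests stand in for the GNU-only `-regex`.

```shell
#!/bin/sh
# Throwaway workspace; all file names here are made up for the demo.
demo=$(mktemp -d)
cd "$demo" || exit 1

printf 'foo 1\nbar 2\n' > values.file
mkdir -p 'dir with space'
printf 'foo a\nbar b\nfoo c\nbar d\nbar e\n' > 'dir with space/log 1.txt'
printf 'bar x\n' > notes.doc

# Same logic as script.awk, with the output file name parenthesized.
cat > script.awk <<'EOF'
FNR == NR { a[$1] = $2; next }
FNR == 1  { for (i in a) b[i] = a[i] }
{ for (j in b) if ($0 ~ j && b[j]-- > 0) print > (FILENAME ".out") }
EOF

# find hands the paths to awk directly, so no word splitting takes place.
find . -type f \( -name '*.txt' -o -name '*.doc' \) \
    -exec awk -f script.awk values.file {} +

# Show the hits recorded for the file whose path contains spaces.
cat 'dir with space/log 1.txt.out'
```

If find has to split a very long file list across several awk runs, each run still receives values.file as its first argument, so the counters are loaded every time.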

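Explanation:

The first block builds an associative array with the first column of values.file as the keys and the second column as the values. The second and third blocks read the files found by the find command. The array built in the first block is copied afresh for each file found (there's no easy way to do this with awk, so perhaps Perl with the File::Find::Rule module would be a better fit?). The third block loops over each key, searches the current line for that string, decrements its remaining count, and prints the matching line to a file next to the input, named with an added ".out" extension.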
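
The script above writes one .out file per input and prints hits in the order they appear in each file. The question asked for a single report with a path header per file and the hits grouped by string. A rough sketch of one way to get closer to that, assuming POSIX awk and find, is to buffer the hits and print them whenever a new file begins. The names `report.awk`, `emit_block`, `report.out` and the sample files are hypothetical, and `index()` is swapped in for `~` so that search strings are matched literally rather than as regular expressions.

```shell
#!/bin/sh
# Throwaway workspace; all file names here are made up for the demo.
demo=$(mktemp -d)
cd "$demo" || exit 1

printf 'foo 1\nbar 2\n' > values.file
mkdir -p sub
printf 'bar one\nfoo one\nbar two\nfoo two\nbar three\n' > sub/a.txt

cat > report.awk <<'EOF'
# Pass 1 (values.file): remember each string, its limit, and its position.
FNR == NR { order[++n] = $1; limit[$1] = $2; next }

# Print the header and buffered hits of the file that just finished,
# in values.file order, then reset the per-file counters.
function emit_block(    i, k, c) {
    if (current == "") return
    print current
    for (i = 1; i <= n; i++) {
        k = order[i]
        for (c = 1; c <= seen[k]; c++) print hit[k, c]
        seen[k] = 0
    }
    print ""
}

# A new searched file starts: flush the previous one.
FNR == 1 { emit_block(); current = FILENAME }

# Buffer at most limit[k] lines that contain string k.
{
    for (i = 1; i <= n; i++) {
        k = order[i]
        if (index($0, k) && seen[k] < limit[k]) hit[k, ++seen[k]] = $0
    }
}

# Flush the last file.
END { emit_block() }
EOF

find . -type f -name '*.txt' -exec awk -f report.awk values.file {} + > report.out
cat report.out
```

For the sample data the report holds the path ./sub/a.txt followed by "foo one", "bar one" and "bar two". The report file should live outside the searched tree, or carry an extension that the find expression does not match, so that it is not searched itself.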
