sed - Comparing lines with awk vs. while read LINE

Question

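I have two files, one with 17k lines and one with 4k lines. I want to compare positions 115 to 125 of each line in the first file against every line of the second file and, if there is a match, write the entire line from the first file to a new file. I came up with a solution that reads the file using 'cat $filename | while read LINE', but it takes about 8 minutes to complete. Is there another way, such as using 'awk', to reduce this processing time?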

My code

cat $filename | while read LINE
do
  #read 115 to 125 and then remove trailing spaces and leading zeroes
  vid=`echo "$LINE" | cut -c 115-125 | sed 's,^ *,,; s, *$,,' | sed 's/^[0]*//'`
  exist=0
  #match vid with entire line in id.txt
  exist=`grep -x "$vid" $file_dir/id.txt | wc -l`
  if [[ $exist -gt 0 ]]; then
    echo "$LINE" >> $dest_dir/id.txt
  fi
done

score 2 · Accepted Answer

How about this:

FNR==NR {                      # FNR == NR is only true in the first file

    s = substr($0,115,11)      # Columns 115-125 (11 chars, as in cut -c 115-125)
    sub(/^[ \t]+/,"",s)        # Remove any leading whitespace
    sub(/[ \t]+$/,"",s)        # Remove any trailing whitespace
    sub(/^0+/,"",s)            # Remove leading zeroes, as the question's sed does

    lines[s]=$0                # Create array of lines, keyed by the trimmed ID
    next                       # Get next line in first file
}
{                              # Now in second file
    for(i in lines)            # For each stored ID
        if (i~$0) {            # If it matches the current line (used as a regex)
            print lines[i]     # Print the matching line from file1
            next               # Get next line in second file
        }
}

Save it as script.awk and run it like this:

$ awk -f script.awk "$filename" "${file_dir}/id.txt" > "${dest_dir}/id.txt"

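This will still be slow, because for each line in the second file it has to scan, on average, about 50% of the unique lines from the first file (assuming most lines actually do match). It could be improved significantly if you can confirm that the lines in the second file are full-line matches against the substring.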

For full-line matches, this should be faster:

FNR==NR {                      # FNR == NR is only true in the first file

    s = substr($0,115,11)      # Columns 115-125 (11 chars, as in cut -c 115-125)
    sub(/^[ \t]+/,"",s)        # Remove any leading whitespace
    sub(/[ \t]+$/,"",s)        # Remove any trailing whitespace
    sub(/^0+/,"",s)            # Remove leading zeroes, as the question's sed does

    lines[s]=$0                # Create array of lines, keyed by the trimmed ID
    next                       # Get next line in first file
}
($0 in lines) {                # Now in second file: exact lookup by ID
    print lines[$0]            # Print the matching line from file1
}
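
One thing to watch with the scripts above: the array is keyed on the ID, so if several lines in the first file share the same ID, only the last one is kept, and the output follows the order of id.txt rather than the order of the data file. If that matters, a possible variation (a sketch only, not tested against your data) is to load the small ID file first and then stream the big file. The function name filter_by_id is made up for illustration, and the sketch assumes id.txt is non-empty, holds one ID per line, and that the field is padded with spaces only.

```shell
#!/bin/sh
# filter_by_id is a hypothetical helper name, not part of the original answer.
# $1 = ID list (one ID per line, assumed non-empty), $2 = fixed-width data file
filter_by_id() {
    awk '
        FNR == NR { ids[$0]; next }    # first file: remember every ID as an array key
        {
            s = substr($0, 115, 11)    # columns 115-125, same range as cut -c 115-125
            gsub(/^ +| +$/, "", s)     # trim leading and trailing spaces
            sub(/^0+/, "", s)          # drop leading zeroes
            if (s in ids) print        # exact match: keep the whole data line
        }
    ' "$1" "$2"
}
```

It would be called along the lines of: filter_by_id "${file_dir}/id.txt" "$filename" > "${dest_dir}/id.txt". Each data line costs one array lookup, every matching line is printed (duplicates included), and the output keeps the order of the data file, which is closer to what the original while read loop does.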
