bash - Can this be made to run faster (read a file, [sed] substitution, write a new file)?

Question

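I use this code in a bash script to read a file containing many hex strings, do a substitution on them, and then write the result to a new file. For an input of about 300 MB it takes roughly 30 minutes.
Can this be done faster?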

sed 's,[0-9A-Z]\{2\},\\\\x&,g' ${in_file} | while read line; do
 printf "%b" ${line} >> ${out_file}
 printf '\000\000' >> ${out_file}
done

Update:

I ran some tests and got the following results.

Timings are listed slowest to fastest; the winner is the last one (xxd):

sed 's,[0-9A-Z]\{2\},\\\\x&,g' ${in_file} | while read line; do
    printf "%b" ${line} >> ${out_file}
    printf '\000\000' >> ${out_file}
done

real 44m27.021s
user 29m17.640s
sys 15m1.070s

sed 's,[0-9A-Z]\{2\},\\\\x&,g' ${in_file} | while read line; do
    printf '%b\000\000' ${line} 
done >> ${out_file}

real 18m50.288s
user 8m46.400s
sys 10m10.170s

export LANG=C
sed 's/$/0000/' ${in_file} | xxd -r -ps >> ${out_file}

real 0m31.528s
user 0m1.850s
sys 0m29.450s

score 4 · Accepted Answer

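You need the xxd command, which ships with Vim. Here sed appends 0000 to every line, and xxd -r -ps converts the resulting plain hex dump back to binary, so each line's bytes are followed by two NUL bytes.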

export LANG=C
sed 's/$/0000/' ${in_file} | xxd -r -ps > ${out_file}

score 3 · Accepted Answer

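This is slow because of the loop in bash. If you can get sed/awk/perl/etc. to do the looping, it will be much faster. I can't see how to do that in sed or awk, though. It is probably fairly easy in perl, but I don't know enough perl to answer that.

At the very least, you should be able to save some time by refactoring what you have into: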

sed 's,[0-9A-Z]\{2\},\\\\x&,g' ${in_file} | while read line; do
 printf '%b\000\000' ${line} 
done >> ${out_file}

This way, at least, you run printf only once per iteration, and ${out_file} is opened and closed only once instead of on every write.
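To illustrate the idea of letting a single process do the whole loop, here is a minimal sketch in Python 3 rather than perl. It is not taken from the question: the function name convert and the script name hex2bin.py are made up for this example, and it assumes every input line consists only of pairs of hex digits.

```python
import io


def convert(src, dst):
    """Write each hex line of src to dst as raw bytes plus two NUL bytes.

    Assumption: each line contains only pairs of hex digits.
    bytes.fromhex raises ValueError on anything else.
    """
    for line in src:
        dst.write(bytes.fromhex(line.strip()) + b"\x00\x00")


# Small in-memory demonstration; no files are touched.
demo_out = io.BytesIO()
convert(io.StringIO("4142\n43\n"), demo_out)
print(demo_out.getvalue())  # b'AB\x00\x00C\x00\x00'

# Hypothetical file-to-file usage, e.g. saved as hex2bin.py:
#   with open(in_path) as src, open(out_path, "wb") as dst:
#       convert(src, dst)
```

Both files are opened once and the output goes through Python's own buffering, so no process is started per line and the output file is not reopened on every iteration.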

score 2 · Accepted Answer

Switch to a full programming language? Here is a Ruby one-liner:

ruby -ne 'print "#{$_.chomp.gsub(/[0-9A-F]{2}/) { |s| s.to_i(16).chr }}\x00\x00"'

score 0 · Accepted Answer

If you use Python, and assuming your data is simple:

$ cat file
99
AB

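Script:

# Python 3: open the output in binary mode so every byte is written unchanged
with open("file") as src, open("outfile", "wb") as out:
    for line in src:
        # decode the hex digits on this line, then append two NUL bytes
        out.write(bytes.fromhex(line.strip()) + b"\x00\x00")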
