bash - grep on a variable vs. a file - execution time

Question

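I made an interesting observation: I was saving the output of a cURL statement to a text file and then grepping it for certain strings. Later I changed the code to store the output in a variable instead. As it turns out, this change made the script run much slower. This was counterintuitive to me, since I had always thought I/O operations were more expensive than in-memory operations. Here is the code: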

#!/bin/bash
URL="http://m.cnbc.com"
while read line; do
  UA=$line
  curl -s --location --user-agent "$UA" $URL > RAW.txt
  #RAW=`curl --location --user-agent "$UA" $URL`
  L=`grep -c -e "Advertise With Us" RAW.txt`
  #L=`echo $RAW | grep -c -e "Advertise With Us"`
  M=`grep -c -e "id='menu'><button>Menu</button>" RAW.txt`
  #M=`echo $RAW | grep -c -e "id='menu'><button>Menu</button>"`
  D=`grep -c -e "Careers" RAW.txt`
  #D=`echo $RAW | grep -c -e "Careers"`
  if [[ ( $L == 1 && $M == 0 ) && ( $D == 0) ]]
    then
      AC="Legacy"
  elif [[ ( $L == 0 && $M == 1 ) && ( $D == 0) ]]
    then
      AC="Modern"
  elif [[ ( $L == 0 && $M == 0 ) && ( $D == 1) ]]
    then
      AC="Desktop"
  else
      AC="Unable to Determine"
  fi
  echo $AC >> Results.txt
done < UserAgents.txt

The commented-out lines show the variable-based approach. Why is this happening? And is there any way to speed this script up further? Right now it takes about 8 minutes to process 2000 input entries.

score 0 · Accepted Answer

Do you really need to count the number of matches with grep -c? It looks like you only need to know whether a match was found. If so, you can simply use the string comparison built into bash.
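As a minimal sketch of that idea, the snippet below tests for a substring with a [[ ... ]] pattern match instead of starting a grep process. The page variable and its contents are made up for illustration; in the real script it would hold the curl output.

```bash
#!/bin/bash
# Hypothetical sample standing in for the downloaded page (not real curl output).
page='<div id="footer">Advertise With Us</div>'

# Quoting the text between the asterisks makes bash match it literally.
if [[ $page == *"Advertise With Us"* ]]; then
  found=1
else
  found=0
fi

# Same check for a string that is absent from the sample.
[[ $page == *"Careers"* ]] && careers=1 || careers=0

echo "found=$found careers=$careers"
```

Unlike grep -c, this only says whether the string occurs at all, which is all the if/elif chain in the question relies on.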

Writing to the results file outside the loop will also be faster.
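To illustrate the redirection point, here is a small sketch that redirects the output of a whole loop at once. The temporary file and the three sample lines are assumptions made for the demonstration only.

```bash
#!/bin/bash
# Hypothetical output location, used only for this demonstration.
out=$(mktemp)

# Appending inside the loop reopens the file on every iteration:
#   for i in 1 2 3; do echo "line $i" >> "$out"; done

# Redirecting the loop as a whole opens the file once.
for i in 1 2 3; do
  echo "line $i"
done > "$out"

# Count the lines that ended up in the file, then clean up.
count=$(wc -l < "$out")
echo "wrote $count lines"
rm -f "$out"
```

Note that > truncates the file when the loop starts, whereas the >> in the question appends to whatever is already there. Use done >> file if earlier results need to be kept.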

Try the following:

#!/bin/bash
URL="http://m.cnbc.com"
# IFS= and -r keep each user-agent line exactly as it appears in the file
while IFS= read -r line
do
  UA="$line"
  RAW=$(curl -s --location --user-agent "$UA" "$URL")
  # Pattern matching inside [[ ]] is done by the shell; no grep process is started
  [[ $RAW == *"Advertise With Us"* ]] && L=1 || L=0
  [[ $RAW == *"id='menu'><button>Menu</button>"* ]] && M=1 || M=0
  [[ $RAW == *Careers* ]] && D=1 || D=0

  # Same flag combinations as the script in the question
  if (( L==1 && M==0 && D==0 ))
  then
     AC="Legacy"
  elif (( L==0 && M==1 && D==0 ))
  then
     AC="Modern"
  elif (( L==0 && M==0 && D==1 ))
  then
     AC="Desktop"
  else
     AC="Unable to Determine"
  fi
  echo "$AC"
done < UserAgents.txt > Results.txt
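If you want to check the classification logic without hitting the network, one option is to pull it into a function. The sketch below is only an illustration: classify is a hypothetical helper name, the sample inputs are made up, and the flag combinations are the ones from the script in the question.

```bash
#!/bin/bash
# classify is a hypothetical helper, not part of the original scripts.
# It takes page content as $1 and prints the detected category.
classify() {
  local raw=$1 l=0 m=0 d=0
  [[ $raw == *"Advertise With Us"* ]] && l=1
  [[ $raw == *"id='menu'><button>Menu</button>"* ]] && m=1
  [[ $raw == *Careers* ]] && d=1

  # Concatenate the three flags and match the combination.
  case "$l$m$d" in
    100) echo "Legacy" ;;
    010) echo "Modern" ;;
    001) echo "Desktop" ;;
    *)   echo "Unable to Determine" ;;
  esac
}

# Made-up inputs for illustration.
legacy=$(classify "<a>Advertise With Us</a>")
modern=$(classify "<div id='menu'><button>Menu</button></div>")
unknown=$(classify "Advertise With Us ... Careers")
printf '%s\n' "$legacy" "$modern" "$unknown"
```

Inside the loop, the function would be called with the curl output as its argument. Keep in mind that each $(...) capture runs in a subshell, so for 2000 entries the inline version above avoids that small extra cost.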
