bash - Best way to parse a log file

Question

I have a log file like this:

Client connected with ID 8127641241
< multiple lines of unimportant log here>
Client not responding
Total duration: 154.23583
Sent: 14
Received: 9732
Client lost

Client connected with ID 2521598735
< multiple lines of unimportant log here>
Client not responding
Total duration: 12.33792
Sent: 2874
Received: 1244
Client lost

The log contains many of these blocks, each starting with "Client connected with ID 1234" and ending with "Client lost". They never get interleaved (there is only one client at a time).

How can I parse this file and produce statistics like these?

[Image: the desired statistics table]

I'm mainly asking about the parsing process, not the formatting.

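I think I could loop over every line, set a flag when I find a "Client connected" line, and save the ID in a variable. Then I would grep the following lines and save their values until I reach the "Client lost" line. Is this a good approach? Is there a better one?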
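For reference, the flag-based approach described above could look roughly like this in pure bash. This is a minimal sketch that assumes the exact line prefixes from the sample log; the function name parse_with_flag is hypothetical. It prints one tab-separated row per block.

```shell
#!/bin/bash
# Sketch of the flag approach: set a flag on "Client connected",
# collect values only while the flag is set, print and clear it on "Client lost".
# parse_with_flag is a hypothetical name; $1 is the log file.
parse_with_flag() {
    local line in_block=0 id dur sent recv
    while IFS= read -r line; do
        if [[ $line =~ ^Client\ connected\ with\ ID\ ([0-9]+) ]]; then
            # start of a block: remember the ID and reset the other fields
            in_block=1 id=${BASH_REMATCH[1]} dur= sent= recv=
        elif (( in_block )); then
            if [[ $line =~ ^Total\ duration:\ ([0-9.]+) ]]; then
                dur=${BASH_REMATCH[1]}
            elif [[ $line =~ ^Sent:\ ([0-9]+) ]]; then
                sent=${BASH_REMATCH[1]}
            elif [[ $line =~ ^Received:\ ([0-9]+) ]]; then
                recv=${BASH_REMATCH[1]}
            elif [[ $line == "Client lost" ]]; then
                # end of a block: emit one row, then clear the flag
                printf '%s\t%s\t%s\t%s\n' "$id" "$dur" "$sent" "$recv"
                in_block=0
            fi
        fi
    done < "$1"
}
```

Usage would be something like parse_with_flag logfile | column -t. Because values are collected only while the flag is set, a stray "Sent:" line outside a block is ignored.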

score 3 · Accepted Answer

Here is a simple way using awk:

awk 'BEGIN { print "ID Duration Sent Received" } /^(Client connected|Total duration:|Sent:)/ { printf "%s ", $NF } /^Received:/ { print $NF }' file | column -t

Result:

ID          Duration   Sent  Received
8127641241  154.23583  14    9732
2521598735  12.33792   2874  1244

score 2 · Accepted Answer

awk:

awk 'BEGIN{print "ID Duration Sent Received"}
/with ID/&&!f{f=1}
f&&/Client lost/{print a[1],a[2],a[3],a[4];f=0}
f{for(i=1;i<=NF;i++){
        if($i=="ID")a[1]=$(i+1)
        if($i=="duration:")a[2]=$(i+1)
        if($i=="Sent:")a[3]=$(i+1)
        if($i=="Received:")a[4]=$(i+1)
}}' log

If there is always an empty line between data blocks, the awk script above can be simplified to:

 awk -vRS="" 'BEGIN{print "ID Duration Sent Received"}
{for(i=1;i<=NF;i++){
        if($i=="ID")a[1]=$(i+1)
        if($i=="duration:")a[2]=$(i+1)
        if($i=="Sent:")a[3]=$(i+1)
        if($i=="Received:")a[4]=$(i+1)
}print a[1],a[2],a[3],a[4];}' log

Output:

ID Duration Sent Received
8127641241 154.23583 14 9732
2521598735 12.33792 2874 1244

If you want nicer formatting, pipe the output through | column -t.

You get:

ID          Duration   Sent  Received
8127641241  154.23583  14    9732
2521598735  12.33792   2874  1244

score 2 · Accepted Answer

A Perl solution:

#!/usr/bin/perl

use warnings;
use strict;

print "\tID\tDuration\tSent\tReceived\n";

while (<>) {
  chomp;
  if (/Client connected with ID (\d+)/) {
    print "$1\t";
  }
  if (/Total duration: ([\d\.]+)/) {
    print "$1\t";
  }
  if (/Sent: (\d+)/) {
    print "$1\t";
  }
  if (/Received: (\d+)/) {
    print "$1\n";
  }
}

Sample output:

        ID  Duration    Sent    Received
8127641241  154.23583   14  9732
2521598735  12.33792    2874    1244

score 2 · Accepted Answer

If you are sure the log file contains no errors and the fields always appear in the same order, you can use something like this:

#!/bin/bash

ids=()
# associative arrays keyed by client ID (an indexed array would evaluate the
# ID arithmetically, so an ID with a leading zero would be read as octal)
declare -A duration sent received
while read -r _ _ _ _ id; do
   ids+=( "$id" )
   read -r _ _ "duration[$id]"
   read -r _ "sent[$id]"
   read -r _ "received[$id]"
done < <(grep -E '^(Client connected with ID|Total duration:|Sent:|Received:)' logfile)

# printing the data out, for control purposes only
for id in "${ids[@]}"; do
   printf "ID=%s\n\tDuration=%s\n\tSent=%s\n\tReceived=%s\n" "$id" "${duration[$id]}" "${sent[$id]}" "${received[$id]}"
done

The output is:

$ ./parsefile
ID=8127641241
    Duration=154.23583
    Sent=14
    Received=9732
ID=2521598735
    Duration=12.33792
    Sent=2874
    Received=1244

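Note that the data is also stored in the corresponding arrays, keyed by client ID, so you can use it afterwards. This is reasonably efficient. Another language (Perl, for example) might be a little more efficient, but since you tagged the post only with bash, sed, and grep, I think this answers your question fully.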

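Explanation: grep filters out everything except the lines of interest, and bash's read keeps only the fields of interest, assuming they always appear in the same order. The script should be easy to understand and to adapt if needed.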

score 1 · Accepted Answer

Slurp the file using paragraph mode

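With Perl or AWK, you can slurp whole records using a special paragraph mode that treats the blank lines between records as separators. In Perl, use -00 to enable paragraph mode. In AWK, set the RS variable to the empty string (i.e. "") to do the same. You can then parse the fields within each record.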

Use line-oriented statements

Alternatively, you can use a shell while loop to read one line at a time and then parse each line with grep or sed. Depending on how complex the parsing is, a case statement may even be enough.
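A minimal sketch of the case-statement variant: it dispatches on each line's prefix instead of starting a grep or sed process for every line. It assumes the exact prefixes from the sample log; parse_lines is a hypothetical name.

```shell
#!/bin/bash
# Sketch: read the log line by line and match each line's prefix with case.
# ${line##* } strips everything up to the last space, leaving the value.
# parse_lines is a hypothetical name; $1 is the log file.
parse_lines() {
    local line id dur sent recv
    while IFS= read -r line; do
        case $line in
            "Client connected with ID "*) id=${line##* } ;;
            "Total duration: "*)          dur=${line##* } ;;
            "Sent: "*)                    sent=${line##* } ;;
            "Received: "*)                recv=${line##* } ;;
            "Client lost")
                # end of a record: print one row and clear the fields
                printf '%s\t%s\t%s\t%s\n' "$id" "$dur" "$sent" "$recv"
                id= dur= sent= recv= ;;
        esac
    done < "$1"
}
```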

For example, assuming each record always has five matching fields, you could do something like this:

while read; do
    grep -Eo '[[:digit:]]+'
done < /tmp/foo | xargs -n5 | sed 's/ /\t/g'

The loop produces something like this:

23583   14  9732    2521598735  33792
2874    1244    8127641241  23583   14
9732    2521598735  33792   2874    1244

You can certainly tweak the formatting or add a header row. The point is that you need to know your data.
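A possible variant of the loop above, offered as a sketch: in the original, read consumes the first line before grep reads the rest of the input, and [[:digit:]]+ splits a duration such as 154.23583 into two fields. Letting grep read the file directly and adding the decimal point to the character class leaves exactly four numbers per record. Like the original, it assumes no other line in a block contains digits; digits_to_rows is a hypothetical name.

```shell
#!/bin/bash
# Sketch: extract every run of digits/dots, group them four per row
# (ID, duration, sent, received), and turn the spaces into tabs.
# digits_to_rows is a hypothetical name; $1 is the log file.
digits_to_rows() {
    grep -Eo '[[:digit:].]+' "$1" | xargs -n4 | tr ' ' '\t'
}
```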

AWK, Perl, or even Ruby are better options for parsing record-oriented formats, but if your needs are basic, the shell is certainly an option.

score 0 · Accepted Answer

awk -v RS= -F'\n' '
BEGIN{ printf "%15s%15s%15s%15s\n","ID","Duration","Sent","Received" }
{
   for (i=1;i<=NF;i++) {
      n = split($i,f,/ /)    
      if ( $i ~ /^(Client connected|Total duration:|Sent:|Received:)/ ) {
         printf "%15s",f[n]
      }
   }
   print ""
}'

score 0 · Accepted Answer

A short Perl snippet:

perl -ne '
    BEGIN {print "ID Duration Sent Received\n";}
    print "$1 " if /(?:ID|duration:|Sent:|Received:) (.+)$/;
    print "\n" if /^Client lost/;
' filename | column -t
