bash - awk parsing with a fixed number of fields

Question

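I am looking for a way to parse records with awk when a field can also contain \n characters. Fields are separated by |. The problem is deciding whether a newline really starts a new record, which I can only tell once a specific number of fields has been reached. How can this be done in awk?

Example: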

2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|172.30.10.212|56809|2013-03-24 15:49:32 AFT|10354453|con2326|cmd7|seg-1||dx318412|x10354453|sx1|LOG: |00000|statement: SET DATESTYLE = "ISO"; Select * 
from bb 
where cc='1'||||||SET DATESTYLE = "ISO"; Select * from bb where cc='1'|0||postgres.c|1447|
2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|172.30.0.186|3304|2013-04-10 12:44:29 AFT|10400046|con67|cmd5|seg-1||dx341|x10400046|sx1|LOG: |00000|statement: create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)||||||create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)
|0||postgres.c|1447|

Each of these is a single record containing several \n characters. I need to parse them with awk and, for example, get the 5th field.

score 3 · Accepted Answer

Inspired by sudo_O's answer: set the variable FIELD_TO_PRINT to the position of the field you want, and another variable FIELDS_PER_RECORD to the number of fields that make up one record. Tested with GNU awk on Ubuntu.

awk   -v FIELDS_PER_RECORD=10 -v FIELD_TO_PRINT=5 'BEGIN{FS="|"; RS="\0"}\
{for (i=1; i<=NF; ++i) {if (i % FIELDS_PER_RECORD == FIELD_TO_PRINT) {print $i} }}' file_name.txt
th2056569632
x10354453
SET DATESTYLE = "ISO"; Select * from bb where cc='1'
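For comparison, the same modulo idea can be sketched in Python. This is only an illustration under stated assumptions: the function name fields_at_position and the short sample string are hypothetical and not part of the answer. Note that when every record ends with a trailing | followed by a newline, that newline is merged into the first field of the next record, so the effective number of fields per record equals the number of separators in a record (the question's sample appears to have 29 per record).

```python
def fields_at_position(text, fields_per_record, field_to_print, separator="|"):
    """Pick one field per record from input that is read as a single blob.

    Mirrors the awk loop: fields are numbered from 1 across the whole input,
    and a field is selected when its index modulo fields_per_record equals
    field_to_print. As in the awk version, field_to_print must be smaller
    than fields_per_record, because the remainder never equals the divisor.
    """
    fields = text.split(separator)
    return [
        field
        for index, field in enumerate(fields, start=1)
        if index % fields_per_record == field_to_print
    ]


# Hypothetical sample: two records with 4 separators each; the third field
# of the first record contains an embedded newline.
sample = "a1|b1|c1\nmore|d1|\na2|b2|c2|d2|\n"
print(fields_at_position(sample, fields_per_record=4, field_to_print=2))
```

Because the input is never split on newlines, embedded line breaks inside a field do not disturb the field count.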

score 1 · Accepted Answer

For a single record in the file, you can simply set the record separator to the null character, RS='\0', so that the whole input file is read as one record.

$ awk '{print $5}' FS='|' RS='\0' file
th2056569632

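For multiple records, you could use the date as the record separator (unless the records are separated by a blank line, which would make things easier, or you need this field in the output):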

$ awk 'NR>1{print $5}' FS='|' RS='(^|[^|])[0-9]{4}-[0-9]{2}-[0-9]{2} ' file
th2056569632
th1093698336
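The record-separator trick can also be sketched in Python with re.split. This is a rough equivalent rather than a translation of the awk command: the names RECORD_START and fifth_fields and the shortened sample are hypothetical, and a negative lookbehind stands in for the (^|[^|]) part of the awk pattern. It assumes, as the awk version does, that a date not preceded by | only ever appears at the start of a record.

```python
import re

# A date followed by a space, but only when it is NOT preceded by "|".
# Dates inside a record (such as the session start time) follow a "|",
# so they are left alone.
RECORD_START = re.compile(r"(?<!\|)\d{4}-\d{2}-\d{2} ")


def fifth_fields(text, separator="|"):
    """Split the text at record-starting dates and return field 5 of each record."""
    chunks = RECORD_START.split(text)
    # chunks[0] is whatever precedes the first date (normally empty),
    # which is why the awk command uses NR>1.
    return [chunk.split(separator)[4] for chunk in chunks[1:]]


# Hypothetical, shortened sample with an embedded newline in one field.
sample = (
    "2013-03-24 15:49:40 EST|aaa|tsi|p1753|th2056569632|x|2013-03-24 15:49:32 AFT|multi\nline|\n"
    "2013-04-10 12:45:48 EST|aa|tsi|p22814|th1093698336|y|2013-04-10 12:44:29 AFT|z|\n"
)
print(fifth_fields(sample))
```

As with the awk version, the date is consumed by the split, so the first field of each record loses its date part.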

Would something simpler, like grep -o 'th[0-9]*' file, be suitable here?

score 1 · Accepted Answer

Obviously not what you asked for, but for comparison, here is how you could do it in Python (the code below uses Python 2 syntax):

from cStringIO import StringIO

def records_from_file(f,separator='|',field_count=30):
  record = []
  for line in f:
    fields = line.split(separator)
    if len(record) > 0:
      # Merge last of existing with first of new
      record[-1] += fields[0]
      # Extend rest of fields
      record.extend(fields[1:])
    else:
      record.extend(fields)
    if len(record) > field_count:
      raise Exception("Concatenating records overflowed number of fields",record)
    elif len(record) == field_count:
      yield record
      record = []

sample = """2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|172.30.10.212|56809|2013-03-24 15:49:32 AFT|10354453|con2326|cmd7|seg-1||dx318412|x10354453|sx1|LOG: |00000|statement: SET DATESTYLE = "ISO"; Select * 
from bb 
where cc='1'||||||SET DATESTYLE = "ISO"; Select * from bb where cc='1'|0||postgres.c|1447|
2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|172.30.0.186|3304|2013-04-10 12:44:29 AFT|10400046|con67|cmd5|seg-1||dx341|x10400046|sx1|LOG: |00000|statement: create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)||||||create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)
|0||postgres.c|1447|"""

for record in records_from_file(StringIO(sample)):
  print record[4]

Yields:

th2056569632
th1093698336
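The listing above relies on Python 2 features (cStringIO and the print statement). A minimal Python 3 sketch of the same accumulate-until-full idea might look like the following. The 4-field sample is hypothetical and stands in for the 30-field log lines, and ValueError is used here in place of the generic Exception.

```python
from io import StringIO


def records_from_file(f, separator="|", field_count=30):
    """Yield lists of field_count fields, joining lines until a record is full."""
    record = []
    for line in f:
        fields = line.split(separator)
        if record:
            # The line continues the last field of the unfinished record.
            record[-1] += fields[0]
            record.extend(fields[1:])
        else:
            record.extend(fields)
        if len(record) > field_count:
            raise ValueError(
                "Concatenating lines overflowed the field count: %r" % (record,)
            )
        if len(record) == field_count:
            yield record
            record = []


# Hypothetical sample: 3 separators per record, so 4 fields once the
# trailing newline is counted as the last field.
sample = "a1|b1\ncontinued|c1|\na2|b2|c2|\n"
for record in records_from_file(StringIO(sample), field_count=4):
    print(repr(record[1]))
```

The last field of each yielded record is the newline that follows the trailing separator, which is why field_count is one more than the number of separators.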
