bash - Test for pattern A and B, and return the lines of text between and including pattern A and B

Question

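Hello Stack Overflow!

I have a very large flat file from which I am trying to export all records that match two different patterns. The problem is that each record has a different number of lines, and the records bleed into one another. The last line of a record is Door IDs: and the first line is User:.

I am testing for the @ in the email address and for a last login that contains "login time: 2013-08". I need to export all of the lines, including the email address line and the last login line. Two sample records are below. I tried using awk like this: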

awk '/login time: 2013-08/{e=0}/@ /{gsub("^.*@ ","",$0);e=1}{if(e==1){print}}'  filename

And of course it failed...

Here is the sample data:

User: afshin@runners.org
First Name: Afshi
Last Name: Noghami
Is Delegated Admin: False
IP Whitelisted: False
Account Suspended: False
Must Change Password: False
Unique ID: 102209840259208897543
ID TYPE: Cx4
Creation Time: 2013-06-07T04:14:42.000Z
Last login time: Never
Path: /Members/Inactive

IMs:
Addresses:
Organizations:
Phones:
Relations:
Door IDs:
User: jjnalli@runners.org
First Name: JISS
Last Name: NALLIKUZHY
Is a Super Admin: False
Is Delegated Admin: False
Has Agreed to Terms: True
IP Whitelisted: False
Account Suspended: False
Must Change Password: False
Unique ID: 109765147242431344122
ID TYPE: Cx4
Mailbox setup: True
Included: False
Creation Time: 2013-06-07T03:32:52.000Z
Last login time: 2013-08-02T07:13:02.000Z
Path: /Members/Inactive

IMs:
Addresses:
Organizations:
Phones:
Relations:
Door IDs:

The desired output, for each record that has a last login date, would look like this:

User: jjnalli@runners.org  
First Name: JISS  
Last Name: NALLIKUZHY  
Is a Super Admin: False  
Is Delegated Admin: False  
Has Agreed to Terms: True  
IP Whitelisted: False  
Account Suspended: False  
Must Change Password: False  
Unique ID: 109765147242431344122  
ID TYPE: Cx4  
Mailbox setup: True  
Included: False  
Creation Time: 2013-06-07T03:32:52.000Z  
Last login time: 2013-08-02T07:13:02.000Z

score 1 · Accepted Answer

Group the lines from `^User` to `Door ID`, and print the group only if it matches `@.*login time: 20[0-9]` ...

I think I finally understand what you need.

Try this:

sed -ne '/^Door ID/!H;/^User:/h;/^Door ID/{x;G;/@.*login time: 20[0-9]/p}' file

This matches your request.
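To make the one-liner easier to follow, here is the same script spread over several lines with a comment on each command. It is only a sketch: the `records` variable holds abbreviated, hypothetical copies of the two sample records from the question, and it assumes a sed in which `.` also matches the newlines inside the pattern space (GNU sed does).

```shell
# Abbreviated copies of the two sample records (hypothetical test input).
records=$(cat <<'EOF'
User: afshin@runners.org
First Name: Afshi
Last login time: Never
Path: /Members/Inactive
Door IDs:
User: jjnalli@runners.org
First Name: JISS
Last login time: 2013-08-02T07:13:02.000Z
Path: /Members/Inactive
Door IDs:
EOF
)

matched=$(printf '%s\n' "$records" | sed -n '
  # Any line other than the closing "Door ID" line: append it to the hold space.
  /^Door ID/!H
  # A "User:" line starts a new record: overwrite the hold space with it.
  /^User:/h
  # The closing line: fetch the collected record and test it as one block.
  /^Door ID/{
    # Swap: the record moves to the pattern space, the closing line to the hold space.
    x
    # Append the closing line to the end of the record.
    G
    # Print only if an "@" is followed, possibly several lines later, by a dated login.
    /@.*login time: 20[0-9]/p
  }
')

printf '%s\n' "$matched"
```

With this input, only the jjnalli record is printed, because the afshin record has `Last login time: Never`.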

Once each packet is merged, you can also drop every entry that matches 2013-08:

sed -ne '/^Door ID/!H;/^User:/h;/^Door ID/{x;G;/@.*login time: 20[0-9]/{/login time: 2013-08/!p}}' file
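The question asks for the opposite filter, keeping only the records whose last login falls in 2013-08. A possible variant, assuming the same grouping and a sed in which `.` matches newlines in the pattern space, tests the merged record for that month directly. The `records` input below is hypothetical, and the third user is invented purely to show a dated login from another month being skipped.

```shell
# Three abbreviated records: never logged in, 2013-08, and 2013-07 (hypothetical input).
records=$(cat <<'EOF'
User: afshin@runners.org
Last login time: Never
Door IDs:
User: jjnalli@runners.org
Last login time: 2013-08-02T07:13:02.000Z
Door IDs:
User: example@runners.org
Last login time: 2013-07-15T10:00:00.000Z
Door IDs:
EOF
)

# Same grouping as above, but print a record only when its login is in 2013-08.
kept=$(printf '%s\n' "$records" |
  sed -ne '/^Door ID/!H;/^User:/h;/^Door ID/{x;G;/@.*login time: 2013-08/p}')

printf '%s\n' "$kept"
```

Only the jjnalli record survives here; the other two are skipped because their login lines do not contain `2013-08`.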

score 1 · Accepted Answer

First, read each record into an array of fields.

BEGIN { FS = ": " }   # each line has fieldname and value

/^$/ { next }         # skip blank lines

$1 == "User" {        # first field of new record
    delete fields     # delete current array
    fields[$1] = $2 } # store field value in array

/^Door IDs:/ {        # last field of current record; it may have no ": "
                      # separator, so match the line rather than $1
    fields["Door IDs"] = $2   # store field value in array
    do_process() }    # process current record

$1 != "User" &&       # fields between first ...
!/^Door IDs:/ {       #             ... and last
    fields[$1] = $2 } # store field value in array

Then do whatever you need with the record. Here I print the User and Last login time fields, but you can do any processing you like.

function do_process() {
    print fields["User"], fields["Last login time"] }

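Note that I have not tested this code...

Edit: Changed based on the comments below. I assume the User field always marks the start of a new record. Here is a revised version of the code that reads and stores the records.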

BEGIN { FS = ": "       # each line has fieldname and value
        first = 1 }     # flag: no record has been read yet

/^$/ { next }           # skip blank lines

$1 == "User" {          # first field of new record
    if (!first)         # no data the first time; skip
        do_process()    # process previous record
    first = 0           # a record is now in progress
    delete fields       # reset fields for new record
    fields[$1] = $2 }   # store field value in array

# Door IDs needs no special case: the next User line (or END) triggers
# processing, so calling do_process() here would handle each record twice.

/./ { fields[$1] = $2 } # store field value in array

END { if (!first)       # last record not processed
        do_process() }  # process last record

You can then process the data however you like.
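As one illustration of such processing, the sketch below prints each record from its User line through its Last login time line when the login falls in 2013-08, which is what the desired output in the question appears to show. The `lines` array and the counter `n` are hypothetical additions that keep the raw lines in input order, and the `records` variable is abbreviated sample input. Treat it as a starting point rather than a tested solution for the full file.

```shell
# Abbreviated copies of the two sample records (hypothetical test input).
records=$(cat <<'EOF'
User: afshin@runners.org
First Name: Afshi
Last login time: Never
Path: /Members/Inactive

IMs:
Door IDs:
User: jjnalli@runners.org
First Name: JISS
Last login time: 2013-08-02T07:13:02.000Z
Path: /Members/Inactive

IMs:
Door IDs:
EOF
)

result=$(printf '%s\n' "$records" | awk '
BEGIN { FS = ": " }              # each line has fieldname and value

/^$/ { next }                    # skip blank lines

$1 == "User" {                   # a User line starts a new record
    if (n > 0) do_process()      # flush the previous record, if any
    delete fields                # reset the per-record fields
    n = 0                        # reset the per-record line counter
}

{ fields[$1] = $2                # look up values by field name
  lines[++n] = $0 }              # hypothetical helper: raw lines in input order

END { if (n > 0) do_process() }  # flush the final record

# Print User through Last login time, but only for logins in 2013-08.
function do_process(    i) {
    if (fields["Last login time"] !~ /^2013-08/) return
    for (i = 1; i <= n; i++) {
        print lines[i]
        if (lines[i] ~ /^Last login time: /) break
    }
}
')

printf '%s\n' "$result"
```

Keeping the raw lines alongside the `fields` array means the record can be printed in its original order, since awk does not guarantee any order when looping over an associative array.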
