perl - Perl problem with extracting, manipulating, and merging related data from different lines

Question

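I have a very specific problem that I am struggling to solve. It involves parsing and merging related data from different lines.

I have a file containing text in the format shown below.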

======================================================
8:27:24 PM  http://10.11.12.13:80
======================================================
GET /dog-pictures HTTP/1.1
Host: 10.11.12.13
Language: english
Agent: Unknown
Connection: closed

======================================================



======================================================
8:28:56 PM  http://192.114.126.245:80
======================================================
GET /flowers HTTP/1.1
Host: 10.11.12.13
Language: english

======================================================



======================================================
8:29:07 PM  http://10.11.12.13:80
======================================================
GET /africas-animals HTTP/1.1
Host: 10.11.12.13
Language: english
Agent: Unknown
Connection: open

======================================================

As shown above, each group of data in the text file is framed by three lines of equals signs (=======), but the number of data lines inside a group can vary.

The output format I need is as follows:

    http://10.11.12.13/dog-pictures
    http://192.114.126.245/flowers
    http://10.11.12.13/africas-animals

Explanation of the bits that need to be merged:

======================================================
8:27:24 PM  http://10.11.12.13:80                     <--- Gets the first part from here
======================================================
GET /dog-pictures HTTP/1.1                            <--- Gets the second part from here
Host: 10.11.12.13
Language: english
Agent: Unknown
Connection: closed

======================================================

Thank you for any help with this problem.

score 1 · Accepted Answer

Perhaps the following will assist you:

use strict;
use warnings;

open my $fh, '<', 'data.txt' or die $!;

# Read a file line
while (<$fh>) {

    # If a URL is captured on a line beginning with a time, read past the separator line
    if ( my ($url) = /^\d+:\d+:\d+.+?(\S+):\d+$/ and <$fh> ) {

        # Capture path
        my ($path) = <$fh> =~ /\s+(\/\S+)\s+/;

        print "$url$path\n" if $url and $path;
    }
}

Output:

http://10.11.12.13/dog-pictures
http://192.114.126.245/flowers
http://10.11.12.13/africas-animals

Only two lines in each record contain the information you want, and they are separated by a line of equals signs. The first regex matches a time string and captures the URL on that line, without its trailing port. The "and <$fh>" part reads one more line to get past the separator. The second regex captures the path on the line after that. Finally, the URL and path are printed together.
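To see the two captures in isolation, here is a small sketch that applies the same two regexes to one record from the question. It reads from an in-memory string instead of data.txt; the heredoc, the lexical $line variable, and the @urls array are illustrative additions, not part of the answer above.

```perl
use strict;
use warnings;

# One record copied from the question, held in memory so no data.txt is needed.
my $data = <<'END';
======================================================
8:28:56 PM  http://192.114.126.245:80
======================================================
GET /flowers HTTP/1.1
Host: 10.11.12.13
Language: english

======================================================
END

open my $fh, '<', \$data or die $!;

my @urls;
while ( my $line = <$fh> ) {

    # Step 1: match the time string and capture the URL without its :port
    my ($url) = $line =~ /^\d+:\d+:\d+.+?(\S+):\d+$/ or next;

    # Step 2: read and discard the separator line
    my $separator = <$fh>;

    # Step 3: the request line comes next; capture the path between the spaces
    my $request = <$fh>;
    last unless defined $request;
    my ($path) = $request =~ /\s+(\/\S+)\s+/ or next;

    push @urls, "$url$path";
}

print "$_\n" for @urls;
```

For this record the script prints http://192.114.126.245/flowers.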

score 1 · Accepted Answer

Try this with Perl in a shell:

perl -lane '
    if (/^\d+:\d+:\d+\s+\w+\s+([^:]+):/) {
        $scheme = $1;
    }
    if (/^(GET|HEAD|POST|PUT|DELETE|OPTION|TRACE)/) {
        $path = $F[1];
    }
    if (/^Host/) {
        print "$scheme://$F[1]$path";
    }
' file.txt

Script version

Generated with perl -MO=Deparse and tweaked slightly...

#!/usr/bin/env perl
# mimic `-l` switch to print like "say"
BEGIN { $/ = "\n"; $\ = "\n"; }

use strict; use warnings;

my ($scheme, $path);

# magic diamond operator
while (<ARGV>) {
    chomp $_;
    # splitting current line in @F array
    my (@F) = split(' ', $_, 0);

    # regex to catch the scheme (http)
    if (/^\d+:\d+:\d+\s+\w+\s+([^:]+):/) {
        $scheme = $1;
    }
    # if the current line starts with an HTTP verb, store the second
    # column in $path
    if (/^(GET|HEAD|POST|PUT|DELETE|OPTION|TRACE)/) {
        $path = $F[1];
    }
    # if the current line starts with Host, print the merged line
    if (/^Host/) {
        print "${scheme}://$F[1]$path";
    }
}

Usage

chmod +x script.pl
./script.pl file.txt

Output

http://10.11.12.13/dog-pictures
http://10.11.12.13/flowers
http://10.11.12.13/africas-animals
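
This version builds the host from the Host: header, so the second record prints http://10.11.12.13/flowers rather than the http://192.114.126.245/flowers asked for in the question. A minimal sketch of a variant follows, assuming the host should come from the timestamp line instead. The $base variable, the @out array, and the in-memory sample are illustrative additions, not part of the answer above.

```perl
use strict;
use warnings;

# Two records from the question, kept in memory for the demo
# (assumption: real input would come from a file such as file.txt).
my $data = <<'END';
======================================================
8:27:24 PM  http://10.11.12.13:80
======================================================
GET /dog-pictures HTTP/1.1
Host: 10.11.12.13

======================================================
======================================================
8:28:56 PM  http://192.114.126.245:80
======================================================
GET /flowers HTTP/1.1
Host: 10.11.12.13

======================================================
END

my ( $base, @out );
for my $line ( split /\n/, $data ) {

    # Timestamp line: keep scheme and host, stop before the :port
    if ( $line =~ m{^\d+:\d+:\d+\s+\w+\s+(\w+://[^:/\s]+)} ) {
        $base = $1;
    }

    # Request line: the field after the HTTP verb is the path
    elsif ( defined $base
        and $line =~ m{^(?:GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE)\s+(\S+)} )
    {
        push @out, "$base$1";
    }
}

print "$_\n" for @out;
```

With these two records it prints http://10.11.12.13/dog-pictures and http://192.114.126.245/flowers, which matches the format requested in the question.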

score 0 · Accepted Answer

Perl:

perl -lane 'if(/http/){($x=$F[2])=~s/:\d+$//}if(/GET/){print $x.$F[1]}' your_file

If you would rather use awk:

awk '/http/{x=$3; sub(/:[0-9]+$/,"",x)}/GET/{print x $2}' your_file
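
Both one-liners follow the same idea: remember the URL field from the line containing http, then print it joined to the path field when a GET line appears. Spelled out as a script, a sketch of that idea might look like the following. The merge_urls name is hypothetical, and the sketch strips the trailing :port with a substitution so the result matches the format the question asks for. It handles only GET requests, like the one-liners.

```perl
use strict;
use warnings;

# merge_urls is a hypothetical helper: it takes the input lines
# and returns the merged URLs.
sub merge_urls {
    my @lines = @_;
    my ( $x, @urls );
    for (@lines) {
        my @F = split ' ';    # whitespace split, as the -a switch does

        # Timestamp line: the third field is the URL (match is case-sensitive,
        # so the "HTTP/1.1" on the request line does not trigger it)
        if (/http/) {
            ( $x = $F[2] ) =~ s/:\d+$//;    # drop the trailing :port
        }

        # Request line: the second field is the path
        if ( /GET/ and defined $x ) {
            push @urls, $x . $F[1];
        }
    }
    return @urls;
}

print "$_\n" for merge_urls(
    '8:29:07 PM  http://10.11.12.13:80',
    '======================================================',
    'GET /africas-animals HTTP/1.1',
    'Host: 10.11.12.13',
);
```

For the sample lines passed in here it prints http://10.11.12.13/africas-animals.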
