perl - Perl file parser for dynamic file

Question

I'm new to Perl and could really use some help writing a file parser. The file is structured like this (X is a number that changes from file to file and gives the number of following lines that contain a column heading):

X,1,0,0,2,0,0,2,0,1,2,0,2,2,0,3,2,0,4,2,1,0,2,2,0,2,3,0,2,4,0,2,4,1,2,4,2,2,4,3,2,5,0,2,5,1,2,5,2,2,5,3,3,1,0,3
# Col_heading1
# Col_heading2
# Col_heading3 //Continues X rows
# Col_headingX 
# 2013 138 22:42:21 - Random text
# 2013 138 22:42:22 : Random text
# 2013 138 22:42:23 : Random text
2013 138 22:42:26, 10, 10, 10, 20, //continues X values
2013 138 22:42:27, 10, 10, 10, 20, 
2013 138 22:42:28, 10, 10, 10, 20, 
# 2013 138 22:42:31 - Random text
# 2013 138 22:42:32 : Random text
# 2013 138 22:42:33 - Event $eventname starting ($eventid) //$eventname and $eventid change for each file
2013 138 22:42:35, 10, 10, 10, 20, 
2013 138 22:42:36, 10, 10, 10, 20, 
2013 138 22:42:37, 10, 10, 10, 20, 
2013 138 22:42:38, 10, 10, 10, 20, 
2013 138 22:42:39, 10, 10, 10, 20, 
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20, 
2013 138 22:42:42, 10, 10, 10, 20, 
# 2013 138 22:42:45 - Event $eventname ended ($eventid) //$eventname and $eventid change for each file
2013 138 22:42:46, 10, 10, 10, 20, 
2013 138 22:42:47, 10, 10, 10, 20, 
# 2013 138 22:42:48 : Random text

The parser needs to transpose the Col_headings into tab-separated values on one line, and list all lines between # 2013 138 22:42:33 - Event $eventname starting ($eventid) and # 2013 138 22:42:45 - Event $eventname ended ($eventid) that do not start with a #. The values must also be converted from comma-separated to tab-separated.
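One way to express the "lines between two markers" rule in Perl is the range operator `..` used in scalar context (the "flip-flop"), which stays true from the line matching its left pattern through the line matching its right pattern. A minimal sketch, assuming the markers always contain "Event ... starting (" and "Event ... ended (" as in the sample; the input is a shortened, made-up inline sample (event name "demo", ID 42), not a real file:

```perl
use strict;
use warnings;

# Made-up, shortened sample in the same format as the file above
my $text = <<'END';
2013 138 22:42:28, 10, 10, 10, 20,
# 2013 138 22:42:33 - Event demo starting (42)
2013 138 22:42:35, 10, 10, 10, 20,
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20,
# 2013 138 22:42:45 - Event demo ended (42)
2013 138 22:42:46, 10, 10, 10, 20,
END

my @rows;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    # Flip-flop: true from the "starting" line through the "ended" line, inclusive
    next unless $line =~ /Event .* starting \(/ .. $line =~ /Event .* ended \(/;
    next if $line =~ /^#/;    # drops both marker lines and any other comments
    chomp $line;
    $line =~ s/,\s*$//;       # remove the trailing comma
    $line =~ s/,\s*/\t/g;     # remaining commas become tabs
    push @rows, $line;
}
close $fh;

print "$_\n" for @rows;
```

Because the marker lines themselves start with #, the comment filter removes them, so only the two data rows inside the event survive.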

The output file should then look like:

Filename:/home/..../filename    What:$eventname Where:SYSTEM    ID:$eventid
Time                Col_heading1    Col_heading2    Col_heading3    Col_headingX
2013 138 22:42:35   10              10              10              20
2013 138 22:42:36   10              10              10              20
2013 138 22:42:37   10              10              10              20
2013 138 22:42:38   10              10              10              20
2013 138 22:42:39   10              10              10              20 
2013 138 22:42:41   10              10              10              20 
2013 138 22:42:42   10              10              10              20

Any help with this would be very much appreciated!

score 1 · Accepted Answer

Once you've opened the file, you can get the number from the first line like this:

my ($heading_count) = split /,/, <$fh>;
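To see that line in isolation, here is a small self-contained sketch that reads from an in-memory string instead of a real file (the sample first line and the count 4 are made up). `split` returns every comma-separated field, and the list assignment keeps only the first:

```perl
use strict;
use warnings;

# Made-up input: the first field of the first line is the heading count
my $text = "4,1,0,0,2,0,0,2\n# Col_heading1\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# <$fh> is in scalar context here, so only the first line is read
my ($heading_count) = split /,/, <$fh>;
close $fh;

print "Heading count: $heading_count\n";    # prints "Heading count: 4"
```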

Then loop to read the headings:

my @headings = qw(Time);
for (1..$heading_count) {
    chomp(my $heading = <$fh>); # Chomp to remove the newline
    # Process it somehow, e.g. remove leading # + whitespace
    $heading =~ s/^#\s+//;
    push @headings, $heading;
}
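Putting the first two steps together, a rough sketch of the "transpose" part of the question: the vertical list of headings becomes a single tab-separated line. It assumes an inline, made-up sample with X = 3, and additionally strips trailing whitespace (the sample in the question has some after Col_headingX) and dies if the file ends early; both are additions, not part of the answer above.

```perl
use strict;
use warnings;

# Made-up sample with X = 3 heading lines after the first line
my $text = <<'END';
3,1,0,0,2,0,0
# Col_heading1
# Col_heading2
# Col_heading3
# 2013 138 22:42:21 - Random text
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my ($heading_count) = split /,/, <$fh>;

my @headings = qw(Time);
for (1 .. $heading_count) {
    defined(my $heading = <$fh>)
        or die "File ended before all $heading_count headings were read\n";
    chomp $heading;
    $heading =~ s/^#\s+//;    # strip the leading "# "
    $heading =~ s/\s+$//;     # strip trailing whitespace
    push @headings, $heading;
}
close $fh;

# Transpose: one heading per line becomes one tab-separated line
my $header_line = join("\t", @headings);
print "$header_line\n";
```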

Once that's done, loop over the rest of the file, parsing and printing the lines between the start and end patterns. Here's a fairly simple example to get you started:

print join("\t", @headings), "\n"; # print out the headings (newline outside the join, so no trailing tab)
my $in_event = 0; # State variable to track if we're in an event
while (<$fh>) { # keep reading from the same handle the headings came from
    if (/Event (.*) starting \((.*)\)/) { # Watch for the event starting; event name is now in $1, event ID in $2
        $in_event = 1;
        next;
    }
    next unless $in_event;    # Skip if not in an event yet
    last if /Event .* ended/; # Stop reading once the event ends
    next if /^#/;             # Skip comments

    s/,\s?/\t/g; # Replace each comma (and one optional space after it) with a tab
    print;       # Print the row
}
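The loop above captures the event name and ID in $1 and $2 but doesn't use them yet. A fuller sketch that also emits the "Filename: What: Where: ID:" line from the desired output: since the event name is only known once the start marker is seen, that line and the headings are emitted at that point. `parse_event_file` is a hypothetical helper name, "Where:SYSTEM" is assumed to be a fixed literal (the question doesn't say where it comes from), and the filename and input are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the output lines (without newlines)
# for the first event found in a file in the question's format.
sub parse_event_file {
    my ($filename, $fh) = @_;

    my ($heading_count) = split /,/, scalar <$fh>;
    my @headings = ('Time');
    for (1 .. $heading_count) {
        defined(my $heading = <$fh>) or die "Missing heading line\n";
        chomp $heading;
        $heading =~ s/^#\s+//;
        $heading =~ s/\s+$//;
        push @headings, $heading;
    }

    my @out;
    my $in_event = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /Event (.*) starting \((.*)\)/) {
            # Assumption: "Where" is always the literal SYSTEM
            push @out, "Filename:$filename\tWhat:$1\tWhere:SYSTEM\tID:$2";
            push @out, join("\t", @headings);
            $in_event = 1;
            next;
        }
        next unless $in_event;
        last if $line =~ /Event .* ended/;
        next if $line =~ /^#/;

        $line =~ s/,\s*$//;      # drop the trailing comma
        $line =~ s/,\s*/\t/g;    # comma-separated -> tab-separated
        push @out, $line;
    }
    return @out;
}

# Made-up sample with X = 2
my $text = <<'END';
2,1,0,0
# Col_heading1
# Col_heading2
# 2013 138 22:42:21 - Random text
2013 138 22:42:26, 10, 20,
# 2013 138 22:42:33 - Event demo starting (42)
2013 138 22:42:35, 10, 20,
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 20,
# 2013 138 22:42:45 - Event demo ended (42)
2013 138 22:42:46, 10, 20,
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @lines = parse_event_file('/tmp/example.txt', $fh);
close $fh;
print "$_\n" for @lines;
```

For a real file you would open it with `open my $fh, '<', $path` and pass $path as the filename.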

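Using this approach you'll see that the column headings don't line up properly with the data because of their varying lengths, so you'll need to tweak it to get exactly what you want. You may also want to look into Text::CSV for parsing the rows (or just use split), and at something like Text::Table for building a properly aligned table.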
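If you'd rather not pull in a CPAN module such as Text::Table, plain `split` and `sprintf` can align the columns for display. A rough sketch, assuming the rows are already tab-separated (the rows below are made up); the two-space gap and left justification are arbitrary choices:

```perl
use strict;
use warnings;

# Made-up rows: the header plus two data rows, already tab-separated
my @lines = (
    "Time\tCol_heading1\tCol_heading2",
    "2013 138 22:42:35\t10\t20",
    "2013 138 22:42:36\t10\t20",
);

# Split each line into fields; the -1 limit keeps trailing empty fields
my @rows = map { [ split /\t/, $_, -1 ] } @lines;

# The widest entry in each column decides that column's width
my @widths;
for my $row (@rows) {
    for my $i (0 .. $#$row) {
        my $len = length $row->[$i];
        $widths[$i] = $len if !defined $widths[$i] || $len > $widths[$i];
    }
}

my @aligned;
for my $row (@rows) {
    # Left-justify each field to its column width, two spaces between columns
    my $formatted = join '  ',
        map { sprintf('%-*s', $widths[$_], $row->[$_]) } 0 .. $#$row;
    $formatted =~ s/\s+$//;    # no padding after the last column
    push @aligned, $formatted;
}

print "$_\n" for @aligned;
```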
