perl - How can I parse a multi-line fixed-width file in Perl?

Question

I have a file that I need to parse, in the following format (all delimiters are spaces):

field name 1:            Multiple word value.
field name 2:            Multiple word value along
                         with multiple lines.
field name 3:            Another multiple word
                         and multiple line value.

I am familiar with parsing single-line fixed-width files, but I am stumped by how to handle values that span multiple lines.

score 8 · Accepted Answer

#!/usr/bin/env perl

use strict; use warnings;

my (%fields, $current_field);

while (my $line = <DATA>) {
    next unless $line =~ /\S/;

    if ($line =~ /^ \s+ ( \S .+ )/x) {
        if (defined $current_field) {
            $fields{ $current_field } .= " $1";    # join continuation text with a space
        }
    }
    elsif ($line =~ /^(.+?) : \s+ (.+) \s+/x ) {
        $current_field = $1;
        $fields{ $current_field } = $2;
    }
}

use Data::Dumper;
print Dumper \%fields;

__DATA__
field name 1:            Multiple word value.
field name 2:            Multiple word value along
                         with multiple lines.
field name 3:            Another multiple word
                         and multiple line value.

score 4 · Accepted Answer

Fixed width says unpack to me. It is possible to parse this with regexes and split, but unpack is the right tool for fixed-width data, so it should be the safer choice.

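I set the width of the first field to 12 and the empty space in between to 13, which works for this data. You may need to change those numbers. The template "A12A13A*" means "take 12 characters, then 13 characters, then all the remaining characters, treating each as space-padded ASCII text". unpack returns a list of these pieces. Also, unpack operates on $_ if it is not given a string, which is what we do here.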
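To see what the template does to individual lines, here is a minimal sketch. The two lines are rebuilt from the sample data in the question, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# A labelled line: a 13-character label, 12 spaces of padding, then the value.
my $line = "field name 1:" . (" " x 12) . "Multiple word value.";

# A12 takes the first 12 characters, A13 the next 13, A* everything that is left.
# On unpack, the "A" format strips trailing whitespace from each piece.
my ($field, $gap, $text) = unpack "A12A13A*", $line;

print "field: '$field'\n";   # the label; the colon is character 13, so it is not included
print "gap:   '$gap'\n";     # the colon, with the padding after it stripped
print "text:  '$text'\n";

# A continuation line: 25 spaces, then more of the value.
my $cont = (" " x 25) . "with multiple lines.";
my ($cont_field, undef, $cont_text) = unpack "A12A13A*", $cont;

print "continuation field is empty\n" if $cont_field eq "";
print "continuation text: '$cont_text'\n";
```

An empty first field is what marks a line as a continuation of the previous value.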

Note that if the first field is not a fixed width up to the colon, as it is in your sample data, you will need to merge the fields in the template (for example "A25A*") and then strip the colon.
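A rough sketch of that merged-template variant follows. The labels and values are made up for illustration, and it assumes the value always starts at column 25.

```perl
use strict;
use warnings;

# Hypothetical labels of different lengths; sprintf pads each label to 25 columns.
my @lines = (
    sprintf("%-25s%s", "name:",           "Alice Example"),
    sprintf("%-25s%s", "postal address:", "1 Example Street"),
);

my @pairs;
for my $line (@lines) {
    # A25 covers the whole label column (label, colon and padding); A* is the value.
    my ($field, $text) = unpack "A25A*", $line;
    $field =~ s/:$//;    # the colon is now part of the field, so strip it
    push @pairs, [ $field, $text ];
}

print "$_->[0] => $_->[1]\n" for @pairs;
```

With this template the label can be any length up to 24 characters plus the colon.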

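Since I do not know whether your field names are unique, I chose an array as the storage structure. A hash would overwrite fields that share a name. Another benefit of an array is that it preserves the order in which the data appears in the file. If neither of those matters and quick lookup is the priority, use a hash instead.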
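If you do go with a hash, the loop might look roughly like this. It is only a sketch: the lines are built in memory from the sample data so that it runs on its own, and %fields and $current are illustrative names.

```perl
use strict;
use warnings;

# Sample lines from the question, built in memory to keep the sketch self-contained.
my @lines = (
    "field name 1:" . (" " x 12) . "Multiple word value.",
    "field name 2:" . (" " x 12) . "Multiple word value along",
    (" " x 25) . "with multiple lines.",
);

my (%fields, $current);
for my $line (@lines) {
    my ($field, undef, $text) = unpack "A12A13A*", $line;
    if (length $field) {
        $current = $field;
        $fields{$current} = $text;        # a repeated field name would be overwritten here
    } elsif (defined $current) {
        $fields{$current} .= " $text";    # continuation line: append to the current field
    }
}

# Hash keys have no inherent order, so sort them for display.
print "$_ => $fields{$_}\n" for sort keys %fields;
```

The trade-off is the one described above: lookup by name is direct, but file order and duplicate names are lost.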

Code:

use strict;
use warnings;
use Data::Dumper;

my $last_text;
my @array;
while (<DATA>) {
    # unpack the fields and strip trailing spaces
    my ($field, undef, $text) = unpack "A12A13A*";
    if ($field) {   # if $field is empty, this line continues a multi-line value
        $field =~ s/:$//;                 # strip a trailing colon, if present
        $last_text = [ $field, $text ];   # store data in an anonymous array
        push @array, $last_text;          # and store that array in @array
    } else {        # multi-line values get added to the previous line's data
        $last_text->[1] .= " $text";
    }
}

print Dumper \@array;

__DATA__
field name 1:            Multiple word value.
field name 2:            Multiple word value along
                         with multiple lines.
field name 3:            Another multiple word
                         and multiple line value
                         with a third line

Output:

$VAR1 = [
          [
            'field name 1',
            'Multiple word value.'
          ],
          [
            'field name 2',
            'Multiple word value along with multiple lines.'
          ],
          [
            'field name 3',
            'Another multiple word and multiple line value with a third line'
          ]
        ];

score 2 · Accepted Answer

You can do this:

#!/usr/bin/perl

use strict;
use warnings;

my @fields;
open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n";

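while (<$fh>) {                 # read one line at a time instead of loading the whole file
    if (/^\s/ && @fields) {     # continuation line: append to the last element
        $fields[-1] .= $_;
    } else {
        push @fields, $_;
    }
}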

close $fh;

If a line starts with whitespace, it is appended to the last element of @fields. Otherwise it is pushed onto the end of the array.
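Each element of @fields still holds raw text, with the label, newlines and indentation included. One possible way to turn the elements into name/value pairs is sketched below. The @fields contents are written out by hand to match the sample data, and @records is an illustrative name.

```perl
use strict;
use warnings;

# What @fields might hold after the loop: raw text, newlines and indentation included.
my @fields = (
    "field name 1:" . (" " x 12) . "Multiple word value.\n",
    "field name 2:" . (" " x 12) . "Multiple word value along\n"
        . (" " x 25) . "with multiple lines.\n",
);

my @records;
for my $raw (@fields) {
    # Split on the first colon only; the limit of 2 keeps any later colons in the value.
    my ($name, $value) = split /:\s*/, $raw, 2;
    $value =~ s/\s+/ /g;     # collapse line breaks and indentation into single spaces
    $value =~ s/\s+\z//;     # drop the trailing space left by the final newline
    push @records, [ $name, $value ];
}

print "$_->[0] => $_->[1]\n" for @records;
```

This assumes every element contains a colon; a line without one would need separate handling.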

Or slurp the whole file and split it using lookarounds:

#!/usr/bin/perl

use strict;
use warnings;

local $/;    # undefine the input record separator so <$fh> returns the whole file

open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n";

my @fields = split/(?<=\n)(?!\s)/, <$fh>;

close $fh;

This is not a recommended approach, however.
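To make the lookaround split concrete, here is a small sketch that applies the same pattern to an in-memory string instead of a slurped file. The string is rebuilt from the sample data, and $content is an illustrative name.

```perl
use strict;
use warnings;

# The same text a slurped file would contain, built in memory for the sketch.
my $content = join "",
    "field name 1:" . (" " x 12) . "Multiple word value.\n",
    "field name 2:" . (" " x 12) . "Multiple word value along\n",
    (" " x 25) . "with multiple lines.\n";

# (?<=\n) : the split point must directly follow a newline
# (?!\s)  : and must not be followed by whitespace, so indented
#           continuation lines stay attached to the record above them.
my @fields = split /(?<=\n)(?!\s)/, $content;

printf "%d records\n", scalar @fields;
print "---\n$_" for @fields;
```

Both assertions are zero-width, so nothing is consumed by the split and each record keeps its own newlines.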

score 0 · Accepted Answer

You can change the input record separator like this:

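use strict;
use warnings;

# Assumes the data is in a file named multi.txt.
open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n";

$/ = "\nfield name";    # each read now returns one whole field, continuation lines included

while (my $record = <$fh>) {
    chomp $record;      # chomp removes the trailing "\nfield name" separator
    if ($record =~ /(\d+):\s+(.+?)\s*\z/s) {    # /s lets . match embedded newlines
        print "Record $1 is $2\n";
    }
}
close $fh;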
