regex - How to find the position of a capture in Perl

Question

I have a space-delimited file like this:

 First        Second        Third       Forth
 It               is        possible    to   
 do             this                    task
 with          regex        but         i
 don't          know        how         to

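My task is to capture every word on each line and build a hash from them.

Here is my problem, though: a field can be empty in any column (for example, line 3, third field).

The words on each line are aligned with either the start or the end of the column name. (The column names are the words on the first line, e.g. First Second Third Forth.)

In my example, the words under First, Third and Forth are aligned to the left (the start of the column name), and the words under Second are aligned to the right (the end of the column name).

Using the hash for each line, I need to produce output of the following form: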

$hash{First} has Second-property $hash{Second}. It also has $hash{Third} and $hash{Forth}.

use File::Basename;
use locale;
open my $file, "<", $ARGV[0];
open my $file2,">>",fileparse($ARGV[0])."2.txt";
my @alls = <$file>;

sub Main{
    my $first = shift @alls;
    my $poses = First_And_Last($first);
    my $curr_poses;
    my $curr_hash;
    #do{OutputLine($_->[0],$_->[1],$first)}for (@$poses);
    my $result_array=[];
    my @keys = qw(# Variable Type Len Format Informat Label);
    for $word(@alls){
        $curr_poses=First_And_Last($word);
        undef ($curr_hash);
        $curr_hash = Take_Words($poses, $word, $curr_poses);
        push @{$result_array},$curr_hash; #AoH
    }

#end of main
}

sub First_And_Last{
    #First_And_Last($str)
    my $str = shift;    
    my $begin;
    my $end;
    my $ref=[];
    while ($str=~m/(([\S\.]\s?)+\b|#)/g){       
        $begin = pos($str) - length($1);
        $end = pos($str);       
        push @{$ref},[$begin,$end];
        }               
    return $ref;
    }

sub Take_Words{
    #Take_Words($poses, $line,$current) 
    my $outref = {};
    my $ref = shift; #take the ref of offsets of words
    my $line = shift;# and the next line in file
    my $current = shift; # and this is the poses of current line
    my @keys = qw(# Variable Type Len Format Informat Label);
    do{$outref->{$_}=undef;}for(@keys);
    my $ethalon; #for $ref
    my $relativity; #for $current
    my $key; #for key in $outref
    my @ethalon = @{$ref};

    $ethalon = shift @ethalon;
    $relativity = shift @{$current};
    $key = shift @keys;

    while (defined($key) && defined($relativity)){
        if ($ethalon->[0] == $relativity->[0] || $ethalon->[1] == $relativity->[1]){
            $outref->{$key} = substr($line, $relativity->[0],$relativity->[1] - $relativity->[0]);
            $relativity = shift @{$current};
        }
        $ethalon = shift @ethalon;
        $key = shift @keys;
    }


    return $outref;
    }

score 2 · Accepted Answer

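Here is my algorithm, which is somewhat C-ish:

1. Determine and store the starting position of each column heading.
2. For each column: go to the starting position of the heading.
3. Step left until you have passed two consecutive spaces.
4. Step two characters to the right and remember the position.
5. Step right until you have passed two consecutive spaces.
6. Step two characters to the left and remember the position.
7. Extract everything between the boundaries you found.
8. Remove leading and trailing whitespace.
9. Store the result in the hash.
10. Repeat from step 2.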
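
Steps 2 to 9 might be sketched as follows. This is a minimal sketch under stated assumptions, not the only way to do it: `extract_fields` and the `[name, start, end]` triples are names made up here, and the end offset of each heading is an addition that the step list does not mention. The list also does not say what to do when the character under a heading's start is blank, as with the right-aligned Second column, so the sketch assumes the search is seeded at the first non-blank character under the heading. Instead of stepping one character at a time, it reads the boundaries from the regex offset arrays `@-` and `@+`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of column name => field text
# (undef when the field is blank on this line).
# $cols is an array ref of [name, start, end] triples, end exclusive.
sub extract_fields {
    my ($line, $cols) = @_;
    my %row;
    for my $col (@$cols) {
        my ($name, $start, $end) = @$col;
        $row{$name} = undef;
        next if $start >= length $line;      # line ends before this column

        # Assumption: seed the search at the first non-blank character
        # under the heading, so right-aligned fields are found as well.
        my $under = substr $line, $start, $end - $start;
        next unless $under =~ /\S/;          # nothing under the heading: empty field
        my $seed = $start + $-[0];

        # Steps 3-4: the left boundary sits just after the last pair of
        # whitespace characters before the seed (or at the start of the line).
        my $left = substr($line, 0, $seed) =~ /.*\s\s/s ? $+[0] : 0;

        # Steps 5-6: the right boundary is the first pair of whitespace
        # characters after the seed (or the end of the line).
        my $right = substr($line, $seed) =~ /\s\s/ ? $seed + $-[0] : length $line;

        # Steps 7-9: extract, trim, store.
        my $field = substr $line, $left, $right - $left;
        $field =~ s/^\s+|\s+$//g;
        $row{$name} = $field;
    }
    return \%row;
}

# Offsets written out by hand for the sample header; in practice they
# would come from step 1.
my @cols = (['First', 1, 6], ['Second', 14, 20], ['Third', 28, 33], ['Forth', 40, 45]);

# Rebuild line 3 of the sample with the same column layout.
my $fmt = " %-5s%14s        %-8s    %-5s";
my $row = extract_fields(sprintf($fmt, 'do', 'this', '', 'task'), \@cols);
printf "%s has Second-property %s.\n", $row->{First}, $row->{Second};
# prints: do has Second-property this.
```

A field wide enough to reach under a neighbouring heading would be reported for both columns; the sketch does not guard against that.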

Next, you would have to look at how to implement it.

Step 1:

my @starting;
{
  my @char = split m{}, <$file>; # split the first line into char array
  my $spacecount = 0;
  my $state = 1; # 1 : find start -- 0 : find end
  for (my $i = 0; $i < @char; $i++) {
    if ($state) { # find next non-space
      if ($char[$i] =~ /\s/) {
        next;
      } else {
        $state = not $state; # flip
        $spacecount = 0;
        push @starting, $i;
        next;
      }
    } else {
      if ($char[$i] =~ /\s/) {
        $spacecount++;
        if ($spacecount >= 2) {
          $state = not $state; # flip
          next;
        }
      } else {
        $spacecount = 0; # reset consecutive space counter
        next;
      }
    }
  }
}
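
For comparison, the same start offsets can probably be collected with a global regex match, since Perl records where each match begins in `$-[0]` (and where it ends in `$+[0]`). The sketch below assumes the header line is already in a scalar; `heading_starts` and `$header` are names made up for illustration. The pattern lets a heading contain single blanks and ends it at two blanks in a row, which is meant to mirror the two-space rule of the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: start offset of every heading in the header line.
sub heading_starts {
    my ($header) = @_;
    my @starting;
    # A heading is a run of non-blanks that may contain single blanks;
    # two blanks in a row end it.
    while ($header =~ /\S+(?:\s\S+)*/g) {
        push @starting, $-[0];    # $-[0] is where the current match begins
    }
    return @starting;
}

# The sample header, with the gaps spelled out so the counts are explicit.
my $header = " First" . ' ' x 8 . "Second" . ' ' x 8 . "Third" . ' ' x 7 . "Forth\n";
my @starting = heading_starts($header);
print "@starting\n";    # prints: 1 14 28 40
```

Pushing `[$-[0], $+[0]]` instead would keep both ends of each heading, which helps when some columns are right-aligned.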
