Contrary to what many of the answers say, you don't need lookahead (other than the regex itself). You only need to capture part of the delimiter, like so:
my @hash_fields = grep { length; } split /\s*(\w+):\s*/;
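To make the effect of the capture group concrete, here is a minimal sketch. The sample string in `$line` and the `%record` hash are assumptions made up for illustration, not the original input. When the `split` pattern contains a capturing group, the captured text is returned along with the fields, so the field names survive the split:

```perl
use strict;
use warnings;

# Hypothetical sample input; not the data from the question.
my $line = 'team: Real Madrid stadium: Santiago Bernabeu';

# Because (\w+) is a capture group, split returns the captured field
# names interleaved with the text found between the delimiters.
# The match at the very start yields an empty leading field, which
# grep { length } discards.
my @hash_fields = grep { length } split /\s*(\w+):\s*/, $line;

# The list now alternates key, value, so it can be assigned to a hash.
my %record = @hash_fields;

print "$_ => $record{$_}\n" for sort keys %record;
```

Note that `grep { length }` would also drop a genuinely empty value and shift the key/value pairing, so this shortcut is only safe when every field has content.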
My complete solution follows:
my %handlers
    = ( players   => sub { return [ grep { length; } split /\s*,\s*/, shift ]; }
      , personnel => sub {
            my $value = shift;
            my %personnel;
            # Using recursive regex for nested parens
            while ( $value =~ m/([^(]*)([(](?:[^()]+|(?2))*[)])/g ) {
                my ( $name, $role ) = ( $1, $2 );
                $role =~ s/^\s*[(]\s*//;
                $role =~ s/\s*[)]\s*$//;
                $name =~ s/^\s+//;
                $name =~ s/\s+$//;
                $personnel{ $role } = $name;
            }
            return \%personnel;
        }
      );
my %hash = grep { length; } split /(?:^|\s+)(\w+):\s+/, <DATA>;
foreach my $field ( keys %handlers ) {
    $hash{ $field } = $handlers{ $field }->( $hash{ $field } );
}
The dump looks like this:
%hash: {
         personnel => {
                        'assistant coach (es)' => 'Aitor Karanka',
                        'head coach' => 'José Mourinho'
                      },
         players => [
                      'Zinédine Zidane',
                      'Ronaldo',
                      'Luís Figo',
                      'Roberto Carlos',
                      'Raúl'
                    ],
         stadium => 'Santiago Bernabeu',
         team => 'Real Madrid',
         title => 'Football'
       }
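The recursive pattern used in the personnel handler can be tried in isolation. This is a rough sketch under stated assumptions: the names and roles in `$value` are invented for illustration, and the `@pairs` array is a hypothetical stand-in for the hash built above. The `(?2)` construct re-enters capture group 2, which is how a role such as `assistant coach (es)` keeps its inner parentheses; it requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical input with one level of nested parentheses.
my $value = 'Jane Doe (coach) John Roe (assistant (interim))';

my @pairs;
# Group 1: everything before the opening paren (the name).
# Group 2: a balanced parenthesised chunk; (?2) recurses into group 2,
# so an inner "( ... )" is consumed as part of the role.
while ( $value =~ m/([^(]*)([(](?:[^()]+|(?2))*[)])/g ) {
    my ( $name, $role ) = ( $1, $2 );
    s/^\s+|\s+$//g for $name, $role;    # trim surrounding whitespace
    $role =~ s/^[(]\s*|\s*[)]$//g;      # strip only the outermost parens
    push @pairs, "$name: $role";
}

print "$_\n" for @pairs;
```

Without the recursion, `[^()]+` alone would stop at the first inner parenthesis and the match for the second entry would fail.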