xml - Parsing a file that may contain one or many XML tags

Question

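I'm having trouble parsing a file that contains XML tags. The file may contain many XML tags or only one. I have tried both a regular expression and XML::LibXML. The problem with the regex is that when two closing tags are on the same line, my expression captures everything from the first opening tag to the second closing tag.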

XML file:

She outsprinted Becky Smith and Joan Hare to the line, with Becky and Joan
finishing in a time of <time>1:02:41</time> and <time>  1:02:45</time>
respectively.

The regex I'm using (I want to extract the times):

   if (/<time>(.*)<\/time>/) {
    ($hh, $mm, $ss) = split(':', $1);
    say "Time Entered - ", $hh, ":", $mm, ":", $ss, " ";
    print "***$1***\n";
    }

Output:

Time Entered - 1:02:41</time> and <time>  1

Expected:

1:02:41
1:02:45
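The stray output comes from the greedy `.*`, which runs on to the last `</time>` on the line, and from the plain `if`, which only looks at the first match. A minimal core-Perl sketch of a regex-only workaround, assuming each time is enclosed in its own `<time>...</time>` pair with no nested tags: the non-greedy `.*?` stops at the first closing tag, `\s*` around the capture trims the padding seen in `<time>  1:02:45</time>`, and `/g` in list context collects every match. The sample string is the text from the question; in practice it would be read from the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input from the question (stands in for the file contents).
my $text = <<'END';
She outsprinted Becky Smith and Joan Hare to the line, with Becky and Joan
finishing in a time of <time>1:02:41</time> and <time>  1:02:45</time>
respectively.
END

# .*? is non-greedy: it stops at the FIRST </time> instead of the last one.
# \s* on both sides strips surrounding whitespace from the captured time.
# /g in list context returns every capture; /s lets . also match newlines.
my @times = $text =~ m{<time>\s*(.*?)\s*</time>}gs;

for my $time (@times) {
    my ($hh, $mm, $ss) = split /:/, $time;
    print "Time Entered - $hh:$mm:$ss\n";
}
```

This is fine for simple, flat markup like the sample, but it is not an XML parser: comments, CDATA sections or attributes on `<time>` would all need extra handling.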

**Second approach: using LibXML.** I tried the code below, but I get the following error:

"KnoxHalfResults:1: parser error : Start tag expected, '<' not found
Jim Colatis won Tuesday's Knoxville half marathon in a blistering pace"

The input file contains this data:

Jim Colatis won Tuesday's Knoxville half marathon in a blistering pace 
of <time>   0:56:45   </time>. He was followed to the line by long time nemesis 
Mickey Mouse in a time of <time>0:58:49</time>.

My code for LibXML:
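use strict;
use warnings;
use XML::LibXML;

my ($filein, $fileout) = @ARGV;

my $parser = XML::LibXML->new();
# parse_file() dies here: the input has no single root element,
# so it is not well-formed XML
my $xmldoc = $parser->parse_file($filein);

# '//time' matches <time> elements at any depth;
# '/time' would only match a root element named <time>
for my $sample ($xmldoc->findnodes('//time')) {
    print $sample->nodeName(), ": ", $sample->textContent(), "\n";
}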

score 1 · Accepted Answer

As has been said, your data is not XML, so you can't feed it to an XML parser as-is.

Can you make it well-formed XML? Wrapping it in a dummy root tag and then using XML::LibXML (or XML::Twig ;--) code may be all you need.

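#!/usr/bin/perl

use strict;
use warnings;
use XML::Twig;
use File::Slurp;

my ($filein, $fileout) = @ARGV;

my @times;

# Wrap the file contents in a dummy root element so the parser sees
# well-formed XML, and collect the text of each <time> element as it is parsed.
# trimmed_text drops the leading/trailing spaces inside the tags.
XML::Twig->new( twig_handlers => { time => sub { push @times, $_->trimmed_text; } } )
         ->parse( '<dummy>' . read_file( $filein ) . '</dummy>' );

print "$_\n" foreach @times;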

You do need to make sure that the text in the file is proper XML text, though: no < or & characters that are not part of the markup.
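If the file might contain such characters, one option is to escape them before adding the dummy root. A rough sketch in core Perl; `escape_stray_markup` is a hypothetical helper, and it assumes `<time>` and `</time>` are the only real markup in the file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape characters that would make the wrapped text
# ill-formed, ASSUMING <time>...</time> is the only markup in the input.
sub escape_stray_markup {
    my ($text) = @_;
    # A bare & that does not already start an entity or character reference
    # (e.g. &amp; or &#38;) becomes &amp;
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    # A < that does not open or close a <time> tag becomes &lt;
    $text =~ s{<(?!/?time>)}{&lt;}g;
    return $text;
}

my $raw = "Smith & Jones ran < 1 hour: <time>0:58:49</time> &amp; more";
my $xml = '<dummy>' . escape_stray_markup($raw) . '</dummy>';
print "$xml\n";
```

The escaped string can then be passed to XML::Twig's `parse` (or XML::LibXML) in place of the raw contents. Note that a name-like sequence such as `&foo;` is left alone, so an undefined entity would still be rejected by the parser.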
