perl - Perl XML::DOM::Parser

Question

I am trying to retrieve information from an RSS feed using Perl, XML::DOM, and XML::Parser. I am having a hard time finding documentation for XML::DOM and XML::Parser :(

Here is an excerpt of the RSS feed:

<rss version="2.0">
<channel>
    <item>
        <title>The title numer 1</title>
        <link>
        http://www.example.com/link1.php?getfile=1&sha=1234567890
        </link>
        <description>
        File 1
        </description>
    </item>
    <item>
        <title>The title numer 2</title>
        <link>
        http://www.example.com/link1.php?getfile=2&sha=0192837465
        </link>
        <description>
        File 2
        </description>
    </item>
    <item>
        <title>The title numer 3</title>
        <link>
        http://www.example.com/link1.php?getfile=1&sha=0987654321
        </link>
        <description>
        File 3
        </description>
    </item>
</channel>

I want to extract the "title" and "link" of each item from this RSS feed.

I cannot use XML::LibXML, XML::Simple, or XML::RSS.

score 0 · Accepted Answer

There is a problem with your XML data (an unescaped '&' character):

Lines like

...getfile=1&sha...

have to be written as

...getfile=1&amp;sha...
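
If the feed cannot be corrected at its source, the bare ampersands could be escaped before parsing. The following is only a rough sketch using core Perl: the helper name `escape_bare_ampersands` is made up for illustration, and the regex is a heuristic that leaves anything already shaped like an entity or character reference untouched. It does not understand CDATA sections or comments, so it is not a general XML repair tool.

```perl
use strict;
use warnings;

# Hypothetical helper: escape every '&' that does not already begin an
# entity reference (&name;) or a character reference (&#10; / &#x0A;).
sub escape_bare_ampersands {
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

my $raw = 'http://www.example.com/link1.php?getfile=1&sha=1234567890';
print escape_bare_ampersands($raw), "\n";
```

Running the helper a second time on its own output changes nothing, because `&amp;` already has the shape of an entity reference.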

Once that is fixed, you can parse the XML with XML::Reader::PP:

use strict;
use warnings;

use XML::Reader::PP;

my $rdr = XML::Reader::PP->new(\*DATA, { mode => 'branches' },
  { root => '/rss/channel/item', branch => [ '/title', '/link' ] });

while ($rdr->iterate) {
    my ($title, $link) = $rdr->value;

    for ($title, $link) {
        $_ = '' unless defined $_;
    }

    print "title = '$title'\n";
    print "link  = '$link'\n";
}

__DATA__
<rss version="2.0">
  <channel>
    <item>
        <title>The title numer 1</title>
        <link>
        http://www.example.com/link1.php?getfile=1&amp;sha=1234567890
        </link>
        <description>
        File 1
        </description>
    </item>
    <item>
        <title>The title numer 2</title>
        <link>
        http://www.example.com/link1.php?getfile=2&amp;sha=0192837465
        </link>
        <description>
        File 2
        </description>
    </item>
    <item>
        <title>The title numer 3</title>
        <link>
        http://www.example.com/link1.php?getfile=1&amp;sha=0987654321
        </link>
        <description>
        File 3
        </description>
    </item>
  </channel>
</rss>

score 0 · Accepted Answer

There is a problem parsing the RSS XML file as posted. For a file such as

<xml>
<channel>
    <item>
        <title>The title numer 1</title>
    </item>

    <item>
        <title>The title numer 2</title>
    </item>
</channel>
</xml>

you can do:

use strict;
use warnings;
use XML::Parser;
use Data::Dumper;
use XML::DOM::Lite qw(Parser XPath);

my $parser = Parser->new();
my $doc = $parser->parseFile('2.xml', whitespace => 'strip');


#XML::DOM::Lite::NodeList - blessed array ref for containing Node objects
my $nlist = $doc->selectNodes('/xml/channel/item/title');


foreach my $node (@{$nlist})
{
    print $node->firstChild()->nodeValue() . "\n";
}
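
Since the question asks about XML::DOM specifically, the same extraction can also be sketched with XML::DOM itself (which uses XML::Parser underneath). This is a minimal sketch, not a definitive implementation: it assumes the '&' characters in the feed have already been escaped as `&amp;`, the helper names `child_text` and `extract_items` are hypothetical, and the inline feed is a shortened copy of the sample from the question.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: trimmed text content of the first <$tag> below $elem.
sub child_text {
    my ($elem, $tag) = @_;
    my ($node) = $elem->getElementsByTagName($tag);   # list context returns a plain list
    return '' unless defined $node;
    my $text = join '', map  { $_->getData }
                        grep { $_->isa('XML::DOM::Text') } $node->getChildNodes;
    $text =~ s/^\s+|\s+$//g;                          # drop surrounding whitespace
    return $text;
}

# Hypothetical helper: one [title, link] pair per <item> element.
sub extract_items {
    my ($xml) = @_;
    my $doc  = XML::DOM::Parser->new->parse($xml);
    my @rows = map { [ child_text($_, 'title'), child_text($_, 'link') ] }
               $doc->getElementsByTagName('item');
    $doc->dispose;    # XML::DOM trees hold circular references
    return @rows;
}

# Shortened copy of the sample feed, with '&' written as '&amp;'.
my $xml = <<'XML';
<rss version="2.0">
  <channel>
    <item>
      <title>The title numer 1</title>
      <link>
      http://www.example.com/link1.php?getfile=1&amp;sha=1234567890
      </link>
    </item>
    <item>
      <title>The title numer 2</title>
      <link>
      http://www.example.com/link1.php?getfile=2&amp;sha=0192837465
      </link>
    </item>
  </channel>
</rss>
XML

for my $row (extract_items($xml)) {
    my ($title, $link) = @$row;
    print "title = '$title'\n";
    print "link  = '$link'\n";
}
```

The text nodes are returned decoded, so the printed link contains a plain `&` rather than `&amp;`. To read from a file instead of a string, the parser's `parsefile` method can replace `parse`.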
