regex - Parsing text from a minified XML feed using awk

Question

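I am trying to parse the <title> and <description> elements from the minified XML feed at http://rss.slashdot.org/Slashdot/slashdot. I have tried the following:

curl --silent "http://rss.slashdot.org/Slashdot/slashdot" | awk '/\btitle\b(.*?)\bdescription\b/'

and also grep -E, but I could not extract the substrings I need. Because the XML is minified and all of the data is on a single line, the entire XML is always returned.

The regular expression itself works when I test it in a text editor.

Thanks for your help!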

score 1 · Accepted Answer

An XML parser helps here. Below is a test using perl and XML::Twig. Adapt it to your needs.

Content of script.pl:

#!/usr/bin/env perl

use warnings;
use strict;
use XML::Twig;

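# Register one handler for every <title> and <description> element,
# then parse the file named by the first command-line argument.
my $twig = XML::Twig->new(
    twig_handlers => {
        'title' => \&extract_text,
        'description' => \&extract_text,
    },
)->parsefile( shift );

# Print the element's tag name, a separator line, and its text content.
sub extract_text {
    my ($t, $e) = @_;
    printf qq|%s\n=================\n|, $e->tag;
    printf qq|%s\n\n|, $e->text;
}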

Run it like this:

curl --silent "http://rss.slashdot.org/Slashdot/slashdot" | perl script.pl -

This yields output like the following for each title and description pair:

title
=================
Proof-of-Concept Port of XBMC to SDL 2.0 and Wayland

description
=================
hypnosec wrote in with news that XBMC has  ...
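
If installing an XML parser is not an option, the single-line problem described in the question can be worked around in awk by changing the record separator, so that awk no longer treats the whole feed as one record. The following is only a sketch, not a replacement for a real parser: the feed variable holds a made-up sample rather than the real Slashdot feed, and extract_fields is a hypothetical helper name. It assumes the element text contains no literal < or > characters (for example inside CDATA), and it prints every <title> and <description>, including the channel-level ones.

```shell
# Hypothetical sample standing in for the single-line (minified) feed.
feed='<rss><channel><item><title>First story</title><description>Alpha text</description></item><item><title>Second story</title><description>Beta text</description></item></channel></rss>'

extract_fields() {
  # With RS="<", each record looks like "title>First story" or "/title>".
  # With FS=">", $1 is the tag name and $2 is the text that follows it.
  awk 'BEGIN { RS = "<"; FS = ">" }
       $1 == "title" || $1 == "description" { print $1 ": " $2 }'
}

printf '%s' "$feed" | extract_fields
```

awk prints each record that matches a pattern, and by default a record is a line, which is why the original command returned the entire feed. To try the sketch on the real feed, pipe the curl command from the question into extract_fields instead of the sample.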

score 0 · Accepted Answer

Here is an XSLT solution:

curl -s -o- http://rss.slashdot.org/Slashdot/slashdot | xsltproc slashdot.xsl -

where slashdot.xsl is:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />

<xsl:variable name="newline">
<xsl:text>
</xsl:text>
</xsl:variable>

<xsl:template match="/">
    <xsl:apply-templates select='//item' />
</xsl:template>

<xsl:template match='//item'>
    <xsl:value-of select='title' /><xsl:value-of select='$newline' />
    <xsl:text>====</xsl:text><xsl:value-of select='$newline' />
    <xsl:value-of select='description' /><xsl:value-of select='$newline' />
    <xsl:value-of select='$newline' />
</xsl:template>

</xsl:stylesheet>
