regex - Using Perl to remove everything except an HTML anchor link from a string

Question

Using Perl, how can I use a regex to take a string that has random HTML in it, with one HTML link with anchor text, like this:

  <a href="http://example.com" target="_blank">Whatever Example</a>

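and have it leave ONLY that and get rid of everything else? No matter what else was inside the <a tag along with the href attribute, like title=, or style=, or whatever. And it should keep the anchor text "Whatever Example" and the </a>.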

score 2 · Accepted Answer

Instead of a regex, you can use a stream parser such as HTML::TokeParser::Simple:

#!/usr/bin/env perl

use strict;
use warnings;

use HTML::TokeParser::Simple;

my $html = <<EO_HTML;

Using Perl, how can I use a regex to take a string that has random HTML in it
with one HTML link with anchor, like this:

   <a href="http://example.com" target="_blank">Whatever <i>Interesting</i> Example</a>

       and it leave ONLY that and get rid of everything else? No matter what
   was inside the href attribute with the <a, like title=, or style=, or
   whatever. and it leave the anchor: "Whatever Example" and the </a>?
EO_HTML

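# Stream through the HTML one token at a time
my $parser = HTML::TokeParser::Simple->new(string => $html);

# get_tag('a') skips ahead to the next <a ...> start tag; as_is returns that
# tag verbatim (all attributes kept), and get_text('/a') returns the text up
# to the next </a> with any inner tags (such as <i>) dropped.
while (my $tag = $parser->get_tag('a')) {
    print $tag->as_is, $parser->get_text('/a'), "</a>\n";
}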

Output:

$ ./whatever.pl
<a href="http://example.com" target="_blank">Whatever Interesting Example</a>
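For comparison, the regex-only approach the question asks about might be sketched in core Perl as below. This is only a rough sketch that assumes every anchor is well-formed, anchors are never nested, and no attribute value contains a ">" character; the sample string and the @anchors name are made up for illustration. Unlike the parser version above, it keeps any inner markup (such as <i>) untouched, and it breaks as soon as those assumptions stop holding, which is why the parser is the safer choice for arbitrary HTML.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Illustrative input (assumption: simple, well-formed, non-nested anchors)
my $html = <<'EO_HTML';
Some text <b>before</b> the link
<a href="http://example.com" target="_blank" title="t">Whatever Example</a>
and some <p>text after</p> it.
EO_HTML

# Capture every <a ...>...</a> element:
#   <a\b     the tag name "a" followed by a word boundary (so <abbr> is skipped)
#   [^>]*>   the remaining attributes, up to the end of the start tag
#   .*?      the anchor text, matched non-greedily up to the first </a>
# /s lets . match newlines, /i ignores case in tag names, /g finds all matches.
my @anchors = $html =~ m{(<a\b[^>]*>.*?</a>)}gis;

print "$_\n" for @anchors;
```

With the sample input this should print only the single anchor line, with the title= attribute still in place.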
