perl - Using Perl Mechanize to strip text from a web page

Question

I am trying to scrape only the text information from a web page that is built from a set of divs, tags, etc. I want to extract information only from a specific div class, and then only the text inside the tags.

<div class="col col60 moduledetail"><table cellspacing="2" cellpadding="0" border="0" id="moduleDetail"><tr><th class="moduleCode">test<small>CRN: 33413</small></th><th>test</th></tr><tr><td class="label"><nobr>Campus</nobr></td><td><a target="_blank" href="test/">test</a></td></tr><tr><td class="label">

Above is a snippet of what the web page contains. My attempt at getting the page contents does exactly what it says: it gets everything from the web page. How can I narrow this down to this class, and to only the textual information within the tags?

thanks

score 3 · Accepted Answer

Use an HTML parser. Here is an example using HTML::TreeBuilder.

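 use strict;
 use warnings;
 use WWW::Mechanize;
 use HTML::TreeBuilder;

 # The page address is taken from the command line here.
 my $url = shift @ARGV or die "Usage: $0 URL\n";

 my $mech = WWW::Mechanize->new;
 $mech->get($url);

 # Parse the fetched HTML into a tree of elements.
 my $tree = HTML::TreeBuilder->new_from_content($mech->content);

 # Find the first div whose class attribute is exactly this string,
 # then print its text with all markup stripped.
 if (my $div = $tree->look_down(_tag => "div", class => "col col60 moduledetail")) {
     print $div->as_text(), "\n";
 }
 $tree->delete();    # release the parse tree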
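
If the text is needed cell by cell rather than as one run-together string, the same approach can be extended roughly as follows. This is only a sketch: it parses an inline copy of the snippet from the question instead of fetching a page, the closing tags in the sample are added for illustration (the original snippet is cut off), and the `@rows` variable is a name made up for this example. It also matches the div with a regex on one class name, which is an assumption about what is stable on the real page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup adapted from the question. The closing tags are an
# assumption, since the original snippet is truncated.
my $html = <<'HTML';
<div class="col col60 moduledetail"><table id="moduleDetail">
<tr><th class="moduleCode">test<small>CRN: 33413</small></th><th>test</th></tr>
<tr><td class="label"><nobr>Campus</nobr></td><td><a target="_blank" href="test/">test</a></td></tr>
</table></div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Collect one array of cell texts per table row.
my @rows;
if (my $div = $tree->look_down(_tag => 'div', class => qr/\bmoduledetail\b/)) {
    for my $tr ($div->look_down(_tag => 'tr')) {
        # Header (th) and data (td) cells, with surrounding whitespace trimmed.
        my @cells = map { $_->as_trimmed_text } $tr->look_down(_tag => qr/^t[dh]$/);
        push @rows, \@cells;
    }
}
$tree->delete;

print join(" | ", @$_), "\n" for @rows;
```

Note that text from nested tags inside a single cell is joined without a separator, so a cell such as the `moduleCode` header may need further splitting if the parts have to be kept apart.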
