php - Fetching attributes using a PHP crawler

Question

I am trying to fetch the name, address, and location by crawling a website. It is a single page, and I don't need anything else from it. I am using the code below.

<?php

include 'simple_html_dom.php';

$html = "http://www.phunwa.com/phone/0191/2604233";
$dom = new DomDocument();
$dom->loadHtml($html);
$xpath = new DomXpath($dom);
$div = $xpath->query('//*[@class="address-tags"]')->item(0);
for($i=0; $i < $div->length; $i++ )
    {

        print "nodename=".$div->item( $i )->nodeName;
        print "\t";
         print "nodevalue : ".$div->item( $i )->nodeValue;
         print "\r\n";
            echo $link->getElementsByTagName("<p>");
    }
?>

The HTML source code of the website is:

 <div class="address-tags">
            <p><strong>Name:</strong> RAJ GOPAL SINGH</p>
            <p><strong>Address:</strong> R/O BARNAI NETARKOTHIAN, P.O.MUTHI TEH.&amp; DISTT.JAMMU,X, 181206</p>
            <p><strong>Location:</strong> JAMMU, Jammu &amp; Kashmir, India</p>
            <p><strong>Other Numbers:</strong> <a href="/phone/191/2604233">01912604233</a> | <a href="/phone/191/2604233">+911912604233</a> | <a href="/phone/191/2604233">+91-191-2604233</a></p>

Please help me get these three attributes as output. Right now nothing is echoed on the page.

Thanks a lot.

score 0 · Accepted Answer

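You need $dom->load($html); instead of $dom->loadHtml($html);, because $html holds a URL rather than markup. Once you do that, you will find that the page's HTML is not well formed (load() parses it as XML), so $xpath stays empty.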
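To see why the original script prints nothing, here is a minimal sketch that passes the URL string to loadHTML() the way the question does. The variable names $previous and $matches are only illustrative; the point is that loadHTML() treats its argument as markup and never downloads anything, so the XPath query finds no node.

```php
<?php
// loadHTML() parses its argument as markup; it never fetches a URL.
$url = 'http://www.phunwa.com/phone/0191/2604233';

$previous = libxml_use_internal_errors(true); // keep parser warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($url); // the URL text itself becomes the document content
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$matches = $xpath->query('//*[@class="address-tags"]');

// No element with that class exists, so item(0) is null and there is
// nothing to loop over.
echo 'matching nodes: ' . $matches->length . "\n";
var_dump($matches->item(0));
```

To parse the remote page as HTML, the markup has to be fetched first, for example with file_get_contents() followed by loadHTML(), or with DOMDocument::loadHTMLFile().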

Maybe try something like this:

$html = file_get_contents('http://www.phunwa.com/phone/0191/2604233');

$name = preg_replace('/(.*)(<p><strong>Name:<\/strong> )([^<]+)(<\/p>)(.*)/mis','$3',$html);
$address = preg_replace('/(.*)(<p><strong>Address:<\/strong> )([^<]+)(<\/p>)(.*)/mis','$3',$html);
$location = preg_replace('/(.*)(<p><strong>Location:<\/strong> )([^<]+)(<\/p>)(.*)/mis','$3',$html);
$othernumbers = preg_replace('/(.*)(<p><strong>Other Numbers:<\/strong> )(.*)/mis','$3',$html);
list($othernumbers,$trash)= preg_split('/<\/p>/mis',$othernumbers,0);
echo 'name: '.$name.'<br>address: '.$address.'<br>location: '.$location.'<br>other numbers: '.$othernumbers;
exit;
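As an alternative to the regular expressions, the DOM approach from the question can also be made to work. The following is a sketch, not the answerer's method: it assumes the page markup matches the snippet in the question (with the div closed), runs on that snippet as an inline string instead of the live site, and uses a hypothetical helper named extractAddressFields(). It relies on loadHTML(), which tolerates markup that is not well formed, with libxml warnings suppressed.

```php
<?php
// Sample markup copied from the question; a live script would fetch the page
// first, e.g. with file_get_contents() on the URL.
$html = <<<'HTML'
<div class="address-tags">
    <p><strong>Name:</strong> RAJ GOPAL SINGH</p>
    <p><strong>Address:</strong> R/O BARNAI NETARKOTHIAN, P.O.MUTHI TEH.&amp; DISTT.JAMMU,X, 181206</p>
    <p><strong>Location:</strong> JAMMU, Jammu &amp; Kashmir, India</p>
</div>
HTML;

// Hypothetical helper (not part of the original answer): returns an array
// mapping each label ("Name", "Address", ...) to the text that follows it.
function extractAddressFields(string $html): array
{
    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $fields = [];
    foreach ($xpath->query('//*[@class="address-tags"]/p') as $p) {
        $label = $xpath->query('strong', $p)->item(0);
        if ($label === null) {
            continue; // paragraph without a <strong> label
        }
        $labelText = trim($label->textContent);
        $key = rtrim($labelText, ':');
        // The value is the paragraph text with the label cut off the front.
        $fields[$key] = trim(substr(trim($p->textContent), strlen($labelText)));
    }
    return $fields;
}

$fields = extractAddressFields($html);
foreach (['Name', 'Address', 'Location'] as $key) {
    echo $key . ': ' . ($fields[$key] ?? '(not found)') . "\n";
}
```

Because the values come from textContent, entities such as &amp;amp; are already decoded, which the regular-expression version does not do.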
