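I have an assignment to parse http://www.olx.in/cars-cat-378 and extract the car, location, and price using regular expressions. I have seen many posts suggesting that regular expressions are not appropriate for parsing web pages, but at least this time I have to use them. I tried the approach shown below, but it is not working.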
<?php
/**
* Initialize the cURL session
*/
$ch = curl_init();
/**
* Set the URL of the page or file to download.
*/
curl_setopt($ch, CURLOPT_URL, 'http://www.olx.in/cars-cat-378');
/**
* Ask cURL to return the contents in a variable instead of simply echoing them to the browser.
*/
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
/**
* Execute the cURL session
*/
$contents = curl_exec($ch);
/**
* Match one listing item against the regular expression and dump the captured groups.
*/
$reg='/<div class="li .*?"><div class="row clearfix"><div class="c-1 table-cell"><div class="cropit">.*?<\/div><\/div><div class="second-column-container table-cell"><h3><a .*?>(.*?)<\/a><\/h3><div class="c-4"><span>(.*?)<\/span> - <span>(.*?)<\/span> - <span>(.*?)<\/span> - <span>(.*?)<\/span><\/div><span class="itemlistinginfo clearfix"><a .*?>(.*?)<\/a><\/span><div .*?><\/div><\/div><div class="third-column-container table-cell">(.*?)<\/div><div class="fourth-column-container table-cell">(.*?)<\/div><\/div><\/div>/';
preg_match($reg,$contents,$result);
var_dump($result);
/**
* Close cURL session
*/
curl_close ($ch);
?>
The HTML for each list item on the page is as follows:
<div class="li even">
<div class="row clearfix">
<div class="c-1 table-cell">
<div class="cropit">
<a class="pics-lnk" href="http://newdelhi.olx.in/honda-prelude-2-door-sports-car-for-sale-iid-437128570">
<img src="http://images04.olx-st.com/ui/14/85/70/t_1347220402_437128570_4.jpg" width="111"
alt="HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE." title="HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE. - India"
height="83" style="margin-top:0px;" />
</a>
</div>
</div>
<div class="second-column-container table-cell">
<h3>
<a href="http://newdelhi.olx.in/honda-prelude-2-door-sports-car-for-sale-iid-437128570" title="HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE. - India">
HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE.</a>
</h3>
<div class="c-4">
<span>Year: 1996</span> - <span>Make: Honda</span> - <span>Model: Prelude</span> - <span>66,400.00 km</span> </div>
<span class="itemlistinginfo clearfix">
<a href="http://newdelhi.olx.in/cars-cat-378">Cars - Delhi</a> </span>
<div style="display:none;" class="fbfriends_loadme" id="fbfriends_loadme_437128570" rel="5656149"></div>
</div>
<div class="third-column-container table-cell">
र 2,65,000.00 </div>
<div class="fourth-column-container table-cell">
Yesterday, 15:53 </div>
</div>
</div>
The regular expression I used is:
/<div class="li .*?"><div class="row clearfix"><div class="c-1 table-cell"><div class="cropit">.*?<\/div><\/div><div class="second-column-container table-cell"><h3><a .*?>(.*?)<\/a><\/h3><div class="c-4"><span>(.*?)<\/span> - <span>(.*?)<\/span> - <span>(.*?)<\/span> - <span>(.*?)<\/span><\/div><span class="itemlistinginfo clearfix"><a .*?>(.*?)<\/a><\/span><div .*?><\/div><\/div><div class="third-column-container table-cell">(.*?)<\/div><div class="fourth-column-container table-cell">(.*?)<\/div><\/div><\/div>/
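A possible cause, offered as a guess rather than a confirmed diagnosis: the pattern expects each tag to follow the previous one directly (for example `<h3><a`), while the listing HTML has line breaks and spaces between tags, and without the `s` modifier `.` does not match a newline. The sketch below shows one whitespace-tolerant alternative run against a shortened copy of the listing item. The function name `extractListings`, the named groups, and the assumption that the live page keeps the same structure as the sample are all mine, not something taken from the site.

```php
<?php
// Sample markup shortened from the listing item in the question.
// Assumption: the live page uses the same structure for every item.
$html = <<<'HTML'
<div class="li even">
<div class="row clearfix">
<div class="second-column-container table-cell">
<h3>
<a href="http://newdelhi.olx.in/honda-prelude-2-door-sports-car-for-sale-iid-437128570" title="HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE. - India">
HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE.</a>
</h3>
<span class="itemlistinginfo clearfix">
<a href="http://newdelhi.olx.in/cars-cat-378">Cars - Delhi</a> </span>
</div>
<div class="third-column-container table-cell">
र 2,65,000.00 </div>
</div>
</div>
HTML;

/**
 * Hypothetical helper: returns one array per listing with the keys
 * 'car', 'location' and 'price'.
 */
function extractListings(string $html): array
{
    // "~" is used as the delimiter so "/" needs no escaping.
    // \s* tolerates whitespace and newlines between tags.
    // The "s" modifier lets "." match newlines in the ".*?" gaps.
    $pattern = '~<h3>\s*<a\b[^>]*>\s*(?P<car>.*?)\s*</a>\s*</h3>'
             . '.*?<span class="itemlistinginfo clearfix">\s*<a\b[^>]*>\s*(?P<location>.*?)\s*</a>'
             . '.*?<div class="third-column-container table-cell">\s*(?P<price>.*?)\s*</div>~s';

    // preg_match_all collects every item; preg_match stops at the first.
    if (!preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        return [];
    }

    $listings = [];
    foreach ($matches as $m) {
        $listings[] = [
            'car'      => $m['car'],
            'location' => $m['location'],
            'price'    => $m['price'],
        ];
    }
    return $listings;
}

// Usage: print one line per listing found in the sample.
foreach (extractListings($html) as $item) {
    echo $item['car'], ' | ', $item['location'], ' | ', $item['price'], PHP_EOL;
}
```

The location group captures the whole link text ("Cars - Delhi"), so the city would still need to be split off after the " - ". Because the gaps use lazy `.*?`, an item that lacks a price cell could cause a match to run into the next item, so the result is worth checking against the real page.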