たとえば、「x」製品を検索して結果を表形式で表示できる Web サイトに取り組んでいます。
PHP curl を使用して、別の Web サイトから検索データをスクレイピングする予定です。(スクレイピングされている Web サイトの所有者はそれを認識して許可しているため、法的な問題はありません)。
Web サイトにアクセスしてログインし、ユーザー入力に基づいて検索を行うための php curl コードが既にあります。検索結果と出力結果を Web サイトで 1 つずつ確認する方法がわかりません。
PHPカールコード:
$username = '********';
$password = '********';
$loginUrl = 'http://www.a-website.com/login.asp';
//init curl
$ch = curl_init();
//Set the URL to work with
curl_setopt($ch, CURLOPT_URL, $loginUrl);
// ENABLE HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);
//Set the post parameters
curl_setopt($ch, CURLOPT_POSTFIELDS, 'username=' . $username . '&password=' . $password . '&submit1=' . 'Login');
//Handle cookies for the login
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie stuff hure');
//Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL
//not to print out the results of its query.
//Instead, it will return the results as a string return value
//from curl_exec() instead of the usual true/false.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
//execute the request (the login)
$store = curl_exec($ch);
/* * *****************SEARCH HERE****************** */
curl_setopt($ch, CURLOPT_URL, 'http://www.a-website.com/Index.asp');
//execute the request
$content = curl_exec($ch);
//Set the post parameters
curl_setopt($ch, CURLOPT_POSTFIELDS, 'search_txt_vs=' . '' . '&search_txt_UPC=' . '' . '&search_txt_Name=' . $searchString .
'&search_txt_Manufacturer=' . '' . '&submit=' . 'Search');
//execute the request (the search)
$Search = curl_exec($ch);
print CJSON::encode($Search);
print $Search;
//save the data to disk
print $content;
これは、WebサイトのImスクラップのhtmlコードです(ところで、これは古い学校のテーブル形式です)
<td colspan="3" height="100%" valign="top">
<table width="100%" border="0" cellpadding="2" cellspacing="0" bordercolor="#99CCCC" class="text">
<tbody>
<tr bgcolor="#9999CC">
<td align="right" class="calendar">Sort ></td>
<td align="center"> <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=1">NDC</a>
</td>
<td align="left"> <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=2">Brand Name</a>
</td>
<td align="center" colspan="2"> <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=3">Strength</a>
| <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=4">UD</a>
</td>
<td align="left"> <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=5">Stock</a>
</td>
<td align="center"> <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=6">Manufacturer</a>
</td>
<td align="center" bgcolor="cccccc"> <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=7">AWP</a>
/ <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=8">Your Price</a>
</td>
</tr>
<tr bgcolor="#9999CC">
<td align="right" class="calendar"> </td>
<td align="center"> <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=9">UPC</a>
</td>
<td align="left"> <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=10">Generic Alt/Name</a>
</td>
<td align="center" colspan="2"> <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=11">Size</a>
| <a href="Index.asp?search_txt_UPC=&search_txt_Name=novolin&search_txt_Manufacturer=&orderby=12">Form</a>
</td>
<td align="left" colspan="3" class="selected">Category</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center" valign="top" rowspan="2">1
<br> <a href="#" onclick="return openCart(19112,0.01021);"><span class="smallNorm_red">[add]</span></a>
</td>
<td align="center"><span class="smallNorm">00169347718</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN 70/ 30U/ML CRT 5X3 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 70-30 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NOVO NORDISK PHARM</span>
</td>
<td align="center"><span class="smallNorm">$
0.01
/ $
0.01
</span>
</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center"><span class="smallNorm">000000000000</span>
</td>
<td align="left"><span class="smallNorm"><a href="#" onclick="return openGeneric('50101');">HUM INSULIN NPH/REG INSULIN HM</a></span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 5X3ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(19112,0.01021);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center" valign="top" rowspan="2">2
<br> <a href="#" onclick="return openCart(19116,0.012);"><span class="smallNorm_red">[add]</span></a>
</td>
<td align="center"><span class="smallNorm">00169347418</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN N 100 UN/ML CRT 5X3 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 100 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NNP</span>
</td>
<td align="center"><span class="smallNorm">$
0.00
/ $
0.01
</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center"><span class="smallNorm">000000000000</span>
</td>
<td align="left"><span class="smallNorm"><a href="#" onclick="return openGeneric('05331');">NPH HUMAN INSULIN ISOPHANE</a></span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 5X3ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(19116,0.012);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center" valign="top" rowspan="2">3
<br> <a href="#" onclick="return openCart(45211,0.012);"><span class="smallNorm_red">[add]</span></a>
</td>
<td align="center"><span class="smallNorm">00169231721</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN INNO 70/30 PFS 5X3 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 70-30 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NOVO NORDISK PHARM</span>
</td>
<td align="center"><span class="smallNorm">$
0.00
/ $
0.01
</span>
</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center"><span class="smallNorm">000000000000</span>
</td>
<td align="left"><span class="smallNorm"><a href="#" onclick="return openGeneric('24486');">HUM INSULIN NPH/REG INSULIN HM</a></span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 5X3ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(45211,0.012);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center" valign="top" rowspan="2">4
<br> <a href="#" onclick="return openCart(19117,82.0884);"><span class="smallNorm_red">[add]</span></a>
</td>
<td align="center"><span class="smallNorm">00169183311</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN R 100 UN/ML VL 10 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 100 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NOVO NORDISK PHARM</span>
</td>
<td align="center"><span class="smallNorm">$
99.00
/ $
82.09
</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center"><span class="smallNorm">000169183311</span>
</td>
<td align="left"><span class="smallNorm"><a href="#" onclick="return openGeneric('11642');">INSULIN REGULAR HUMAN</a></span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 10ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(19117,82.0884);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center" valign="top" rowspan="2">5
<br> <a href="#" onclick="return openCart(19110,82.0884);"><span class="smallNorm_red">[add]</span></a>
</td>
<td align="center"><span class="smallNorm">00169183711</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN 70/ 30U/ML VL 10 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 70-30 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NOVO NORDISK PHARM</span>
</td>
<td align="center"><span class="smallNorm">$
99.00
/ $
82.09
</span>
</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center"><span class="smallNorm">000169183711</span>
</td>
<td align="left"><span class="smallNorm"><a href="#" onclick="return openGeneric('50001');">HUM INSULIN NPH/REG INSULIN HM</a></span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 10ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(19110,82.0884);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center" valign="top" rowspan="2">6
<br> <a href="#" onclick="return openCart(19114,82.0884);"><span class="smallNorm_red">[add]</span></a>
</td>
<td align="center"><span class="smallNorm">00169183411</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN N 100 UN/ML VL 10 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 100 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NOVO NORDISK PHARM</span>
</td>
<td align="center"><span class="smallNorm">$
99.00
/ $
82.09
</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center"><span class="smallNorm">000000000000</span>
</td>
<td align="left"><span class="smallNorm"><a href="#" onclick="return openGeneric('11660');">NPH HUMAN INSULIN ISOPHANE</a></span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 10ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(19114,82.0884);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
</tbody>
</table>
</td>