html - Scraping a specific area of site content behind a secure login

Question

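I am trying to scrape a specific piece of text from a website that is protected by a login. Here is a tutorial that uses cURL: http://www.digeratimarketing.co.uk/2008/12/16/curl-page-scraping-script/

However, I can't work out how to apply it to my own cURL code. Here is my cURL script: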

$url     = "http://aftabcurrency.com/login_script.php";
$cookie  = 'cookies.txt';   // file used to store and resend the session cookie
$timeout = 30;

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the response instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);   // follow any redirects
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);

// Submit the login form
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "user_name=user&user_password=pass&passcode=code");

$result = curl_exec($ch);
curl_close($ch);

$source = $result;
if (preg_match("/(CC3300\">)(.*?)(<\/font>)/is", $source, $found)) {
    echo $found[2];
} else {
    echo "Text not found.";
}

For example, on aftabcurrency.com I only want to scrape the text "Our services are important!" (this text changes every day).

score 1 · Accepted Answer

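What I would do is "cut out" the text between a start marker and an end marker. In the page source, the text begins right after the font colour 613A75 and ends at the </font> tag. Here is a regex solution: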

$source = file_get_contents("http://aftabcurrency.com/index.php");
if (preg_match("/(613A75\">)(.*?)(<\/font>)/is", $source, $found)) {
    echo $found[2];
} else {
    echo "Text not found.";
}

If you want to do this for text inside the members area, add this code to your own script and replace $source = file_get_contents... with $source = $result.
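
One way to keep the same extraction usable for both the public page and the logged-in response is to wrap it in a small function that takes the HTML as a string. The following is only a sketch: the function name extract_font_text is made up for illustration, and the sample markup stands in for whatever curl_exec() actually returns after login, which may look different.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text that
// follows `<colour>">` up to the next </font>, or null if nothing matches.
function extract_font_text(string $html, string $colorHex): ?string
{
    // preg_quote() escapes any regex metacharacters in the colour value.
    $pattern = '/' . preg_quote($colorHex, '/') . '">(.*?)<\/font>/is';
    if (preg_match($pattern, $html, $found)) {
        return trim($found[1]);
    }
    return null;
}

// Assumed sample markup, standing in for the $result string from curl_exec().
$result = '<td><font color="#613A75">Our services are important!</font></td>';

$text = extract_font_text($result, '613A75');
echo $text ?? 'Text not found.', PHP_EOL;
```

With this shape, the members-area case only needs the logged-in $result passed in, while the public-page case can pass the string returned by file_get_contents().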

There are other ways to do this as well: DOMDocument with XPath, or the simple strpos / strstr / substr PHP functions.
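
The two alternatives mentioned above could look roughly like this. This is a minimal sketch, not the answerer's code: the function names extract_with_xpath and extract_with_strpos are hypothetical, and the sample markup is an assumption about how the page wraps the text in a font tag.

```php
<?php
// Option 1: DOMDocument + DOMXPath. Finds the first <font> element whose
// color attribute contains the given hex code and returns its text.
function extract_with_xpath(string $html, string $colorHex): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is often malformed.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//font[contains(@color, "' . $colorHex . '")]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Option 2: plain string functions. Returns the text between the first
// occurrence of $startMarker and the next occurrence of $endMarker.
function extract_with_strpos(string $html, string $startMarker, string $endMarker): ?string
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null;
    }
    $start += strlen($startMarker);
    $end = strpos($html, $endMarker, $start);
    if ($end === false) {
        return null;
    }
    return trim(substr($html, $start, $end - $start));
}

// Assumed sample markup; the real page structure may differ.
$source = '<table><tr><td><font color="#613A75">Our services are important!</font></td></tr></table>';

echo extract_with_xpath($source, '613A75') ?? 'Text not found.', PHP_EOL;
echo extract_with_strpos($source, '613A75">', '</font>') ?? 'Text not found.', PHP_EOL;
```

The XPath version tends to cope better with changes in whitespace or attribute order, while the strpos version has no dependencies beyond core string functions but relies on the exact marker text.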
