php - Detecting all links with PHP (including ones that don't point to a file)

Question

I'm trying to detect broken links. The following PHP, which reads from a MySQL table, seems to work for almost everything (though it is slow because of fopen).

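// Returns true if the path or URL can be opened for reading.
function fileExists($path) {
    $handle = @fopen($path, "r");
    if ($handle === false) {
        return false;
    }
    fclose($handle); // Release the handle instead of leaking it.
    return true;
}

// $db is assumed to be an open mysqli connection (the original used the mysql_* API, removed in PHP 7).
// `table` is a placeholder table name.
$result = mysqli_query($db, "SELECT id, title, link FROM `table`");
while ($row = mysqli_fetch_assoc($result)) {
    $status = ""; // Reset for each row so an earlier broken link does not carry over.
    $id = $row['id'];
    $title = $row['title'];
    $link = $row['link'];
    // etc.
    if ($link) {
        if (!fileExists($link)) {
            $status = 'BROKEN_LINK';
        }
    }
    // Here do something if the status gets set to broken
}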

The problem, however, is links like this one:

torrentfreak.com/unblocking-the-pirate-bay-the-hard-way-is-fun-for-geeks-120506

This does not point to a file; it goes somewhere and fetches content. So what is the best way to reliably detect broken links in cases like this, when they are not on my own domain?

Thanks!

モルダック

score 1 · Accepted Answer

You can try using cURL instead.

function fileExists($pageScrape, $path) { // Takes the shared cURL handle and the URL to check.
    curl_setopt($pageScrape, CURLOPT_URL, $path); // Set URL path.
    curl_setopt($pageScrape, CURLOPT_RETURNTRANSFER, true); // Don't output the scraped page directly.
    curl_exec($pageScrape); // Execute cURL call.
    $httpCode = curl_getinfo($pageScrape, CURLINFO_HTTP_CODE); // Get the HTTP status code of the page (0 if no response arrived).
    return $httpCode >= 200 && $httpCode <= 299; // Any 2xx code counts as success.
}

$pageScrape = curl_init(); // One handle, reused for every link.

// $db is assumed to be an open mysqli connection (the original used the mysql_* API, removed in PHP 7).
// `table` is a placeholder table name.
$result = mysqli_query($db, "SELECT id, title, link FROM `table`");
while ($row = mysqli_fetch_assoc($result)) {
    $status = ""; // Reset for each row so an earlier broken link does not carry over.
    $id = $row['id'];
    $title = $row['title'];
    $link = $row['link'];
    // etc.
    if ($link) {
        if (!fileExists($pageScrape, $link)) {
            $status = 'BROKEN_LINK';
        }
    }
    // Here do something if the status gets set to broken
}
curl_close($pageScrape);
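
Since the question mentions that the check is slow, one possible variation is to send a HEAD request with timeouts and to follow redirects, so the body is never downloaded and a redirecting link is judged by its final destination. This is only a sketch: the helper names linkCheckOptions and linkIsReachable, the timeout values, and the redirect limit are assumptions for illustration, and some servers answer HEAD requests differently from GET.

```php
// Hypothetical helper: builds the cURL options for a lightweight link check.
// The timeout and redirect limits are illustrative assumptions, not tuned values.
function linkCheckOptions(string $url): array {
    return [
        CURLOPT_URL            => $url,
        CURLOPT_NOBODY         => true,  // Send HEAD and skip the response body.
        CURLOPT_RETURNTRANSFER => true,  // Never echo anything.
        CURLOPT_FOLLOWLOCATION => true,  // Judge redirects by their final target.
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 5,     // Seconds allowed to establish the connection.
        CURLOPT_TIMEOUT        => 10,    // Seconds allowed for the whole request.
    ];
}

// Hypothetical helper: true if the final response has a 2xx status.
function linkIsReachable($curl, string $url): bool {
    curl_setopt_array($curl, linkCheckOptions($url));
    if (curl_exec($curl) === false) {
        return false; // DNS failure, timeout, refused connection, and so on.
    }
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    return $httpCode >= 200 && $httpCode <= 299;
}
```

It could stand in for fileExists in the loop above, for example linkIsReachable($pageScrape, $link), with the same shared handle.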

You can fine-tune the status check by looking through the list of HTTP status codes (Wikipedia link).
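
As one way to fine-tune the check, the status code can be mapped to more than two outcomes, so that redirects or access-restricted pages are not reported as broken. The function name classifyStatus, the labels, and the choice of which codes count as broken are assumptions for illustration; adjust them to your own needs.

```php
// Hypothetical helper: maps an HTTP status code to a coarse link state.
// The grouping below is an illustrative assumption, not a standard.
function classifyStatus(int $httpCode): string {
    if ($httpCode === 0) {
        return 'UNREACHABLE';   // cURL reports 0 when no HTTP response arrived.
    }
    if ($httpCode >= 200 && $httpCode <= 299) {
        return 'OK';
    }
    if ($httpCode >= 300 && $httpCode <= 399) {
        return 'REDIRECT';      // The link moved; the target may still be fine.
    }
    if (in_array($httpCode, [401, 403, 429], true)) {
        return 'RESTRICTED';    // The page exists but refused this request.
    }
    if ($httpCode === 404 || $httpCode === 410) {
        return 'BROKEN_LINK';   // Not found, or gone for good.
    }
    return 'ERROR';             // Anything else, e.g. other 4xx/5xx, possibly temporary.
}
```

With the cURL handle from the answer, this might be called as classifyStatus(curl_getinfo($pageScrape, CURLINFO_HTTP_CODE)) right after curl_exec, and only the BROKEN_LINK result would then set $status.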
