
Trying to download the contents of a robots.txt file

Link to my original question: PHP file_exists() for URL/robots.txt returns false

This is line 22: $f = fopen($file, 'r');

I get this error:

PHP Error[2]: fopen(http://www1.macys.com/robots.txt): failed to open stream: Redirection limit reached, aborting
    in file /host/chapache/host/apache/www/home/flaviuspogacian/proiecte/Mickey_ClosetAffair_Discovery/webroot/protected/modules/crawler/components/Robots.php at line 22
#0 /host/chapache/host/apache/www/home/flaviuspogacian/proiecte/Mickey_ClosetAffair_Discovery/webroot/protected/modules/crawler/components/Robots.php(22): fopen()

$website_id is a number, and $website is a string like http://www.domain.com/.

public function read_website_save_2_db($website_id, $website) {
    $slashes = 0;
    for ($i = 0; $i < strlen($website); $i++)
        if ($website[$i] == '/')
            $slashes++;
    if ($slashes == 2)
        $file = $website . '/robots.txt';
    else
        $file = $website . 'robots.txt';
    echo $website_id . ' ' . $file . PHP_EOL;
    try {
        $f = fopen($file, 'r');
        if (($f) || (strpos(get_headers($file, 1), "404") !== FALSE)) {
            fclose($f);
            echo 'exists' . PHP_EOL;
            $curl_tool = new CurlTool();
            $content = $curl_tool->downloadFile($file, ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt');
            //if the file exists on local disk, delete it
            if (file_exists(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt'))
                unlink(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt');
            echo ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt', $content . PHP_EOL;
            file_put_contents(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt', $content);
        }
        else {
            echo 'maybe it\'s not there' . PHP_EOL;
        }
    } catch (Exception $e) {
        echo 'EXCEPTION ' . $e . PHP_EOL;
    }
}


1 Answer


Some of your code looks messy. I would do something like this instead (though of course without echoing from inside the function; this is only an example):

public function read_website_save_2_db($website_id, $website) {
  // Ensure exactly one slash between the base URL and robots.txt.
  $url = rtrim($website, '/') . '/robots.txt';
  // @ suppresses the warning; on failure file_get_contents() returns false.
  $content = @file_get_contents($url);
  $status = 0;
  $success = false;
  // The HTTP wrapper fills $http_response_header in the local scope.
  if( !empty($http_response_header) ) {
    foreach($http_response_header as $header) {
      // One status line is recorded per redirect hop, so keep the last one.
      if(substr($header, 0, 5) == 'HTTP/') {
        $status = trim(substr($header, strpos($header, ' ')));
        $success = (int) $status === 200;
      }
    }
  }
  if(!$success) {
    echo 'Request failed with status '.$status;
  }
  elseif(!$content) {
    echo 'Website responded with empty robots.txt';
  }
  else {
    file_put_contents(ROBOTS_TXT_FILES . 'robots_' . $website_id . '.txt', $content);
    echo 'Wii, we have downloaded a copy of '.$url;
  }
}
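
Since the error in the question is "Redirection limit reached", it may also help to control redirects explicitly and to read the status of the final response instead of the first one. The following is only a sketch of that idea, not a confirmed fix for this site. The helper names last_http_status() and fetch_robots_txt(), the user agent string, and the limits are all assumptions made for illustration. A missing User-Agent header is one possible cause of redirect loops on some servers, but that has not been verified here.

```php
<?php
// Hypothetical helper: return the status code of the final response in a
// $http_response_header-style array (one status line per redirect hop).
// Returns 0 when no status line is present.
function last_http_status(array $headers): int {
    $status = 0;
    foreach ($headers as $header) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $header, $m)) {
            $status = (int) $m[1]; // later hops overwrite earlier ones
        }
    }
    return $status;
}

// Hypothetical helper: fetch robots.txt with an explicit stream context.
// Returns [status code, body or null when the request failed].
function fetch_robots_txt(string $website): array {
    $url = rtrim($website, '/') . '/robots.txt';
    $context = stream_context_create(['http' => [
        'user_agent'    => 'ExampleCrawler/1.0', // assumed value
        'max_redirects' => 5,                    // assumed limit
        'timeout'       => 10,                   // seconds, assumed
        'ignore_errors' => true,                 // keep body on 4xx/5xx
    ]]);
    $content = @file_get_contents($url, false, $context);
    // The HTTP wrapper sets $http_response_header in this function's scope.
    $headers = $http_response_header ?? [];
    return [last_http_status($headers), $content === false ? null : $content];
}
```

A caller could then write `[$status, $body] = fetch_robots_txt('http://www.domain.com/');` and save $body only when $status is 200 and $body is not empty.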
answered 2012-08-15T11:45:24.583