php - How do I get the full URL?

Question

$html = file_get_contents("any site");
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');

foreach ($images as $image) {
   echo $image->src;
}

returns nothing for me.

$html = file_get_contents("any site");
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');

foreach ($images as $image) {
   echo $image->getAttribute('src');
}

returns a relative URL such as "/images/example.jpg".

$html = file_get_contents("any site");
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');

foreach ($images as $image) {
   echo $image.src;
}

gives me:

Fatal error: Call to undefined function getElementsByTagName()

So how can I get the absolute path?

score 1 · Accepted Answer

This works for me, so give it a try:

<?php
  echo path_to_absolute(
    "../images/example.jpg", /* image url */
    "http://php.net/manual/en/" /* current page url */,
    false /* does the page URL end with a file name, e.g. "http://server.com/file.html"? */
  );

  function path_to_absolute( $src, $base = null, $has_filename = false ) {
    if ( parse_url( $src, PHP_URL_HOST ) ) {
      /* It's already a full URL, so return it unmodified */
      return $src;
    }

    if ( $has_filename && in_array( substr( $src, 0, 1 ), array( "?", "#" ) ) ) {
      /* query string or fragment only: append it to the current page URL */
      return $base . $src;
    }

    /* reduce $base to a directory URL that ends with a slash */
    $base = $has_filename ? dirname( $base )."/" : rtrim( $base, "/" )."/";

    if ( substr( $src, 0, 1 ) == "/" ) {
      /* $src begins with a slash: strip the path from $base and join scheme://host with $src */
      $path = parse_url( $base, PHP_URL_PATH );
      return substr( $base, 0, strlen( $base ) - strlen( $path ) ) . $src;
    }

    /* remove a leading './' from $src, we don't need it */
    $src  = ( substr( $src, 0, 2 ) === "./" ) ? substr( $src, 2 ) : $src;

    /* count how many levels we need to go up */
    $up   = substr_count( $src, "../" );
    $src  = str_replace( "../", "", $src );

    for( $i = 1; $i <= $up; $i++ ) {
      /* never climb above the host */
      if ( parse_url( dirname( $base ), PHP_URL_HOST ) ) {
        $base = dirname( $base ) . "/";
      }
    }

    return $base . $src;
  }
?>

Usage example:
Here we collect the links from php.net, because that site uses a lot of relative links.

<?php
  $url  = "http://www.php.net/manual/en/tokens.php";
  $html = file_get_contents( $url );
  $dom  = new DOMDocument;
  @$dom->loadHTML( $html );
  $dom->preserveWhiteSpace  = false;

  $links  = $dom->getElementsByTagName( 'a' );

  foreach( $links as $link ) {
    $original_url = $link->getAttribute( 'href' );
    $absolute_url = path_to_absolute( $original_url, $url, true );
    echo $original_url." - ".$absolute_url."\n";
  }

  /** prints...
   * / - http://www.php.net/
   * ...
   * control-structures.while.php     - http://www.php.net/manual/en/control-structures.while.php
   * control-structures.do.while.php  - http://www.php.net/manual/en/control-structures.do.while.php
   * ...
   * /sitemap.php - http://www.php.net/sitemap.php
   * /contact.php - http://www.php.net/contact.php
   * ...
   * http://developer.yahoo.com/ - http://developer.yahoo.com/
   * ...
   * ?setbeta=1&beta=1 - http://www.php.net/manual/en/tokens.php?setbeta=1&beta=1
   * ...
   * #85872 - http://www.php.net/manual/en/tokens.php#85872
   **/
?>

score 1 · Accepted Answer

You can use parse_url to find the base URL.

$url = 'http://www.example.com/path?opt=234';
$parts = parse_url($url);
if (isset($parts['scheme'])){
    $base_url = $parts['scheme'].'://';
} else {
    $base_url = 'http://';
    $parts = parse_url($base_url.$url);
}
$base_url .= $parts['host'];
if (isset($parts['path'])){
    $base_url .= $parts['path'];
}

Then combine it with your code like this:

$html = file_get_contents("any site");
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');

foreach ($images as $image) {
   echo $base_url.$image->getAttribute('src');
}
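
Note that when the src value is root-relative (it starts with a slash, as in the question), only the scheme and host of the page URL should be prepended, not its path. A minimal sketch of that variant follows; the helper name origin_of is hypothetical, and defaulting to http when no scheme is present is an assumption.

```php
<?php
// Hypothetical helper: reduce a page URL to "scheme://host[:port]".
function origin_of(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // not an absolute URL we can work with
    }
    $scheme = $parts['scheme'] ?? 'http'; // assumption: fall back to http
    $origin = $scheme . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    return $origin;
}

// Root-relative image path joined to the origin of the page URL.
echo origin_of('http://www.example.com/path?opt=234') . '/images/example.jpg', "\n";
// prints http://www.example.com/images/example.jpg
```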

score 1 · Accepted Answer

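This code distinguishes src attributes that hold a relative URL from those that hold a full URL. It is a little more robust than simple string concatenation, and it handles the case where the relative path does not start with a slash, e.g. images/image.jpg vs. /images/image.jpg.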

<?php
$site = 'http://example.com/some/deeply/buried/page.html';
$dir = dirname($site);

$html = file_get_contents($site);
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');

foreach ($images as $image) {
    // get the img src attribute
    $img_path = $image->getAttribute('src');

    // parse the path into its constituent parts
    $url_info = parse_url($img_path);

    // if the host part (or indeed any part other than "path") is set,
    // then we're dealing with a fully qualified URL (or possibly an error)
    if (!isset($url_info['host'])) {
        // otherwise, get the relative path
        $path = $url_info['path'];

        // and ensure it begins with a slash
        if (substr($path,0,1) !== '/') $path = '/'.$path;

        // concatenate the site directory with the relative path
        $img_path = $dir.$path;
    }

   echo $img_path;  // this should be a full URL
}
?>
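
One caveat: browsers resolve a path that starts with a slash from the host root rather than from the page's directory. If that distinction matters for your pages, a sketch along these lines may help. The function name resolve_src is hypothetical, and it assumes the page URL is absolute with no port; query strings, fragments and "../" segments are not handled.

```php
<?php
// Hypothetical helper: resolve an <img src> value against the URL of its page.
function resolve_src(string $src, string $page_url): string
{
    $src_info = parse_url($src);
    if ($src_info === false || isset($src_info['host'])) {
        return $src; // unparseable or already a full URL: leave it alone
    }

    $page   = parse_url($page_url);
    $origin = $page['scheme'] . '://' . $page['host'];

    if (substr($src, 0, 1) === '/') {
        return $origin . $src; // root-relative: starts from the host root
    }

    // document-relative: starts from the directory that contains the page
    $page_path = $page['path'] ?? '/';
    $dir = substr($page_path, 0, strrpos($page_path, '/') + 1);
    return $origin . $dir . $src;
}

$site = 'http://example.com/some/deeply/buried/page.html';
echo resolve_src('images/image.jpg', $site), "\n";
// prints http://example.com/some/deeply/buried/images/image.jpg
echo resolve_src('/images/image.jpg', $site), "\n";
// prints http://example.com/images/image.jpg
```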

score 0 · Accepted Answer

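I think you need to combine your second solution with the URL of 'any site', because the src attribute of an image may contain only a relative path. From a web developer's point of view, there is no need to include absolute paths.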
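
Roughly, that could look like the sketch below. It assumes $site is the root URL of the site and that relative paths are meant from that root; the inline HTML string and the cdn.example.org address are made-up stand-ins for the page you would normally fetch with file_get_contents.

```php
<?php
// Assumption: $site is the site root, and this inline page replaces file_get_contents($site).
$site = 'http://example.com';
$html = '<html><body>'
      . '<img src="/images/example.jpg">'
      . '<img src="http://cdn.example.org/logo.png">'
      . '</body></html>';

$dom = new DOMDocument;
@$dom->loadHTML($html);

$urls = [];
foreach ($dom->getElementsByTagName('img') as $image) {
    $src = $image->getAttribute('src');
    // Prefix the site URL only when src has no host of its own.
    $urls[] = parse_url($src, PHP_URL_HOST)
        ? $src
        : rtrim($site, '/') . '/' . ltrim($src, '/');
}

print_r($urls);
```

The first entry comes out as http://example.com/images/example.jpg, while the second is left untouched because it already names a host.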
