php - Get domain name (not subdomain) in PHP

Question

I have a URL that can be in any of the following formats:

http://example.com
https://example.com
http://example.com/foo
http://example.com/foo/bar
www.example.com
example.com
foo.example.com
www.foo.example.com
foo.bar.example.com
http://foo.bar.example.com/foo/bar
example.net/foo/bar

Essentially, it needs to match any normal URL. How can I extract example.com (or .net, whatever the TLD happens to be; I need this to work with any TLD) from all of these with a single regex?

score 49 · Accepted Answer

Well, you can use parse_url to get the host:

$info = parse_url($url);
$host = $info['host'];

Then you can do some fancy stuff to get only the TLD and the host:

$host_names = explode(".", $host);
$bottom_host_name = $host_names[count($host_names)-2] . "." . $host_names[count($host_names)-1];

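Not very elegant, but it should work.

If you want an explanation, here it goes:

First we grab the host, the part that follows the scheme (http://, etc.), by using parse_url's capabilities to, well, parse URLs. :)

Then we take the host name and split it into an array based on where the periods fall, so test.world.hello.myname would become: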

array("test", "world", "hello", "myname");

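After that, we take the number of elements in the array (4).

Then we subtract 2 from it to get the second-to-last string (the hostname, or example in your example).

Then we subtract 1 from it to get the last string (because array keys start at 0), also known as the TLD.

Then we combine those two parts with a period, and you have your base host name.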
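
One caveat worth illustrating: parse_url only reports a host when the input carries a scheme, so inputs from the question such as www.example.com or example.net/foo/bar need one prepended first. Below is a minimal sketch of the steps above wrapped in a function. The name base_host_name and the choice to default to http:// are assumptions made for illustration, and the sketch keeps the two-label limitation (it would return co.uk for example.co.uk).

```php
<?php
// Hypothetical helper (not from the answer): last two labels of the host.
function base_host_name(string $url): ?string
{
    // parse_url() only reports a host when a scheme is present,
    // so prepend one for inputs like "www.example.com".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no usable host
    }

    $host_names = explode('.', $host);
    if (count($host_names) < 2) {
        return $host; // single label such as "localhost"
    }

    // Join the second-to-last and last labels.
    return implode('.', array_slice($host_names, -2));
}

echo base_host_name('http://foo.bar.example.com/foo/bar'), "\n"; // example.com
echo base_host_name('example.net/foo/bar'), "\n";                // example.net
```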

score 17 · Accepted Answer

My solution is at https://gist.github.com/pocesar/5366899

and the tests are here: http://codepad.viper-7.com/GAh1tP

It works with any TLD and with hideous subdomain patterns (up to 3 subdomains).

A test with many domain names is included.

I won't paste the function here because of the weird indentation for code on StackOverflow (it could have fenced code blocks like GitHub).

score 15 · Accepted Answer

You cannot get the domain name without a TLD list to compare against, because many cases have exactly the same structure and length:

www.db.de (subdomain) versus bbc.co.uk (domain)
big.uk.com (SLD) versus www.uk.com (TLD)

Mozilla's Public Suffix List should be the best option, as it is used by all major browsers:
https://publicsuffix.org/list/public_suffix_list.dat
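
To make the idea concrete, here is a minimal sketch of suffix matching against a tiny hand-picked set. The $suffixes array and the function name registrable_domain are assumptions for illustration only; a real implementation would load the full Public Suffix List and also honor its wildcard and exception rules, which this sketch ignores.

```php
<?php
// Hypothetical helper: find the longest known suffix, then keep one label more.
function registrable_domain(string $host, array $suffixes): string
{
    $labels = explode('.', strtolower(trim($host, '.')));

    // Try the longest candidate first: for "a.b.co.uk" test
    // "a.b.co.uk", then "b.co.uk", then "co.uk", then "uk".
    for ($i = 0; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($suffixes[$candidate])) {
            // $i === 0 means the host itself is a public suffix.
            return $i === 0 ? '' : implode('.', array_slice($labels, $i - 1));
        }
    }
    return ''; // unknown suffix
}

// Illustrative subset only, flipped so suffixes become array keys.
$suffixes = array_flip(['com', 'de', 'uk', 'co.uk', 'uk.com']);

echo registrable_domain('www.db.de', $suffixes), "\n";  // db.de
echo registrable_domain('bbc.co.uk', $suffixes), "\n";  // bbc.co.uk
echo registrable_domain('big.uk.com', $suffixes), "\n"; // big.uk.com
echo registrable_domain('www.uk.com', $suffixes), "\n"; // www.uk.com
```

Checking the longest candidate first is what lets uk.com win over com, so the two pairs above resolve differently even though they have the same shape.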

Feel free to use my function:

function tld_list($cache_dir=null) {
    // we use "/tmp" if $cache_dir is not set
    $cache_dir = isset($cache_dir) ? $cache_dir : sys_get_temp_dir();
    $lock_dir = $cache_dir . '/public_suffix_list_lock/';
    $list_dir = $cache_dir . '/public_suffix_list/';
    // refresh the list every 30 days
    if (file_exists($list_dir) && @filemtime($list_dir) + 2592000 > time()) {
        return $list_dir;
    }
    // use exclusive lock to avoid race conditions
    if (!file_exists($lock_dir) && @mkdir($lock_dir)) {
        // read from source
        $list = @fopen('https://publicsuffix.org/list/public_suffix_list.dat', 'r');
        if ($list) {
            // the list is older than 30 days so delete everything first
            if (file_exists($list_dir)) {
                foreach (glob($list_dir . '*') as $filename) {
                    unlink($filename);
                }
                rmdir($list_dir);
            }
            // now set list directory with new timestamp
            mkdir($list_dir);
            // read line-by-line to avoid high memory usage
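            while (($line = fgets($list)) !== false) {
                // skip comments and empty lines (trim first, otherwise a bare
                // line break is not recognized as empty)
                $line = trim($line);
                if ($line === '' || $line[0] == '/') {
                    continue;
                }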
                // remove wildcard
                if ($line[0] . $line[1] == '*.') {
                    $line = substr($line, 2);
                }
                // remove exclamation mark
                if ($line[0] == '!') {
                    $line = substr($line, 1);
                }
                // reverse TLD and remove linebreak
                $line = implode('.', array_reverse(explode('.', (trim($line)))));
                // we split the TLD list to reduce memory usage
                touch($list_dir . $line);
            }
            fclose($list);
        }
        @rmdir($lock_dir);
    }
    // repair locks (should never happen)
    if (file_exists($lock_dir) && mt_rand(0, 100) == 0 && @filemtime($lock_dir) + 86400 < time()) {
        @rmdir($lock_dir);
    }
    return $list_dir;
}
function get_domain($url=null) {
    // obtain location of public suffix list
    $tld_dir = tld_list();
    // no url = our own host
    $url = isset($url) ? $url : $_SERVER['SERVER_NAME'];
    // add missing scheme      ftp://            http:// ftps://   https://
    $url = !isset($url[5]) || ($url[3] != ':' && $url[4] != ':' && $url[5] != ':') ? 'http://' . $url : $url;
    // remove "/path/file.html", "/:80", etc.
    $url = parse_url($url, PHP_URL_HOST);
    // replace absolute domain name by relative (http://www.dns-sd.org/TrailingDotsInDomainNames.html)
    $url = trim($url, '.');
    // check if TLD exists
    $url = explode('.', $url);
    $parts = array_reverse($url);
    foreach ($parts as $key => $part) {
        $tld = implode('.', $parts);
        if (file_exists($tld_dir . $tld)) {
            return !$key ? '' : implode('.', array_slice($url, $key - 1));
        }
        // remove last part
        array_pop($parts);
    }
    return '';
}

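What makes it special:

It accepts every kind of input (URLs, hostnames, or domains), with or without a scheme
The list is downloaded line by line to avoid high memory usage
It creates a new file per TLD in the cache folder, so get_domain() only needs to check with file_exists() whether the TLD exists; it does not need to include a huge database on every request the way TLDExtract does.
The list is automatically updated every 30 days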

Test:

$urls = array(
    'http://www.example.com',// example.com
    'http://subdomain.example.com',// example.com
    'http://www.example.uk.com',// example.uk.com
    'http://www.example.co.uk',// example.co.uk
    'http://www.example.com.ac',// example.com.ac
    'http://example.com.ac',// example.com.ac
    'http://www.example.accident-prevention.aero',// example.accident-prevention.aero
    'http://www.example.sub.ar',// sub.ar
    'http://www.congresodelalengua3.ar',// congresodelalengua3.ar
    'http://congresodelalengua3.ar',// congresodelalengua3.ar
    'http://www.example.pvt.k12.ma.us',// example.pvt.k12.ma.us
    'http://www.example.lib.wy.us',// example.lib.wy.us
    'com',// empty
    '.com',// empty
    'http://big.uk.com',// big.uk.com
    'uk.com',// empty
    'www.uk.com',// www.uk.com
    '.uk.com',// empty
    'stackoverflow.com',// stackoverflow.com
    '.foobarfoo',// empty
    '',// empty
    false,// empty
    ' ',// empty
    1,// empty
    'a',// empty    
);

Recent version with explanations (in German):
http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm

score 6 · Accepted Answer

$onlyHostName = implode('.', array_slice(explode('.', parse_url($link, PHP_URL_HOST)), -2));

Using https://subdomain.domain.com/some/path as an example:

parse_url($link, PHP_URL_HOST) returns subdomain.domain.com

explode('.', parse_url($link, PHP_URL_HOST)) then splits subdomain.domain.com into an array:

array(3) {
  [0]=>
  string(9) "subdomain"
  [1]=>
  string(6) "domain"
  [2]=>
  string(3) "com"
}

array_slice then slices the array so that only the last two values remain (specified by -2):

array(2) {
  [0]=>
  string(6) "domain"
  [1]=>
  string(3) "com"
}

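implode then joins those two array values back together, which finally gives you domain.com

Note: this only works when the end domain you expect contains a single ., as in something.domain.com or else.something.domain.net

It will not work for something.domain.co.uk, where you would expect domain.co.uk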
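
The limitation is easy to reproduce. A short sketch follows; the function name last_two_labels is hypothetical and only wraps the one-liner so both cases can be run side by side.

```php
<?php
// Hypothetical wrapper around the one-liner above.
function last_two_labels(string $link): string
{
    $host = (string) parse_url($link, PHP_URL_HOST);
    return implode('.', array_slice(explode('.', $host), -2));
}

echo last_two_labels('https://subdomain.domain.com/some/path'), "\n"; // domain.com
echo last_two_labels('https://something.domain.co.uk/'), "\n";        // co.uk, not domain.co.uk
```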

score 6 · Accepted Answer

The best way to handle this problem is:

$second_level_domains_regex = '/\.asn\.au$|\.com\.au$|\.net\.au$|\.id\.au$|\.org\.au$|\.edu\.au$|\.gov\.au$|\.csiro\.au$|\.act\.au$|\.nsw\.au$|\.nt\.au$|\.qld\.au$|\.sa\.au$|\.tas\.au$|\.vic\.au$|\.wa\.au$|\.co\.at$|\.or\.at$|\.priv\.at$|\.ac\.at$|\.avocat\.fr$|\.aeroport\.fr$|\.veterinaire\.fr$|\.co\.hu$|\.film\.hu$|\.lakas\.hu$|\.ingatlan\.hu$|\.sport\.hu$|\.hotel\.hu$|\.ac\.nz$|\.co\.nz$|\.geek\.nz$|\.gen\.nz$|\.kiwi\.nz$|\.maori\.nz$|\.net\.nz$|\.org\.nz$|\.school\.nz$|\.cri\.nz$|\.govt\.nz$|\.health\.nz$|\.iwi\.nz$|\.mil\.nz$|\.parliament\.nz$|\.ac\.za$|\.gov\.za$|\.law\.za$|\.mil\.za$|\.nom\.za$|\.school\.za$|\.net\.za$|\.co\.uk$|\.org\.uk$|\.me\.uk$|\.ltd\.uk$|\.plc\.uk$|\.net\.uk$|\.sch\.uk$|\.ac\.uk$|\.gov\.uk$|\.mod\.uk$|\.mil\.uk$|\.nhs\.uk$|\.police\.uk$/';
$domain = $_SERVER['HTTP_HOST'];
$domain = explode('.', $domain);
$domain = array_reverse($domain);
if (preg_match($second_level_domains_regex, $_SERVER['HTTP_HOST'])) {
    $domain = "$domain[2].$domain[1].$domain[0]";
} else {
    $domain = "$domain[1].$domain[0]";
}
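
To try this approach outside a web request, the same logic can be wrapped in a function that takes the host as a parameter. This is a sketch under assumptions: the function name, the shortened regex (only a few of the suffixes listed above), and the port stripping are additions for illustration, not part of the answer.

```php
<?php
// Hypothetical wrapper; $sld_regex is a shortened stand-in for the full list above.
function domain_from_host(string $host): string
{
    $sld_regex = '/\.(?:com\.au|co\.nz|co\.uk|org\.uk)$/';

    // HTTP_HOST may carry a port ("example.co.uk:8080"); drop it first.
    $host = strtolower(preg_replace('/:\d+$/', '', $host));

    // Keep three labels for listed second-level suffixes, otherwise two.
    $keep = preg_match($sld_regex, $host) ? 3 : 2;
    return implode('.', array_slice(explode('.', $host), -$keep));
}

echo domain_from_host('shop.example.co.uk:8080'), "\n"; // example.co.uk
echo domain_from_host('www.example.com'), "\n";         // example.com
```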

score 6 · Accepted Answer

I recommend using the TLDExtract library for all operations involving domain names.

score 6 · Accepted Answer

echo getDomainOnly("http://example.com/foo/bar");

function getDomainOnly($host){
    $host = strtolower(trim($host));
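    // strip the scheme, then a leading "www." (ltrim() would treat "www." as a
    // character mask and also eat the first letter of hosts like "web.example.com")
    $host = preg_replace('/^www\./', '', str_replace("http://","",str_replace("https://","",$host)));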
    $count = substr_count($host, '.');
    if($count === 2){
        if(strlen(explode('.', $host)[1]) > 3) $host = explode('.', $host, 2)[1];
    } else if($count > 2){
        $host = getDomainOnly(explode('.', $host, 2)[1]);
    }
    $host = explode('/',$host);
    return $host[0];
}

score 4 · Accepted Answer

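There are two ways to extract the subdomain from a host:

The first method, which is more accurate, is to use a database of TLDs (such as public_suffix_list.dat) and match the domain against it. This is a bit heavy in some cases. There are some PHP classes for using it, such as php-domain-parser and TLDExtract.

The second method is not as accurate as the first, but it is very fast and gives the correct answer in many cases. I wrote this function for it: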

function get_domaininfo($url) {
    // regex can be replaced with parse_url
    preg_match("/^(https|http|ftp):\/\/(.*?)\//", "$url/" , $matches);
    $parts = explode(".", $matches[2]);
    $tld = array_pop($parts);
    $host = array_pop($parts);
    if ( strlen($tld) == 2 && strlen($host) <= 3 ) {
        $tld = "$host.$tld";
        $host = array_pop($parts);
    }

    return array(
        'protocol' => $matches[1],
        'subdomain' => implode(".", $parts),
        'domain' => "$host.$tld",
        'host'=>$host,'tld'=>$tld
    );
}

Example:

print_r(get_domaininfo('http://mysubdomain.domain.co.uk/index.php'));

Returns:

Array
(
    [protocol] => http
    [subdomain] => mysubdomain
    [domain] => domain.co.uk
    [host] => domain
    [tld] => co.uk
)

score 3 · Accepted Answer

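Here's a function I wrote to grab the domain without the subdomain(s), regardless of whether the domain uses a ccTLD or a new-style long TLD. There is no lookup or huge array of known TLDs, and there's no regex. It could be a lot shorter with the ternary operator and nesting, but I expanded it for readability.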

// Per Wikipedia: "All ASCII ccTLD identifiers are two letters long, 
// and all two-letter top-level domains are ccTLDs."

function topDomainFromURL($url) {
  $url_parts = parse_url($url);
  $domain_parts = explode('.', $url_parts['host']);
  if (strlen(end($domain_parts)) == 2 ) { 
    // ccTLD here, get last three parts
    $top_domain_parts = array_slice($domain_parts, -3);
  } else {
    $top_domain_parts = array_slice($domain_parts, -2);
  }
  $top_domain = implode('.', $top_domain_parts);
  return $top_domain;
}
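
A few sample calls show where this heuristic holds and where it does not. The sketch below restates the function in compact form under the hypothetical name top_domain_heuristic so that it runs on its own; the sample inputs are made up for illustration.

```php
<?php
// Compact restatement of the heuristic above (the name is hypothetical).
function top_domain_heuristic(string $url): string
{
    $parts = explode('.', (string) parse_url($url, PHP_URL_HOST));
    // Two-letter final label: assume a ccTLD with a second-level suffix.
    $keep = strlen(end($parts)) === 2 ? 3 : 2;
    return implode('.', array_slice($parts, -$keep));
}

echo top_domain_heuristic('http://www.example.com/'), "\n";   // example.com
echo top_domain_heuristic('http://www.example.co.uk/'), "\n"; // example.co.uk
echo top_domain_heuristic('http://www.example.de/'), "\n";    // www.example.de (subdomain kept)
```

The last call shows the trade-off: a ccTLD that registers names directly at the second level keeps its subdomain, because the heuristic always takes three labels for two-letter TLDs.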

score 2 · Accepted Answer

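I had problems with the solution provided by pocesar. When I used subdomain.domain.nl, for instance, it did not return domain.nl; it returned subdomain.domain.nl instead. Another problem was that domain.com.br returned com.br.

I am not sure, but I fixed these issues with the following code (I hope it helps someone; if so, I am a happy man):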

function get_domain($domain, $debug = false){
    $original = $domain = strtolower($domain);
    if (filter_var($domain, FILTER_VALIDATE_IP)) {
        return $domain;
    }
    $debug ? print('<strong style="color:green">&raquo;</strong> Parsing: '.$original) : false;
    $arr = array_slice(array_filter(explode('.', $domain, 4), function($value){
        return $value !== 'www';
    }), 0); //rebuild array indexes
    if (count($arr) > 2){
        $count = count($arr);
        $_sub = explode('.', $count === 4 ? $arr[3] : $arr[2]);
        $debug ? print(" (parts count: {$count})") : false;
        if (count($_sub) === 2){ // two level TLD
            $removed = array_shift($arr);
            if ($count === 4){ // got a subdomain acting as a domain
                $removed = array_shift($arr);
            }
            $debug ? print("<br>\n" . '[*] Two level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false;
        }elseif (count($_sub) === 1){ // one level TLD
            $removed = array_shift($arr); //remove the subdomain
            if (strlen($arr[0]) === 2 && $count === 3){ // TLD domain must be 2 letters
                array_unshift($arr, $removed);
            }elseif(strlen($arr[0]) === 3 && $count === 3){
                array_unshift($arr, $removed);
            }else{
                // non country TLD according to IANA
                $tlds = array(
                    'aero',
                    'arpa',
                    'asia',
                    'biz',
                    'cat',
                    'com',
                    'coop',
                    'edu',
                    'gov',
                    'info',
                    'jobs',
                    'mil',
                    'mobi',
                    'museum',
                    'name',
                    'net',
                    'org',
                    'post',
                    'pro',
                    'tel',
                    'travel',
                    'xxx',
                );
                if (count($arr) > 2 && in_array($_sub[0], $tlds) !== false){ //special TLD don't have a country
                    array_shift($arr);
                }
            }
            $debug ? print("<br>\n" .'[*] One level TLD: <strong>'.join('.', $_sub).'</strong> ') : false;
        }else{ // more than 3 levels, something is wrong
            for ($i = count($_sub); $i > 1; $i--){
                $removed = array_shift($arr);
            }
            $debug ? print("<br>\n" . '[*] Three level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false;
        }
    }elseif (count($arr) === 2){
        $arr0 = array_shift($arr);
        if (strpos(join('.', $arr), '.') === false && in_array($arr[0], array('localhost','test','invalid')) === false){ // not a reserved domain
            $debug ? print("<br>\n" .'Seems invalid domain: <strong>'.join('.', $arr).'</strong> re-adding: <strong>'.$arr0.'</strong> ') : false;
            // seems invalid domain, restore it
            array_unshift($arr, $arr0);
        }
    }
    $debug ? print("<br>\n".'<strong style="color:gray">&laquo;</strong> Done parsing: <span style="color:red">' . $original . '</span> as <span style="color:blue">'. join('.', $arr) ."</span><br>\n") : false;
    return join('.', $arr);
}

score 2 · Accepted Answer

Here's one that works for all domains, including those with second-level domains like "co.uk":

function strip_subdomains($url){

    # credits to gavingmiller for maintaining this list
    $second_level_domains = file_get_contents("https://raw.githubusercontent.com/gavingmiller/second-level-domains/master/SLDs.csv");

    # presume sld first ...
    $possible_sld = implode('.', array_slice(explode('.', $url), -2));

    # and then verify it
    if (strpos($second_level_domains, $possible_sld) !== false){
        return  implode('.', array_slice(explode('.', $url), -3));
    } else {
        return  implode('.', array_slice(explode('.', $url), -2));
    }
}

There seems to be a duplicate question here: delete-subdomain-from-url-string-if-subdomain-is-found

score 2 · Accepted Answer

function getDomain($url){
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : '';
    if(preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)){
        return $regs['domain'];
    }
    return FALSE;
}

echo getDomain("http://example.com"); // outputs 'example.com'
echo getDomain("http://www.example.com"); // outputs 'example.com'
echo getDomain("http://mail.example.co.uk"); // outputs 'example.co.uk'
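
The {2,6} quantifier on the final group is what lets this pattern span suffixes such as co.uk, but it also means that a short name can pull a subdomain into the match. A quick check, repeating the pattern under a hypothetical function name so the snippet runs on its own:

```php
<?php
// Same pattern as above, wrapped under a hypothetical name for a standalone run.
function get_domain_by_pattern(string $url)
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    $pattern = '/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i';
    return preg_match($pattern, $host, $regs) ? $regs['domain'] : false;
}

echo get_domain_by_pattern('http://mail.example.co.uk'), "\n"; // example.co.uk
echo get_domain_by_pattern('http://www.ab.com'), "\n";         // www.ab.com ("ab.com" fits in six characters)
```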

score 1 · Accepted Answer

Very late, but I see that you tagged the question with regex, and my function works like a charm. So far I haven't found a URL that fails:

function get_domain_regex($url){
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }else{
    return false;
  }
}

If you need one without regex, I have this one, which I believe I also took from this post:

function get_domain($url){
  $parseUrl = parse_url($url);
  $host = $parseUrl['host'];
  $host_array = explode(".", $host);
  $domain = $host_array[count($host_array)-2] . "." . $host_array[count($host_array)-1];
  return $domain;
}

Both work great. The only thing that took me a while to realize is that they fail if the URL doesn't start with http:// or https://, so make sure the URL string starts with a protocol.

score 0 · Accepted Answer

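Just try this:

   // escape the dot after "www" and read the second group, so that a leading "www." is not part of the result
   preg_match('/(www\.)?([^.]+\.[^.]+)$/', $yourHost, $matches);

   echo "domain name is: {$matches[2]}\n";

This works for the majority of domains.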

score 0 · Accepted Answer

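This function returns the domain name without the extension from a given URL, even when the URL is parsed without http:// or https://.

You can extend this code

(?:\.co)?(?:\.com)?(?:\.gov)?(?:\.net)?(?:\.org)?(?:\.id)?

with more extensions if you want to handle more second-level domain names.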

    function get_domain_name($url){
      $pieces = parse_url($url);
      $domain = isset($pieces['host']) ? $pieces['host'] : $url;
      $domain = strtolower($domain);
      $domain = preg_replace('/.international$/', '.com', $domain);
      if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,90}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
          if (preg_match('/(.*?)((?:\.co)?(?:\.com)?(?:\.gov)?(?:\.net)?(?:\.org)?(?:\.id)?(?:\.asn)?.[a-z]{2,6})$/i', $regs['domain'], $matches)) {
              return $matches[1];
          }else  return $regs['domain'];
      }else{
        return $url;
      }
    }

score 0 · Accepted Answer

I'm using this to achieve the same goal, and it has always worked for me. I hope it helps others.

$url          = 'https://use.fontawesome.com/releases/v5.11.2/css/all.css?ver=2.7.5';
$handle       = pathinfo( parse_url( $url )['host'] )['filename'];
$final_handle = substr( $handle , strpos( $handle , '.' ) + 1 );

print_r($final_handle); // fontawesome

score 0 · Accepted Answer

The simplest solution:

@preg_replace('#\/(.)*#', '', @preg_replace('#^https?://(www.)?#', '', $url))

score -1 · Accepted Answer

Just try this:

<?php
  $host = $_SERVER['HTTP_HOST'];
  preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
  echo "domain name is: {$matches[0]}\n";
?>
