php - PHP HTML parsing: how do I get a web page's charset value using Simple HTML DOM Parser?

Question

PHP: how can I get the charset value of a web page (utf-8, windows-255, etc.) using Simple HTML DOM Parser?

Note: it has to be done with the Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net).

Example 1, charset declaration in the web page:

<meta content="text/html; charset=utf-8" http-equiv="Content-Type">

Result: utf-8

Example 2, charset declaration in the web page:

<meta content="text/html; charset=windows-255" http-equiv="Content-Type">

Result: windows-255

Edit:

I tried this (but it doesn't work):

$html = file_get_html('http://www.google.com/');
$el=$html->find('meta[content]',0);
echo $el->charset;

What should I change? (I know that `$el->charset` doesn't work.)
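`$el->charset` has nothing to read because the `<meta>` tag in both examples has no `charset` attribute: `charset=utf-8` is a parameter inside the value of its `content` attribute, so it has to be parsed out of `$el->content` as a string. A minimal sketch of that parsing in plain PHP, without a regex; `charset_from_content()` is a hypothetical helper, not part of Simple HTML DOM:

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// value such as "text/html; charset=utf-8". No regex: split on ';' and
// inspect each "name=value" parameter after the MIME type.
function charset_from_content(string $content): ?string
{
    $params = explode(';', $content);
    array_shift($params); // drop the MIME type itself ("text/html")
    foreach ($params as $param) {
        $pair = explode('=', $param, 2);
        if (count($pair) === 2 && strtolower(trim($pair[0])) === 'charset') {
            // strip surrounding whitespace, then optional quotes
            return trim(trim($pair[1]), "\"'");
        }
    }
    return null; // no charset parameter present
}

echo charset_from_content('text/html; charset=utf-8'), "\n";       // utf-8
echo charset_from_content('text/html; charset=windows-255'), "\n"; // windows-255
```

With the question's code, this would be called as `charset_from_content($el->content)`, assuming `$el` is the Content-Type `<meta>` element.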

Thanks

score 3 · Accepted Answer

You'll need a regular expression to match the string (hopefully you have PCRE...):

$el = $html->find('meta[http-equiv=Content-Type]', 0);
$fullvalue = $el->content; // e.g. "text/html; charset=utf-8"
if (preg_match('/charset=(.+)/', $fullvalue, $matches)) {
    echo $matches[1]; // the captured charset, e.g. "utf-8"
}

It's not very robust, but it should work.
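One reason `/charset=(.+)/` is fragile: `.+` captures everything to the end of the string (including a closing quote or any following parameter), and the match is case-sensitive, so `Charset=UTF-8` is missed. A hedged sketch of a tighter pattern; `charset_from_content_regex()` is a hypothetical name:

```php
<?php
// Hypothetical helper with a stricter pattern than '/charset=(.+)/':
//  - /i makes it case-insensitive ("Charset=UTF-8" also matches)
//  - \s* tolerates spaces around '='
//  - ["']? skips an optional opening quote
//  - [^\s;"']+ stops at whitespace, ';' or a quote instead of running to the end
function charset_from_content_regex(string $content): ?string
{
    if (preg_match('/charset\s*=\s*["\']?([^\s;"\']+)/i', $content, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(charset_from_content_regex('text/html; charset=utf-8'));          // string(5) "utf-8"
var_dump(charset_from_content_regex('text/html; Charset="UTF-8"'));        // string(5) "UTF-8"
var_dump(charset_from_content_regex('text/html; charset=utf-8; foo=bar')); // string(5) "utf-8"
var_dump(charset_from_content_regex('text/html'));                         // NULL
```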

score 2 · Accepted Answer

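$dd = new DOMDocument;
libxml_use_internal_errors(true); // silence warnings from malformed real-world HTML
$dd->loadHTML($data);             // $data: the page's HTML source as a string
foreach ($dd->getElementsByTagName("meta") as $meta) {
    if (strtolower($meta->getAttribute("http-equiv")) == "content-type") {
        $v = $meta->getAttribute("content");
        // e.g. "text/html; charset=utf-8" -> captures "utf-8"
        if (preg_match("#.+?/.+?;\\s?charset\\s?=\\s?(.+)#i", $v, $matches)) {
            echo $matches[1];
        }
    }
}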

Note that the DOM extension implicitly converts all data to UTF-8.
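Building on this answer, a sketch that also handles the HTML5 form `<meta charset="utf-8">`, where the charset *is* its own attribute. It uses PHP's bundled DOM extension rather than Simple HTML DOM and assumes ext-dom is enabled; `declared_charset()` is a hypothetical name. As noted above, the strings it returns come back as UTF-8.

```php
<?php
// Hypothetical helper: return the charset declared in an HTML document,
// covering both the HTML4 form
//   <meta http-equiv="Content-Type" content="text/html; charset=...">
// and the HTML5 form
//   <meta charset="...">
function declared_charset(string $html): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid; collect warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // HTML5 form: the charset is its own attribute
        if ($meta->hasAttribute('charset')) {
            return trim($meta->getAttribute('charset'));
        }
        // HTML4 form: the charset is a parameter inside "content"
        if (strtolower($meta->getAttribute('http-equiv')) === 'content-type'
            && preg_match('/charset\s*=\s*["\']?([^\s;"\']+)/i', $meta->getAttribute('content'), $m)) {
            return $m[1];
        }
    }
    return null; // no charset declared in any <meta>
}

echo declared_charset('<html><head><meta charset="utf-8"></head><body></body></html>'), "\n";
echo declared_charset('<html><head><meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"></head><body></body></html>'), "\n";
```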

score 1 · Accepted Answer

Thanks for MvanGeest's answer; with a small modification, it works perfectly:

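$html = file_get_html('http://www.google.com/');
// first <meta> that has a content attribute (assumed to be the Content-Type one)
$el = $html->find('meta[content]', 0);
$fullvalue = $el->content;
preg_match('/charset=(.+)/', $fullvalue, $matches);
// $matches[0] is "charset=...", so this strips the prefix (same result as $matches[1])
echo substr($matches[0], strlen("charset="));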
