xpath - Get the title tag from an HTML page using XPath?

Question

I have two pages from which I'm trying to extract the title tag using an XPath query. This page works: http://www.hobbyfarms.com/farm-directory/category-home-and-barn-resources-1.aspx

This page does not: http://cattletoday.com/links/Barns_and_Metal_Buildings/page-1.html?s=A

Here is my code:

$dom = new DOMDocument();
@$dom->loadHTMLFile($href);
$xpath = new DOMXPath($dom);

$titleNode = $xpath->query("//title");
foreach ($titleNode as $n) {
    $pageTitle = $n->nodeValue;
}

I also tried this:

$xpath->query('//title')->item(0)->textContent

However, that doesn't work for that one URL either.

Does anyone know why this is happening? And, hopefully, a solution?

score 4 · Accepted Answer

The file is gzipped, and the following script works:

$href = 'http://cattletoday.com/links/Barns_and_Metal_Buildings/page-1.html?s=A';
$dom = new DOMDocument();
$file = gzdecode(file_get_contents($href));
$dom->loadHTML($file);
$xpath = new DOMXPath($dom); 
$titleNode = $xpath->query('//title');
var_dump($titleNode->item(0));

(Note the gzdecode function being used.)
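If the same code has to handle both pages from the question, one possible approach is to decode only when the response body actually looks like gzip. This is a minimal sketch, assuming the zlib extension is available; the helper names `decodeIfGzipped` and `extractTitle` are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: return the body unchanged unless it starts with
// the gzip magic bytes (0x1f 0x8b), in which case try to decompress it.
function decodeIfGzipped(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Hypothetical helper: parse an HTML string and return the text of the
// first <title> element, or null when there is none.
function extractTitle(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $node = $xpath->query('//title')->item(0);
    return $node === null ? null : trim($node->textContent);
}
```

With these in place, a call such as `extractTitle(decodeIfGzipped(file_get_contents($href)))` should behave the same whether or not the server sends a gzipped body, although `file_get_contents` can still return false on a network error and that case would need its own check.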

score 2 · Accepted Answer

The second page uses the XHTML namespace, so you need to use an XPath expression qualified with that namespace.

$xpath->registerNamespace("xhtml", "http://www.w3.org/1999/xhtml");
$titleNode = $xpath->query("//xhtml:title|//title");
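To see why the prefix matters, here is a small sketch on an inline XHTML string rather than on the real page. It assumes the document is parsed as XML with `loadXML`; as far as I know, the HTML parser behind `loadHTML` generally does not put elements into a namespace, so the difference mainly shows up with XML parsing. The `queryTitles` helper and the sample markup are invented for illustration.

```php
<?php
// Hypothetical helper: parse a string as XML and return the text content
// of every node matched by the given XPath expression.
function queryTitles(string $xml, string $expression): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $xpath = new DOMXPath($dom);
    // Bind the "xhtml" prefix to the XHTML namespace URI.
    $xpath->registerNamespace('xhtml', 'http://www.w3.org/1999/xhtml');

    $titles = [];
    foreach ($xpath->query($expression) as $node) {
        $titles[] = $node->textContent;
    }
    return $titles;
}

// Sample document whose elements live in the default XHTML namespace.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml">'
       . '<head><title>Example</title></head><body/></html>';

// Unprefixed names only match elements in no namespace.
$plain = queryTitles($xhtml, '//title');
// The prefixed name matches the namespaced element.
$qualified = queryTitles($xhtml, '//xhtml:title');
// The union from the answer covers both cases.
$either = queryTitles($xhtml, '//xhtml:title|//title');
```

In this setup the unprefixed query finds nothing, while the prefixed query and the union both find the title, which is why the answer queries both forms.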
