php - PHP Simple HTML DOM - accessing values that have no identifier

Question

Suppose I have this in my HTML:

<strong class="top">Contact Person: </strong>
<br>
Shan
<strong class="top">Email-id: </strong>
<br>
<span>abshanai@gmail.com</span>
<br>
<strong class="top">Website:</strong>
www.absgym.co.in

Is it possible to retrieve the values (Shan, the email address, the website) using Simple HTML DOM?

score 0 · Accepted Answer

<?php

$sourcelink = 'http://en.wikipedia.org/wiki/Document_Object_Model';
$retriever = curl_init($sourcelink);
curl_setopt($retriever, CURLOPT_REFERER, 'http://www.google.com');
curl_setopt($retriever, CURLOPT_USERAGENT, 'MozillaXYZ/1.0');
curl_setopt($retriever, CURLOPT_RETURNTRANSFER, true); // return the body as a string
curl_setopt($retriever, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
$source_content = curl_exec($retriever);               // false on failure
curl_close($retriever);

// Isolate the section of interest. General form:
// preg_match('/<markup that opens the section>(.*?)<markup that ends the section>/s', $source, $matches);

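if (!preg_match('/<h1 id="firstHeading" class="firstHeading" lang="en">(.*?)<div id="bodyContent" class="mw-body-content">/s', $source_content, $selected_area)) {
    exit("Section not found\n"); // download failed or the pattern did not match
}
$needed_content = $selected_area[0]; // full match, including both delimiters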
$dom_class = new DOMDocument();
libxml_use_internal_errors(true); // collect markup warnings instead of silencing them with @
$dom_class->loadHTML($needed_content);

$processor = new DOMXPath($dom_class);

/*
 * Select the HTML tag that encloses the targeted content. Syntax:
 * $processor->query('//tag[@attribute="value"]');
 */
$process_selector = $processor->query('//span[@dir="auto"]');

$accumulator = []; // collected values
foreach ($process_selector as $node) {
    $value = trim($node->nodeValue);
    echo $value, '<br>';
    $accumulator[] = $value;
}

?>
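
The section-isolation step in the code above depends on the pattern actually matching. As a minimal sketch of that step on its own, the helper below returns null when there is no match, so the caller can stop before passing an empty string to loadHTML(). The function name extract_section() and the sample markup are assumptions made for illustration; they are not part of the answer.

```php
<?php

/**
 * Return the part of $html matched by $pattern, or null if nothing matches.
 * Hypothetical helper, for illustration only.
 */
function extract_section(string $html, string $pattern): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null;
    }
    return $matches[0]; // the full match, including both delimiters
}

// Made-up markup standing in for the downloaded page.
$page = '<div id="a"><h1>Title</h1><p>Body</p></div><div id="b">Footer</div>';

$section = extract_section($page, '/<h1>(.*?)<\/p>/s');
echo $section, "\n"; // prints <h1>Title</h1><p>Body</p>

$missing = extract_section($page, '/<table>(.*?)<\/table>/s');
var_dump($missing === null); // bool(true)
```

The /s modifier matters here: without it the dot does not match newlines, and a section that spans several lines would never match.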

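How to adapt it:

The lines that must be changed to suit your own document are lines 3, 15, and 29 of the code above.
Line 3: set $sourcelink to the URL of your page.
Line 15: the pattern must begin and end with the HTML parent tags that enclose the section of the document you are interested in.
Line 29: the query must name the HTML tag that encloses the content to extract. In your case that is <strong class="top">Targeted content to extract</strong>, so: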

$process_selector = $processor->query('//strong[@class="top"]');
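
Note that this query selects the label elements themselves (Contact Person, Email-id, Website). In the markup from the question the values sit in the nodes that follow each label. One possible sketch, assuming the markup is exactly the fragment from the question, is to walk the siblings after each strong element until the next strong element and collect their text. The wrapping div and the variable names are illustrative choices, not something the question or the answer prescribes.

```php
<?php

// Fragment from the question, wrapped in a <div> so it has a single parent.
$html = <<<'HTML'
<div>
<strong class="top">Contact Person: </strong>
<br>
Shan
<strong class="top">Email-id: </strong>
<br>
<span>abshanai@gmail.com</span>
<br>
<strong class="top">Website:</strong>
www.absgym.co.in
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$result = [];

foreach ($xpath->query('//strong[@class="top"]') as $label) {
    // "Contact Person: " becomes "Contact Person"
    $key = rtrim(trim($label->textContent), ':');
    $value = '';
    // Walk the following siblings until the next label starts.
    for ($node = $label->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeName === 'strong') {
            break;
        }
        $value .= $node->textContent; // text nodes and <span> contribute; <br> is empty
    }
    $result[$key] = trim($value);
}

print_r($result);
```

With this input, $result maps Contact Person to Shan, Email-id to abshanai@gmail.com, and Website to www.absgym.co.in. If the real page nests the labels differently, the sibling walk would need adjusting.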
