php - Get a specific content block of an element from a URL in PHP

Question

Possible duplicate:
How to parse and process HTML in PHP?

I know about file_get_contents(url), but I was hoping there is a method or function that can take the page content retrieved with file_get_contents(url) and extract one specific block from it. Here is a sample.

The code would look like this:

$pageContent = file_get_contents('http://www.pullcontentshere.com/');

This would be the content of $pageContent:

<html> <body>
    <div id="myContent">
        <ul>    
            <li></li>
            <li></li>
            <li></li>
        </ul>
    </div> 
</body> </html>

Can you suggest a way to extract specifically <div id="myContent"> together with all of its children?

The call would be something like this:

$content = function_here($pageContent);

And the output would be:

        <div id="myContent">
            <ul>    
                <li></li>
                <li></li>
                <li></li>
            </ul>
        </div>

Looking forward to your answers.

score 3 · Accepted Answer

Another way is to use a regular expression.

<?php

$string = '<html> <body> 
    <div id="myContent"> 
        <ul>     
            <li></li> 
            <li></li> 
            <li></li> 
        </ul> 
    </div>  
</body> </html>';

if ( preg_match ( '/<div id="myContent"(.*?)<\/div>/s', $string, $matches ) )
{
    foreach ( $matches as $key => $match )
    {
        echo $key . ' => ' . htmlentities ( $match ) . '<br /><br />';
    }
}
else
{
    echo 'No match';
}

?>

Live example: http://codepad.viper-7.com/WSoWCh
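The same regular-expression approach can be wrapped in a reusable function that returns only the matched block. This is a minimal sketch, not part of the original answer: the function name extractDivWithRegex is hypothetical, and it assumes the id attribute is written exactly as id="..." directly after <div. Because the lazy match stops at the first </div>, it only works when the target div contains no nested div elements.

```php
<?php
// Hypothetical helper: returns the first <div> with the given id, including
// its own tags, or null when nothing matches.
function extractDivWithRegex(string $html, string $id): ?string
{
    // preg_quote() escapes regex metacharacters in the id; the "s" modifier
    // lets "." match newlines. The lazy ".*?" stops at the FIRST </div>,
    // so a nested <div> inside the block would cut the result short.
    $pattern = '/<div id="' . preg_quote($id, '/') . '".*?<\/div>/s';

    return preg_match($pattern, $html, $matches) === 1 ? $matches[0] : null;
}

$string = '<html> <body>
    <div id="myContent">
        <ul><li></li><li></li><li></li></ul>
    </div>
</body> </html>';

// $matches[0] is the whole match, i.e. the div together with its children.
echo htmlentities((string) extractDivWithRegex($string, 'myContent')), "\n";
```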

score 3 · Accepted Answer

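As explained in nullpointr's answer, you can use the built-in SimpleXMLElement, or you can use a regular expression. Another solution, which I usually find very simple, is the PHP Simple HTML DOM Parser. This library lets you use jQuery-style selectors. A simple example looks like this: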

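// Requires the library's simple_html_dom.php to be included first
// Create DOM from URL
$html = file_get_html('http://www.pullcontentshere.com');
// Use a selector to reach the content you want: "#" matches the id,
// and index 0 returns the first matching element instead of an array
$myContent = $html->find('div#myContent', 0)->outertext;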

score 0 · Accepted Answer

To solve your problem, you need to parse the markup as XML. I recommend SimpleXML, which is already part of PHP. Here is an example:

$sitecontent = "
<html>   
   <body>
      <div>
         <ul>    
            <li></li>
            <li></li>
            <li></li>
         </ul>
      </div> 
   </body> 
 </html>";

 $xml = new SimpleXMLElement($sitecontent);
 $xpath = $xml->xpath('//div');

 print_r($xpath);
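SimpleXMLElement only accepts well-formed XML, and pages fetched with file_get_contents() often are not. As a rough sketch of an alternative (not part of the original answer), PHP's built-in DOMDocument can parse ordinary HTML and, combined with DOMXPath, return a single element by id. The function name extractElementById is hypothetical, standing in for function_here from the question, and the sketch assumes the id contains no double quote.

```php
<?php
// Hypothetical helper: returns the outer HTML of the element with the given
// id, or null when no such element exists.
// Assumption: $id contains no double-quote character.
function extractElementById(string $html, string $id): ?string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid; collect parser warnings internally
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // saveHTML($node) serializes only that node and its children.
    return $doc->saveHTML($nodes->item(0));
}

$pageContent = '<html> <body>
    <div id="myContent">
        <ul>
            <li></li>
            <li></li>
            <li></li>
        </ul>
    </div>
</body> </html>';

$content = extractElementById($pageContent, 'myContent');
echo $content, "\n";
```

With a live page, $pageContent would instead come from file_get_contents(), as in the question. Unlike the lazy regular expression, this approach still returns the whole block when the target element contains nested div elements.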
