php - Get the inner text using the curl concept in PHP

Question

This is the HTML of a website. From it I want to get the text

1,000 Places To See Before You Die

<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>

I used code like this

foreach($html->find('ul.listings li a') as $e)
echo $e->innertext. '<br/>';

The output I am getting looks like this

 999: Whats Your Emergency<span class="epnum">2012</span>

It includes the span. Please help me with this.

score 4 · Accepted Answer

Why not use DOMDocument and get the title attribute?:

$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>';

$dom = new DOMDocument;
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);
$text = $xpath->query('//ul[@class="listings"]/li/a/@title')->item(0)->nodeValue;
echo $text;

Or

$text = explode("\n", trim($xpath->query('//ul[@class="listings"]/li/a')->item(0)->nodeValue));
echo $text[0];

Codepad example
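As a variation on the two snippets above, the direct text nodes of each anchor can be collected while skipping the span. This is only a minimal sketch: it assumes the markup from the question (with a closing </ul> added), and the $titles array is a name introduced here for illustration.

```php
<?php
// Sample markup copied from the question; the closing </ul> is added here.
$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>
</ul>';

$dom = new DOMDocument;
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);

$titles = []; // hypothetical name, one entry per matched link
foreach ($xpath->query('//ul[@class="listings"]/li/a') as $a) {
    $text = '';
    // Concatenate only the text nodes that are direct children of <a>,
    // which skips the text inside <span class="epnum">.
    foreach ($a->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    $titles[] = trim($text);
}

echo implode("\n", $titles), "\n";
```

Unlike the title attribute, this does not depend on the site filling in title for every link, and unlike explode("\n", ...) it does not depend on where the line breaks fall.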

score 1 · Accepted Answer

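There are two ways I can think of to solve this. One is to get the title attribute from the anchor tag. Of course, not everyone sets a title attribute on anchor tags, and even when they do, its value may differ from the link text. The other solution is to get the innertext attribute and then replace each child of the anchor tag with an empty string.

So either do this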

$e->title;

or this

$text = $e->innertext;
foreach ($e->children() as $child)
{
    $text = str_replace($child, '', $text);
}

However, I would recommend using DOMDocument instead.

score 0 · Accepted Answer

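You can use strip_tags() for that. Note that it removes only the tags themselves, so the text inside the span stays in the output.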

echo trim(strip_tags($e->innertext));

Or try using preg_replace() to remove the unwanted tag together with its contents

echo preg_replace('/<span[^>]*>([\s\S]*?)<\/span[^>]*>/', '', $e->innertext);
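To see how the two calls differ, here is a small sketch that runs both on a plain string. The $inner value is an assumed stand-in for $e->innertext, so the snippet runs without any scraping library.

```php
<?php
// Assumed stand-in for $e->innertext from the question.
$inner = "\n1,000 Places To See Before You Die\n<span class=\"epnum\">2009</span>\n";

// strip_tags() removes the tags but keeps the text between them,
// so the year is still part of the result.
$stripped = trim(strip_tags($inner));

// Removing the <span> element together with its contents drops the year.
$withoutSpan = trim(preg_replace('/<span[^>]*>[\s\S]*?<\/span[^>]*>/', '', $inner));

var_dump($stripped);
var_dump($withoutSpan);
```

The regular expression only handles a span that contains no nested span, which is enough for the markup shown in the question.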

score -1 · Accepted Answer

First, check your HTML. Right now it is

  $string = '<ul class="listings">
               <li>
                  <a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
 1,000 Places To See Before You Die
                    <span class="epnum">2009</span>
                 </a>
             </li>';

There is no closing tag for the ul. You probably missed it.

  $string = '<ul class="listings">
               <li>
                  <a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
 1,000 Places To See Before You Die
                    <span class="epnum">2009</span>
                 </a>
             </li>
            </ul>';

Try it like this

 $xml = simplexml_load_string($string);
 echo $xml->li->a['title'];
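A slightly fuller sketch of the SimpleXML approach follows. It assumes the input is well-formed XML, as the corrected snippet above is; real-world HTML often is not, and in that case simplexml_load_string() returns false. The variable names $fromAttribute and $fromText are introduced here for illustration.

```php
<?php
// Corrected markup from above, with the closing </ul> in place.
$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>
</ul>';

$xml = simplexml_load_string($string);
if ($xml === false) {
    // Not well-formed XML; SimpleXML cannot be used on this input.
    exit("Could not parse the markup as XML\n");
}

// Read the title attribute, as in the answer above.
$fromAttribute = (string) $xml->li->a['title'];

// Casting an element to string yields only its own text,
// not the text inside child elements such as <span>.
$fromText = trim((string) $xml->li->a);

echo $fromAttribute, "\n", $fromText, "\n";
```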

score -1 · Accepted Answer

Use plaintext instead.

echo $e->plaintext;

However, the year will still be there; you can cut it off with a regular expression.

An example from the documentation:

$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag; // Returns: " div"
echo $e->outertext; // Returns: " <div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: " foo <b>bar</b>"
echo $e->plaintext; // Returns: " foo bar"
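The regular expression mentioned above for cutting off the year could look like the following sketch. The $plain value is hypothetical; the exact whitespace that plaintext produces between the title and the year is an assumption.

```php
<?php
// Hypothetical plaintext value for the link in the question.
$plain = "1,000 Places To See Before You Die 2009";

// Drop a trailing four-digit year and any whitespace around it.
$title = preg_replace('/\s*\d{4}\s*$/', '', $plain);

echo $title, "\n";
```

This only works while every link ends with the year span; a title that itself ends in four digits and has no span after it would be cut short.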

