php - How can I extract hidden content from a remote web page using PHP?

Question

I want to use PHP (file_get_contents?) to read a website on which part of the content is hidden.

Four examples:

Uwsebvrfahr
Zeinhvöhrdorf
Babeneinhvberg
Ksddbnvbhgaweaoihvwsaoirasudasuirchdorf/Kr.

The result should be:

Urfahr
Zöhrdorf
Babenberg
Kirchdorf/Kr.

Two possible approaches to solve the problem (although I don't know how to implement them):
A) Remove all span tags together with their content
B) Programmatically read only the VISIBLE content

Thanks for your help!!!

score 1 · Accepted Answer

http://sourceforge.net/projects/simplehtmldom/files/latest/download?source=files

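// Requires simple_html_dom.php from the download link above.
include('simple_html_dom.php');

$html = file_get_html('http://www.fussballoesterreich.at/netzwerk/datenservice/379402779304830775_O~733830065019629299~744933674800963515~0~1.htm');

$i = 1;
// Each team name is a link inside an element with the class "mannschaft".
foreach($html->find('.mannschaft a') as $e)
{
    $x = html_entity_decode($e->innertext, ENT_QUOTES, 'UTF-8');
    // Greedy pattern: deletes everything from the first "<" to the last ">" on a line,
    // tags and the text between them alike.
    $x = preg_replace('#<(.*)>#', '', $x);
    echo $i, '. ', $x, '<br />';
    $i++;
}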

Result:

1. Garsten
2. S. Valent.ASK
3. Bumgartenberg
4. Neuhofen/Krems
5. Admira
6. Asten
7. Enns
8. Pasching 1b
9. S. Florian 1b
10. SValentin SC
11. Hörsching
12. S Ulrich
13. Wdischgarsten
14. Doppl-Hart

My work here is done.
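
Several names in the list above (for example "Bumgartenberg" and "Wdischgarsten") look as if they are missing letters. That would be consistent with the greedy pattern '#<(.*)>#' deleting the visible letters between two hidden spans whenever a name contains more than one of them. The following is a minimal sketch of a non-greedy alternative for approach A. It assumes the hidden text sits in plain, non-nested span elements, since the page's real markup is not shown in this thread. The helper name removeSpansWithContent and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: drops every <span>...</span> pair together with its content,
// then strips any remaining tags and decodes HTML entities.
function removeSpansWithContent(string $html): string
{
    // Non-greedy ".*?" stops at the first closing </span>,
    // so visible letters between two hidden spans are kept.
    $withoutSpans = preg_replace('#<span\b[^>]*>.*?</span>#is', '', $html) ?? $html;

    return trim(html_entity_decode(strip_tags($withoutSpans), ENT_QUOTES, 'UTF-8'));
}

// Sample markup is an assumption modelled on the examples in the question.
$sample = 'B<span style="display:none">x</span>a<span style="display:none">y</span>umgartenberg';
echo removeSpansWithContent($sample), "\n"; // prints "Baumgartenberg"
```

In the loop above, this would replace the preg_replace line, for example as $x = removeSpansWithContent($e->innertext);. A regular expression cannot handle nested spans reliably, and it would also delete visible spans if the page uses any.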

score 0 · Accepted Answer

The fact that a style is applied makes no difference. To PHP, it is just a bunch of text.

Try:

<?php
$url = 'http://....';  // URL you're scraping.
$html = file_get_contents($url);
$text = strip_tags($html);
echo "<PRE>$text</PRE>";
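
Note that strip_tags() removes only the tags and keeps the text between them, so the hidden letters would still appear in the output. A sketch of approach B (reading only the visible content) with PHP's built-in DOM extension could look like the following. It assumes the hidden parts carry an inline style containing "display:none" in lower case, which this thread does not confirm. If the page hides them through a CSS class instead, the XPath expression would need to change. The helper name visibleText is hypothetical.

```php
<?php
// Hypothetical helper: returns the text of an HTML fragment without the parts
// hidden by an inline "display:none" style (an assumption about the markup).
function visibleText(string $fragment): string
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    // The XML prolog is a common workaround that makes libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Any element whose style attribute, with spaces removed, contains "display:none".
    $hidden = $xpath->query('//*[contains(translate(@style, " ", ""), "display:none")]');

    // Copy the matches to an array so that removing nodes cannot disturb the iteration.
    foreach (iterator_to_array($hidden) as $node) {
        $node->parentNode->removeChild($node);
    }

    return trim($doc->documentElement->textContent);
}

// Sample markup is an assumption modelled on the first example in the question.
echo visibleText('U<span style="display: none">wsebv</span>rfahr'), "\n"; // prints "Urfahr"
```

Compared with a regular expression, the DOM approach also copes with nested elements, at the cost of parsing each fragment.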
