php - Strip tags with PHP and place a delimiter, or store the values in an array

Question

I have stripped the tag data from a URL like this:

$url='http://abcd.com';
$d=stripslashes(file_get_contents($url));
echo strip_tags($d);

Unfortunately, all the tag values are run together, like user14036100 9.00user23034003 11.33user32028000 14.00, where user1, user2 and user3 are the stored attributes. It is hard to analyse the attribute values because strip_tags() joins them all together.

Can someone help me strip each tag and either store the values in an array or place a delimiter at the end of each stripped tag's data?

Thanks in advance :)

score 1 · Accepted Answer

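You cannot achieve this with strip_tags(), because it only strips out the tags. It does not replace them with a whitespace character (line break, space, etc.) or anything else. You probably have to do this with a regular expression call that replaces all tags.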
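The regular-expression idea could be sketched roughly as follows. This is a minimal illustration, not a robust HTML parser: the sample markup in $html is hypothetical (the question does not show the real page), the "|" delimiter is an arbitrary choice that assumes the values never contain that character, and the pattern will misfire on tags whose attribute values contain ">".

```php
<?php
// Hypothetical sample markup; the structure of the real page is unknown.
$html = '<tr><td>user1</td><td>4036100</td><td>9.00</td></tr>'
      . '<tr><td>user2</td><td>3034003</td><td>11.33</td></tr>';

// Replace each run of consecutive tags (plus surrounding whitespace)
// with a single delimiter, then trim the delimiters left at both ends.
$delimited = trim(preg_replace('/(?:\s*<[^>]+>\s*)+/', '|', $html), '|');

// Split the delimited string into an array of values.
$values = explode('|', $delimited);

echo $delimited, PHP_EOL;
print_r($values);
```

With this sample input, $delimited becomes user1|4036100|9.00|user2|3034003|11.33 and $values holds the six values in the same order.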

A better way would be to parse the fetched page with DOMDocument and derive the data directly from the HTML structure.

Example for using DOMDocument

Given the following example HTML page:

<!DOCTYPE html>
<html>
    <head>
        <title>This is my title</title>
    </head>
    <body>
        <table id="someDataHere">
            <tr>
                <th>Country</th>
                <th>Population</th>
            </tr>

            <tr>
                <td>Germany</td>
                <td>81,779,600</td>
            </tr>

            <tr>
                <td>Belgium</td>
                <td>11,007,020</td>
            </tr>

            <tr>
                <td>Netherlands</td>
                <td>16,847,007</td>
            </tr>

        </table>
    </body>
</html>

You can use DOMDocument to get the entries within the table:

$url = "...";
$dom = new DOMDocument("1.0", "UTF-8");
$dom->loadHTML(file_get_contents($url));

$preparedData = array();
$table = $dom->getElementById("someDataHere");
$tableRows = $table->getElementsByTagName('tr');

foreach ($tableRows as $tableRow)
{
    $columns = $tableRow->getElementsByTagName('td');

    // skip the header row of the table - it has no <td>, just <th>
    if (0 == $columns->length)
    {
        continue;
    }

    $preparedData[ $columns->item(0)->nodeValue ] = $columns->item(1)->nodeValue;
}

$preparedData will then hold the following data:

Array
(
    [Germany] => 81,779,600
    [Belgium] => 11,007,020
    [Netherlands] => 16,847,007
)

Some notes

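As you are developing a crawler (spider), you depend heavily on the HTML structure of the target web page. You may have to adjust your crawler every time the site changes something in its template.
This is just an example, but it should make clear how you can use DOMDocument to produce more advanced results.
As DOMDocument implements the DOM methods, you have to work with the possibilities they offer for processing the HTML structure.
For very large HTML pages, DOMDocument can be quite expensive in terms of memory.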
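Applied to the situation in the question, one possible approach is to collect every non-empty text node into an array instead of relying on a known table id. The sketch below assumes hypothetical markup in $html as a stand-in for the fetched page, and uses DOMXPath with the query //body//text(); libxml_use_internal_errors() is called only to keep parser warnings about imperfect HTML from being printed.

```php
<?php
// Hypothetical markup standing in for the fetched page.
$html = '<html><body><table>'
      . '<tr><td>user1</td><td>4036100</td><td>9.00</td></tr>'
      . '<tr><td>user2</td><td>3034003</td><td>11.33</td></tr>'
      . '</table></body></html>';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$values = array();

// Visit every text node below <body>; keep one entry per non-empty node.
foreach ($xpath->query('//body//text()') as $textNode) {
    $text = trim($textNode->nodeValue);
    if ($text !== '') {
        $values[] = $text;
    }
}

print_r($values);
```

Each stripped value ends up in its own array element, so implode('|', $values) would give the delimited form if that is preferred.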
