php - Extracting values from HTML using PHP

Question

I am fetching an HTML page with cURL. The page contains a table like this:

<table class="table2" style="width:85%; text-align:center">
<tr>
<th>Refference ID</th>
<th>Transaction No</th>
<th>Type</th>
<th>Operator</th>
<th>Amount</th>
<th>Slot</th>
</tr>
<tr>
<td>130717919020ffqClE0nRaspoB</td>
<td>8801458920369</td>
<td>Purchase</td>
<td>Visa</td>
<td>50</td>
<td>20130717091902413</td>
</tr>
</table>

This is the only table on that HTML page. I need to extract the Reference ID and the Slot using PHP.

But I don't know how to do it.

Edit: This helped me a lot.

score 1 · Accepted Answer

Regex-based solutions like the accepted answer are not the right way to extract information from HTML documents.

Use a DOMDocument-based solution like the following instead:

$str = '<table class="table2" style="width:85%; text-align:center">
<tr>
<th>Refference ID</th>
  ...
<th>Slot</th>
</tr>
<tr>
<td>130717919020ffqClE0nRaspoB</td>
  ...
<td>20130717091902413</td>
</tr>
</table>';

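// Create a document out of the string. Suppress libxml warnings about
// malformed markup, which are common in pages fetched with cURL.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($str);
$selector = new DOMXPath($doc);

// Query the values in a stable, easy-to-maintain way using XPath.
// libxml's HTML parser does not insert an implicit <tbody> (browsers do),
// so <tr> is a direct child of <table> here.
$refResult = $selector->query('//table[@class="table2"]/tr[2]/td[1]');
$slotResult = $selector->query('//table[@class="table2"]/tr[2]/td[6]');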

// Check if the data was found
if($refResult->length !== 1 || $slotResult->length !== 1) {
   die("Data is corrupted");
}

// XPath->query always returns a node set, even if 
// this contains only a single value.
$refId = $refResult->item(0)->nodeValue;
$slot = $slotResult->item(0)->nodeValue;

echo "RefId: $refId, Slot: $slot", PHP_EOL;
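The queries above hard-code the column positions (td[1] and td[6]), so they would return the wrong cells if the site ever reordered its columns. A minimal sketch of an alternative, assuming the header row uses <th> cells and the data rows use <td> cells exactly as in the question, maps each data cell to its header text. The helper name extractColumns() is hypothetical; it only uses standard DOMDocument and DOMXPath calls. Note that the array keys must match the page's own header spelling, including "Refference ID".

```php
<?php
// Hypothetical helper (not a library function): maps each data cell to its
// column header so callers can ask for "Slot" instead of relying on td[6].
function extractColumns(string $html, string $tableXPath): array
{
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tables = $xpath->query($tableXPath);
    if ($tables === false || $tables->length === 0) {
        return [];
    }
    $table = $tables->item(0);

    // Header texts in column order (assumes a single <th> header row).
    $headers = [];
    foreach ($xpath->query('.//tr/th', $table) as $th) {
        $headers[] = trim($th->textContent);
    }

    // Every row that has <td> cells becomes one associative array.
    $rows = [];
    foreach ($xpath->query('.//tr[td]', $table) as $tr) {
        $cells = $xpath->query('td', $tr);
        $row = [];
        for ($i = 0; $i < $cells->length; $i++) {
            if (isset($headers[$i])) {
                $row[$headers[$i]] = trim($cells->item($i)->textContent);
            }
        }
        $rows[] = $row;
    }
    return $rows;
}

// Usage with the table from the question.
$html = '<table class="table2" style="width:85%; text-align:center">
<tr><th>Refference ID</th><th>Transaction No</th><th>Type</th>
<th>Operator</th><th>Amount</th><th>Slot</th></tr>
<tr><td>130717919020ffqClE0nRaspoB</td><td>8801458920369</td>
<td>Purchase</td><td>Visa</td><td>50</td><td>20130717091902413</td></tr>
</table>';

$rows = extractColumns($html, '//table[@class="table2"]');
foreach ($rows as $row) {
    // Keys follow the page's own header spelling ("Refference ID").
    echo "RefId: {$row['Refference ID']}, Slot: {$row['Slot']}", PHP_EOL;
}
// prints: RefId: 130717919020ffqClE0nRaspoB, Slot: 20130717091902413
```

Because each row is keyed by header text, a lookup such as $rows[0]['Slot'] keeps working if the columns are reordered; it will still break if the header text itself changes.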
