php - Finding a regular expression in PHP

Question

I am using the preg_match function in PHP to extract some values from an RSS feed. Inside the feed content there is something like this:

<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>

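I need to get "A text with non alphanumeric characters" and "more text with non alphanumeric characters" so I can save them in a database. I am not sure whether a regular expression is the best way to do this.

Thank you very much.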

score 1 · Accepted Answer

If you want to use regular expressions (that is, quick and dirty, and not very maintainable), this gets you the text:

$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

// Capture everything between </strong> and </li>
if (preg_match('#</strong>(.*?)</li>#', $input, $matches)) {
    // Remove the text inside brackets, along with the whitespace around it
    echo trim(preg_replace('#\s*\(.*?\)\s*#', '', $matches[1]));
}

Note that this can fail on nested brackets, because the lazy \(.*?\) stops at the first closing bracket it finds.
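To illustrate the nested-bracket case, here is a minimal sketch that compares the lazy pattern with a recursive PCRE pattern. The sample string and the variable names are made up for illustration and do not come from the feed in the question; the recursive pattern is one possible workaround, not something the original answer proposed.

```php
<?php
// Hypothetical input containing a nested pair of brackets.
$text = 'A text (more (nested) text), more text (x)';

// Lazy pattern from the answer: stops at the first ")" it meets.
$lazy = trim(preg_replace('#\s*\(.*?\)\s*#', '', $text));

// Recursive pattern: group 1 matches a balanced "(...)" and calls
// itself through (?1) whenever it meets another opening bracket.
$balanced = trim(preg_replace('~\s*(\((?:[^()]++|(?1))*\))~', '', $text));

echo $lazy, "\n";     // leftovers from the outer bracket remain
echo $balanced, "\n"; // both bracketed parts are removed completely
```

With the lazy pattern the tail of the outer bracket ("text)") is left behind, while the recursive one removes each balanced group as a whole.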

score 0 · Accepted Answer

$str = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';
$str = preg_replace('~^.*?</strong>~', '', $str); // Remove leading markup
$str = preg_replace('~</li>$~', '', $str); // Remove trailing markup
$str = preg_replace('~\([^)]++\)~', '', $str); // Remove text within parentheses
$str = trim($str); // Clean up whitespace
$arr = preg_split('~\s*,\s*~', $str); // Split on the comma

score 0 · Accepted Answer

Given that the structure is always the same, you can use this regex:

</strong>([^,]*),([^<]*)</li>

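Group 1 holds the first fragment and group 2 holds the other one.

Once you start parsing html/xml with regular expressions, it quickly becomes clear that a full-blown parser is a better fit. For small or throwaway solutions, a regex can do the job.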
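As a rough sketch of how that regex could be applied with preg_match: the ~ delimiters, the variable names and the extra clean-up step are my own assumptions. Note that each captured group still contains the leading space and the "(more text)" part, so the sketch strips those afterwards.

```php
<?php
$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

$first = null;
$second = null;

// Group 1: everything up to the first comma; group 2: everything up to </li>.
if (preg_match('~</strong>([^,]*),([^<]*)</li>~', $input, $m)) {
    // Assumed clean-up: drop the bracketed part, then trim whitespace.
    $first  = trim(preg_replace('~\s*\([^)]*\)~', '', $m[1]));
    $second = trim(preg_replace('~\s*\([^)]*\)~', '', $m[2]));
}

echo $first, "\n";
echo $second, "\n";
```

This only works while the first fragment contains no comma of its own, which is the "structure is always the same" assumption stated above.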
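For comparison, a minimal sketch of the parser route using PHP's bundled DOM extension. It assumes ext-dom is enabled, wraps the fragment in a <ul> so it parses as a list, and still uses a small regex for the bracketed text; the variable names are illustrative only.

```php
<?php
$html = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<ul>' . $html . '</ul>');
libxml_clear_errors();

$fragments = [];
foreach ($doc->getElementsByTagName('li') as $li) {
    // Keep only the text that sits directly inside <li>, skipping <strong>.
    $text = '';
    foreach ($li->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    // Drop the bracketed parts, then split on the comma.
    $text = preg_replace('~\s*\([^)]*\)~', '', $text);
    foreach (explode(',', $text) as $part) {
        $fragments[] = trim($part);
    }
}

print_r($fragments);
```

The parser takes care of the markup, so the regex only has to deal with the plain text inside the list item.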
