php - preg_match_all to include all results plus results missing a specific value

Question

I am trying to run preg_match_all with the following pattern:

    $string1 = '/<a href="(.*?)\.(jpg|jpeg|png|gif|bmp|ico)"><img(.*?)class="(.*?)wp-image-(.*?)" title="(.*?)" (.*?) \/><\/a>/i';
    preg_match_all( $string1, $content, $matches, PREG_SET_ORDER);

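The above works fine for what I am doing, but the problem is that I also need to detect images that have no title attribute.

Is there a way to run preg_match_all so that it also adds a match when the string has no value [6] (the title is value [6]), and to give those results (the ones without a title) a special name (i.e. $matches_no_title)?

My current solution is to run two preg_match_all calls with two different patterns (identical except that one lacks the title="" part), but to optimize the speed of the website it would be better if I could do everything in a single preg_match_all!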

score 2 · Accepted Answer

Regular expressions are not the best approach for what you want to do. Try parsing the HTML and pulling out what you need.

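    // $html holds the markup to parse (for example, $content from the question).
    $dom = new DOMDocument();
    // Set before loading; changing it after the document is parsed has no effect.
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($html);
    $images = $dom->getElementsByTagName('img');
    foreach ($images as $image) {
        // Print the src attribute of each <img> element.
        echo $image->getAttribute('src'), "\n";
    }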
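
To tie this back to the question, here is a minimal sketch of how the DOM approach could sort linked images into "with title" and "without title" lists in one pass. The sample markup in $content, the array names $matches_title and $matches_no_title, and the keys of each entry are assumptions made for illustration, not part of the original answer.

```php
<?php
// Sample markup standing in for $content (an assumption for illustration).
$content = '<p>'
    . '<a href="a.jpg"><img src="a-150.jpg" class="alignleft wp-image-12" title="First" alt="" /></a>'
    . '<a href="b.png"><img src="b-150.png" class="wp-image-34" alt="" /></a>'
    . '</p>';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$matches_title = [];     // images that carry a title attribute
$matches_no_title = [];  // images that do not

foreach ($dom->getElementsByTagName('img') as $image) {
    // Keep only images whose direct parent is a link.
    $parent = $image->parentNode;
    if (!($parent instanceof DOMElement) || strtolower($parent->tagName) !== 'a') {
        continue;
    }
    // Keep only images whose class contains "wp-image-".
    $class = $image->getAttribute('class');
    if (strpos($class, 'wp-image-') === false) {
        continue;
    }

    $entry = [
        'href'  => $parent->getAttribute('href'),
        'class' => $class,
        'title' => $image->getAttribute('title'), // '' when the attribute is absent
    ];

    if ($image->hasAttribute('title')) {
        $matches_title[] = $entry;
    } else {
        $matches_no_title[] = $entry;
    }
}
```

Because hasAttribute() is used for the split, an image with an explicit but empty title="" still lands in $matches_title; swap the test for a comparison against '' if that case should count as untitled.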

score 1 · Accepted Answer

I would think alternation with a null will do what you want:

    $string1 = '/<a href="(.*?)\.(jpg|jpeg|png|gif|bmp|ico)"><img(.*?)class="(.*?)wp-image-(.*?)" (|title="(.*?)") (.*?) \/><\/a>/i';
    preg_match_all( $string1, $content, $matches, PREG_SET_ORDER);

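You may also need to be careful about optional whitespace. As written, the pattern expects a space both before and after the optional title="blah" token, so when the title is absent it looks for two consecutive spaces. Note also that PCRE tries alternatives from left to right: once the surrounding spaces move inside the group, the title branch has to come first, otherwise the empty branch always succeeds and the lazy (.*?) that follows swallows the title. So you may want

    wp-image-(.*?)"( title="(.*?)" |)(.*?) \/>

or

    wp-image-(.*?)"(\s+title="(.*?)"\s+|)(.*?) \/>

instead of

    wp-image-(.*?)" (|title="(.*?)") (.*?) \/>

Either way, the wrapper group takes index 6 and the title itself moves to index 7.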
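
As a rough sketch of the single-pass idea, the optional title can also be written as a non-capturing optional group, (?:\s+title="(.*?)")?, which attempts the title before giving up on it and keeps the title at index 6 as in the question. The sample markup in $content and the array names $matches_title and $matches_no_title are assumptions for illustration; the split into two arrays happens in plain PHP after one preg_match_all call.

```php
<?php
// Sample markup standing in for $content (an assumption for illustration).
$content = '<a href="a.jpg"><img src="a-150.jpg" class="alignleft wp-image-12" title="First" alt="" /></a>' . "\n"
         . '<a href="b.png"><img src="b-150.png" class="wp-image-34" alt="" /></a>';

// Same pattern as in the question, except that the dot before the extension
// is escaped and the title attribute sits in an optional non-capturing group,
// so the title itself remains capture group 6.
$pattern = '/<a href="(.*?)\.(jpg|jpeg|png|gif|bmp|ico)"><img(.*?)class="(.*?)wp-image-(.*?)"'
         . '(?:\s+title="(.*?)")?(.*?) \/><\/a>/i';

preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$matches_title = [];     // matches where a title was captured
$matches_no_title = [];  // matches where group 6 is empty

foreach ($matches as $m) {
    // A group that did not take part in the match is reported as '' here,
    // so an explicit title="" is also treated as "no title".
    if ($m[6] !== '') {
        $matches_title[] = $m;
    } else {
        $matches_no_title[] = $m;
    }
}
```

Each entry keeps the same indices as the original $matches rows, so existing code that reads $m[1] through $m[7] should not need renumbering.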
