php - Using PHP preg_replace to prepend to src values, no matter how badly formed the img elements are

Question

My HTML content looks like this:

<div class="preload"><img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u14_line.png" width="1" height="1"/>

It is one long unbroken line, with no line breaks between the img elements and no indentation at all.

This is the PHP code I use:

/**
 *
 * Take in html content as string and find all the <link href="...">, <script src="...">
 * or <img src="..."> tags (depending on $tag) and add $prepend to their href/src
 * values, except when the value starts with http: or https:
 *
 * @param $html String The html content
 * @param $prepend String The string we expect in front of all the matching href/src values
 * @param $tag String Which tags to process: 'css', 'js' or 'img'
 * @return String The new $html content after find and replace.
 *
 */
    protected static function _prependAttrForTags($html, $prepend, $tag) {
        if ($tag == 'css') {
            $element = 'link';
            $attr = 'href';
        }
        else if ($tag == 'js') {
            $element = 'script';
            $attr = 'src';
        }
        else if ($tag == 'img') {
            $element = 'img';
            $attr = 'src';
        }
        else {
            // wrong tag so return unchanged
            return $html;
        }
        // this handles double-quoted values: attr="yada.*"
        $html = preg_replace('/(<'.$element.'\b.+'.$attr.'=")(?!http)([^"]*)(".*>)/', '$1'.$prepend.'$2$3', $html);
        // this handles single-quoted values: attr='yada.*'
        $html = preg_replace('/(<'.$element.'\b.+'.$attr."=')(?!http)([^']*)('.*>)/", '$1'.$prepend.'$2$3', $html);
        return $html;
    }
}

I want the function to work no matter how badly formed the img elements are.

It must work regardless of where the src attribute appears inside the tag.

The only thing it should do is prepend something to the src value.

Also note that the replacement must not happen when the src value starts with http.

Currently the code only works when the content looks like this:

<div class="preload">
    <img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"></img>
    <img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u14_line.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u15_line.png" width="1" height="1"/>

As you can probably guess, it only runs correctly for the first img element, because the match moves on to the next line and the opening img tag has no / at the end.
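To see what the question's pattern actually does on the one-line input, it can be run on its own. The sketch below is a minimal reproduction using the double-quoted img pattern from the function (with the empty $4 reference dropped); the prefix prepended/ is just an example value. Because . does not match newlines by default and .+ is greedy, on a line that holds several img tags the match stretches from the first <img to the last src=", so only the last src on that line gets the prefix.

```php
<?php
// Minimal reproduction of the question's double-quoted img pattern.
$element = 'img';
$attr = 'src';
$prepend = 'prepended/'; // example prefix

// One unbroken line, as in the question.
$html = '<div class="preload"><img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"/>'
      . '<img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/>'
      . '<img src="PRODUCTPAGE_files/line_u14_line.png" width="1" height="1"/>';

// Greedy .+ backtracks only as far as the LAST src=" on the line, and the
// trailing .*> runs to the last >, so all three tags form a single match.
$result = preg_replace(
    '/(<' . $element . '\b.+' . $attr . '=")(?!http)([^"]*)(".*>)/',
    '$1' . $prepend . '$2$3',
    $html
);

echo $result, "\n";
echo substr_count($result, $prepend), "\n"; // how many src values were changed
```

Only the third image ends up with the prefix, which is why a tag-aware parser is a better fit than widening the regex.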

How can I improve my function?

Update:

I used DOMDocument and it worked! After prepending to the src value, I now need to replace the whole element with a PHP code snippet.

Original:

<img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/>

After prepending the string with DOMDocument:

<img src="prepended/PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1" />

Now I need to replace the whole element with this:

<?php echo $this->Html->img('prepended/PRODUCTPAGE_files/read_icon_u12_normal.png', array('width'=>'1', 'height'=>'1')); ?>

Can I still use DOMDocument for this, or do I have to use preg_replace?

score 1 · Accepted Answer

Rather than building your own HTML parser, use DOMDocument: it is built to parse HTML no matter how messy it is.

Combining DOMDocument with XPath, you can do this:

<?php
$html = <<<HTML
<script src="test"/><link href="test"/><div class="preload"><img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u14_line.png" width="1" height="1"/><img width="1" height="1" src="httpPRODUCTPAGE_files/line_u14_line.png"/>
HTML;

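$doc = new DOMDocument();
// @ hides the warnings libxml emits for malformed HTML
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// only elements that actually carry the URL attribute; a bare //script
// would also match inline scripts and give them a bogus src
$searchTags = $xpath->query('//img[@src] | //link[@href] | //script[@src]');

$length = $searchTags->length;
for ($i = 0; $i < $length; $i++) {
    $element = $searchTags->item($i);

    // <link> stores its URL in href, <img> and <script> in src
    if ($element->tagName == 'link')
        $attr = 'href';
    else
        $attr = 'src';

    $src = $element->getAttribute($attr);
    // leave absolute URLs (anything starting with "http") untouched
    if (!startsWith($src, 'http'))
    {
        $element->setAttribute($attr, "whatever" . $src);
    }
}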

// this small function will check the start of a string 
// with a given term, in your case http or http://
function startsWith($haystack, $needle)
{
    return !strncmp($haystack, $needle, strlen($needle));
}

$result = $doc->saveHTML();
echo $result;

Here is a live demo of it working.
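As a sketch, the same approach can be folded back into the signature of the question's _prependAttrForTags(), so callers keep passing 'css', 'js' or 'img'. The function name prependAttrForTags, the libxml_use_internal_errors() handling and the [@attr] XPath filter are assumptions for illustration, not part of the answer above; the code also assumes PHP 7.1 or later for the [$a, $b] destructuring.

```php
<?php
// Hypothetical DOM-based rewrite of the question's _prependAttrForTags().
// Requires the dom extension, which most PHP builds enable by default.
function prependAttrForTags(string $html, string $prepend, string $tag): string
{
    // Same tag => [element, attribute] mapping as the original function.
    $map = [
        'css' => ['link', 'href'],
        'js'  => ['script', 'src'],
        'img' => ['img', 'src'],
    ];
    if (!isset($map[$tag])) {
        return $html; // wrong tag so return unchanged
    }
    [$element, $attr] = $map[$tag];

    $doc = new DOMDocument();
    // Collect libxml warnings about broken markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Only elements that actually have the attribute, wherever it sits in the tag.
    foreach ($xpath->query('//' . $element . '[@' . $attr . ']') as $node) {
        $value = $node->getAttribute($attr);
        // Same rule as the answer: skip values that start with "http".
        if (strncmp($value, 'http', 4) !== 0) {
            $node->setAttribute($attr, $prepend . $value);
        }
    }
    // Note: saveHTML() also emits the DOCTYPE/html/body wrappers loadHTML() adds.
    return $doc->saveHTML();
}

$html = '<div class="preload"><img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"/>'
      . '<img width="1" height="1" src="PRODUCTPAGE_files/read_icon_u12_normal.png"/>'
      . '<img width="1" height="1" src="httpPRODUCTPAGE_files/line_u14_line.png"/>';

$out = prependAttrForTags($html, 'prepended/', 'img');
echo $out;
```

Because the parser works on whole tags, the position of src and the lack of line breaks no longer matter.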

If the HTML is badly broken, for example with missing closing tags, you can set the following before calling @$doc->loadHTML($html):

$doc->recover = true;
$doc->strictErrorChecking = false;
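To see the parser cope with the kind of markup the question worries about, here is a small made-up input run through loadHTML(). It also shows libxml_use_internal_errors() as an alternative to the @ operator. This is a sketch on one sample, not a claim about every possible malformation.

```php
<?php
// Made-up, deliberately messy markup: unclosed <div>, no self-closing slashes,
// unquoted and single-quoted values, src after other attributes, a line break inside a tag.
$messy = '<div class=preload><img width=1 height="1" src="PRODUCTPAGE_files/like_icon_u10_normal.png">'
       . "<img\nsrc='PRODUCTPAGE_files/read_icon_u12_normal.png' width=\"1\">";

$doc = new DOMDocument();
// The two properties mentioned in the answer.
$doc->recover = true;
$doc->strictErrorChecking = false;

// Alternative to @: buffer libxml's warnings so they can be inspected (or ignored).
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($messy);
$warnings = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$srcs = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $srcs[] = $img->getAttribute('src');
}
echo count($warnings), " parser warning(s)\n";
print_r($srcs);
```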

If you want the output formatted, you can set this before calling @$doc->loadHTML($html):

$doc->formatOutput = true;

Because XPath captures only the elements you need to edit, you don't have to worry about any of the other elements.

Note that if the HTML is missing tags such as the doctype, html, head or body, they are added automatically; if those tags are already present, nothing else should change.

If you do want to strip them, use this instead of a plain $doc->saveHTML():

$result = preg_replace('~<(?:!DOCTYPE|/?(?:html|head|body))[^>]*>\s*~i', '', $doc->saveHTML());
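Another option, assuming PHP 5.3.6 or later (when saveHTML() gained its optional node argument), is to serialize only the children of the implied body element, which avoids running a regex over the output. A minimal sketch, with prepended/ as an example prefix:

```php
<?php
$html = '<div class="preload"><img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"/></div>';

$doc = new DOMDocument();
@$doc->loadHTML($html);

// Modify the document as in the answer.
foreach ($doc->getElementsByTagName('img') as $img) {
    $img->setAttribute('src', 'prepended/' . $img->getAttribute('src'));
}

// Serialize only the children of the implied <body>, so the DOCTYPE, <html>
// and <body> wrappers added by loadHTML() never appear in the output.
$body = $doc->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result;
```

If the original HTML already had its own html/head/body, this would drop them too, so it suits fragments like the one in the question.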

If you want to replace the element in place with a newly created one, you can use this:

// inside the for loop above, instead of modifying $element directly
$newElement = $doc->createElement($element->tagName, '');
$newElement->setAttribute($attr, "prepended/" . $src);
$myArrayWithAttributes = array ('width' => '1', 'height' => '1');
foreach ($myArrayWithAttributes as $attribute=>$value)
    $newElement->setAttribute($attribute, $value);
$element->parentNode->replaceChild($newElement, $element);
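The snippet above is meant to run inside the answer's loop. A self-contained sketch of that, using a foreach over the XPath result instead of the index-based for loop; the two-image input and the prepended/ prefix are example values.

```php
<?php
$html = '<div class="preload"><img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"/>'
      . '<img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/></div>';

$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// DOMXPath::query() returns a snapshot, so replacing nodes inside the loop is safe here.
foreach ($xpath->query('//img') as $element) {
    $attr = 'src';
    $src = $element->getAttribute($attr);

    // Same steps as the snippet above: build a fresh element...
    $newElement = $doc->createElement($element->tagName, '');
    $newElement->setAttribute($attr, 'prepended/' . $src);
    $myArrayWithAttributes = array('width' => '1', 'height' => '1');
    foreach ($myArrayWithAttributes as $attribute => $value) {
        $newElement->setAttribute($attribute, $value);
    }
    // ...and swap it in at the old element's position.
    $element->parentNode->replaceChild($newElement, $element);
}

$result = $doc->saveHTML();
echo $result;
```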

To replace the element with the PHP snippet from the update instead, create a document fragment:

$frag = $doc->createDocumentFragment();
$frag->appendXML('<?php echo $this->Html->img("prepended/PRODUCTPAGE_files/read_icon_u12_normal.png", array("width"=>"1", "height"=>"1")); ?>');
$element->parentNode->replaceChild($frag, $element);

Live demo.
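Rather than hard-coding the PHP string for every image, it could be generated from each element's attributes. The helper below, buildImgSnippet(), is hypothetical and only builds text: $this->Html->img() is written out literally and nothing from CakePHP is called. Its return value could take the place of the literal passed to appendXML() in the fragment example; how saveHTML() then serializes that processing instruction is worth checking on your PHP version.

```php
<?php
// Hypothetical helper: turns an <img> DOMElement into the CakePHP-style
// snippet from the question's update, as a plain string.
function buildImgSnippet(DOMElement $img): string
{
    $options = [];
    foreach ($img->attributes as $attribute) {
        if ($attribute->name === 'src') {
            continue; // src becomes the first argument instead
        }
        // var_export() produces a properly quoted PHP string literal.
        $options[] = var_export($attribute->name, true) . '=>' . var_export($attribute->value, true);
    }
    return '<?php echo $this->Html->img('
        . var_export($img->getAttribute('src'), true)
        . ', array(' . implode(', ', $options) . ')); ?>';
}

$doc = new DOMDocument();
@$doc->loadHTML('<img src="prepended/PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/>');
$img = $doc->getElementsByTagName('img')->item(0);

$snippet = buildImgSnippet($img);
echo $snippet, "\n";
```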

You can use tidy to format the HTML:

$tidy = tidy_parse_string($result, array(
    'indent' => TRUE,
    'output-xhtml' => TRUE,
    'indent-spaces' => 4
));
$tidy->cleanRepair();
echo $tidy;
