php - Remove attributes from HTML tag elements using a PHP regular expression

Question

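I want to remove the attributes inside HTML tags. I think this can be done with a regular expression, but I'm not good at using them.

I tried str_replace, but it isn't the right approach. I also searched for similar questions but couldn't find any.

Example:

I have HTML tags like the following in a variable: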

$str = '
<p class="class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</p>
<span class="another_class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

Some call to preg_match() on it:

$new_str = preg_match('', $str)

Expected output:

$new_str = '
<p>content</p>
<span>content</span>
<ul></ul>
<li>content</li>';

Note that I don't want to remove the HTML tags themselves, only the attributes inside them.

PHP's strip_tags() isn't an option.

Any help with this would be appreciated.

score 1 · Accepted Answer

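You can do this with a regular expression, but it is generally better to use the DOM functions for filtering and other HTML manipulation. Here is a reusable class that uses DOM methods to remove unwanted attributes. You only set the HTML tags and attributes you want to allow, and everything else is filtered out.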

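class allow_some_html_tags {
    public $doc = null;                    // DOMDocument holding the parsed HTML
    public $xpath = null;                  // DOMXPath used to select every element
    public $allowed_tags = "";             // allowed tags in strip_tags() format, e.g. "<p><span>"
    public $allowed_properties = array();  // allowed attribute names, stored as array keys

    // Strip disallowed tags, then parse the remaining HTML into a DOM tree.
    function loadHTML( $html ) {
        $this->doc = new DOMDocument();
        $html = strip_tags( $html, $this->allowed_tags );
        @$this->doc->loadHTML( $html );    // @ suppresses warnings raised while parsing
        $this->xpath = new DOMXPath( $this->doc );
    }
    // Register the tags and attributes to keep; call this before loadHTML().
    function setAllowed( $tags = array(), $properties = array() ) {
        foreach( $tags as $allow ) $this->allowed_tags .= "<{$allow}>";
        foreach( $properties as $allow ) $this->allowed_properties[$allow] = 1;
    }
    // Collect the attribute names first so they can be removed afterwards
    // without modifying the attribute list while iterating over it.
    function getAttributes( $tag ) {
        $r = array();
        for( $i = 0; $i < $tag->attributes->length; $i++ )
            $r[] = $tag->attributes->item($i)->name;
        return $r;
    }
    // Remove every attribute that is not allowed and return the cleaned markup.
    function getCleanHTML() {
        $tags = $this->xpath->query("//*");
        foreach( $tags as $tag ) {
            $a = $this->getAttributes( $tag );
            foreach( $a as $attribute ) {
                if( !isset( $this->allowed_properties[$attribute] ) )
                    $tag->removeAttribute( $attribute );
            }
        }
        // The second strip_tags() drops the doctype, html and body wrappers added by the DOM.
        return strip_tags( $this->doc->saveHTML(), $this->allowed_tags );
    }
}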

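This class uses strip_tags() twice: once to quickly remove unwanted tags, and again, after the attributes have been removed from what remains, to remove the extra tags that the DOM functions insert (doctype, html, body). To use it, do this: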

$comments = new allow_some_html_tags();
$comments->setAllowed( array( "p", "span", "ul", "li" ), array("tabindex") );
$comments->loadHTML( $str );
$clean = $comments->getCleanHTML();

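The setAllowed function takes two arrays: a set of allowed tags and a set of allowed properties (in case you later decide to keep some). To illustrate the filtering, the output below corresponds to input in which the ul tag also carries a tabindex="3" attribute. The output in $clean is then: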

<p>content</p>
<span>content</span>
<ul tabindex="3"></ul><li>content</li>
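
For comparison, the regular-expression route mentioned at the start of this answer can be sketched with preg_replace(). This is only a minimal sketch, not part of the class above: the pattern and the variable name $pattern are illustrative, and it assumes that no attribute value contains a ">" character, which a regular expression of this kind cannot handle reliably.

```php
<?php
// Sample input taken from the question.
$str = '
<p class="class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</p>
<span class="another_class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

// Match an opening tag: capture the tag name ($1), skip everything up to
// the closing ">", and capture an optional self-closing slash ($2).
// Closing tags such as </p> start with "</" and are left untouched.
// Assumption: attribute values never contain ">".
$pattern = '/<([a-z][a-z0-9]*)\b[^>]*?(\/?)>/i';

// Rebuild each opening tag from its name and optional slash only.
$new_str = preg_replace($pattern, '<$1$2>', $str);

echo $new_str;
```

On the sample input this produces the markup shown as the expected output in the question. Unlike the class above, it removes every attribute and has no allow-list.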

score 0 · Accepted Answer

The simplest way to remove HTML tags in PHP is strip_tags().

Alternatively, you can remove them like this:

preg_replace("/<.*?>/", "", $str);
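
Note that this pattern removes each tag entirely, not only its attributes. A small sketch on a shortened version of the question's input shows the effect (the variable name $stripped is only for illustration):

```php
<?php
// Shortened sample based on the question's input.
$str = '<p class="class_style" style="font-size: medium;">content</p>';

// "<.*?>" lazily matches from "<" to the next ">", so both the opening
// and the closing tag are replaced with the empty string.
$stripped = preg_replace("/<.*?>/", "", $str);

echo $stripped; // prints "content": the <p> tag itself is gone
```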
