php - How do I remove attributes from HTML tags?

Question

Using PHP, how can I strip all attributes from a tag, such as a paragraph tag?

i.e. turn <p class="one" otherrandomattribute="two"> into <p>

score 14 · Accepted Answer

There are better ways, but you can in fact strip the attributes from HTML tags with a regular expression:

<?php
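function stripArgumentFromTags( $htmlString ) {
    // Group 1: text before the tag plus "<tagname"; group 2: optional "/", ">" and
    // the text after it. The quoted attributes in between are matched but not
    // captured, so preg_split() drops them. Only quoted attributes whose names are
    // 2-14 letters/hyphens are recognised; a tag with any other attribute is left unchanged.
    $regEx = '/([^<]*<\s*[a-z](?:[0-9]|[a-z]{0,9}))(?:(?:\s*[a-z\-]{2,14}\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)(\s*\/?>[^<]*)/i'; // match any start tag

    // PREG_SPLIT_DELIM_CAPTURE keeps the captured groups in the result array.
    $chunks = preg_split($regEx, $htmlString, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Rejoin every chunk, including $chunks[0], which holds any text the first
    // match did not consume (e.g. the "<" of a leading "</div>").
    return implode('', $chunks);
}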
?>

The above could probably be written with fewer characters, but it does the job (quick and dirty).
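As a rough illustration of the "fewer characters" point, the question's exact case (drop every attribute, keep nothing) can be sketched with a single preg_replace(). This is a different, simpler pattern than the one above, not the answerer's code; like any regex approach it assumes no attribute value contains a ">" character, and closing tags such as </p> are left alone. The function name stripAllAttributes is hypothetical.

```php
<?php
// Sketch: remove all attributes from every start tag with one preg_replace().
// Assumes attribute values never contain ">"; closing tags are not matched.
function stripAllAttributes(string $html): string
{
    // "<", tag name (group 1), anything up to ">" (lazy), optional "/" (group 2), ">"
    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $html) ?? $html;
}

echo stripAllAttributes('<p class="one" otherrandomattribute="two">');
// prints <p>
```

A self-closing tag keeps its slash: <br class="x" /> becomes <br/>.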

score 9 · Accepted Answer

Removing attributes with SimpleXML (standard in PHP 5):

<?php

// $data_str holds the fragment to clean; it must be well-formed XHTML. Example value:
$data_str = '<p class="one" otherrandomattribute="two">text <a href="/x" target="_blank">link</a></p>';

// define allowable tags
$allowable_tags = '<p><a><img><ul><ol><li><table><thead><tbody><tr><th><td>';
// define allowable attributes
$allowable_atts = array('href','src','alt');

// strip collector
$strip_arr = array();

// load XHTML with SimpleXML (wrapped in <root> so a fragment with several top-level elements parses)
$data_sxml = simplexml_load_string('<root>'. $data_str .'</root>', 'SimpleXMLElement', LIBXML_NOERROR | LIBXML_NOXMLDECL);

if ($data_sxml) {
    // loop all elements with an attribute
    foreach ($data_sxml->xpath('descendant::*[@*]') as $tag) {
        // loop attributes
        foreach ($tag->attributes() as $name=>$value) {
            // check for allowable attributes
            if (!in_array($name, $allowable_atts)) {
                // set attribute value to empty string
                $tag->attributes()->$name = '';
                // collect attribute patterns to be stripped (name escaped for the regex)
                $strip_arr[$name] = '/ '. preg_quote($name, '/') .'=""/';
            }
        }
    }

    // strip unallowed attributes and root tag; done only when parsing succeeded,
    // because asXML() cannot be called on false
    $data_str = strip_tags(preg_replace($strip_arr, '', $data_sxml->asXML()), $allowable_tags);
}
?>
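simplexml_load_string() only accepts well-formed XML, so ordinary tag soup makes it return false. A minimal sketch of the same allow-list idea using DOMDocument::loadHTML(), which tolerates non-XHTML input, is shown here. It swaps in the DOM extension in place of SimpleXML and only filters attributes (strip_tags() can still be run afterwards to filter tags). The function name keepAllowedAttributes and the wrapper-div trick are assumptions of this sketch; also note that loadHTML() assumes ISO-8859-1 when no charset is declared, so non-ASCII UTF-8 input may need extra handling.

```php
<?php
// Sketch: keep only allow-listed attributes, using DOMDocument instead of SimpleXML.
// Assumes ext-dom is available and the fragment is ASCII or entity-encoded.
function keepAllowedAttributes(string $html, array $allowed = ['href', 'src', 'alt']): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // Wrap the fragment in a <div> and stop libxml adding <html>/<body>/doctype,
    // so the wrapper becomes the document element.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // Copy the attribute list first, then remove every attribute not in $allowed.
    foreach (iterator_to_array($xpath->query('//@*')) as $attr) {
        if (!in_array(strtolower($attr->nodeName), $allowed, true)) {
            $attr->ownerElement->removeAttributeNode($attr);
        }
    }

    // Serialise the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo keepAllowedAttributes('<p class="one" otherrandomattribute="two">text</p>');
// prints <p>text</p>
```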

score 8 · Accepted Answer

Here is a function that removes all attributes except the ones you want to keep:

function stripAttributes($s, $allowedattr = array()) {
  // Find every tag with whitespace after its name: $r[0] is the whole tag,
  // $r[1] the attribute text ("U" makes the quantifiers lazy).
  if (preg_match_all("/<[^>]*\\s([^>]*)\\/*>/msiU", $s, $res, PREG_SET_ORDER)) {
   foreach ($res as $r) {
     $tag = $r[0];
     $attrs = array();
     // Split the attribute text into ' name="value"' / " name='value'" pieces.
     preg_match_all("/\\s.*=(['\"]).*\\1/msiU", " " . $r[1], $split, PREG_SET_ORDER);
     foreach ($split as $spl) {
      $attrs[] = $spl[0];
     }
     // Keep only attributes whose lower-cased name is in $allowedattr.
     $newattrs = array();
     foreach ($attrs as $a) {
      $tmp = explode("=", $a);
      if (in_array(strtolower(trim($tmp[0])), $allowedattr)) {
          $newattrs[] = trim($a);
      }
     }
     $attrs = implode(" ", $newattrs);
     // Swap in the filtered attributes and drop whitespace left before ">" ("<p >" -> "<p>").
     $rpl = preg_replace('/\s+>$/', '>', str_replace($r[1], $attrs, $tag));
     $s = str_replace($tag, $rpl, $s);
   }
  }
  return $s;
}

For example:

echo stripAttributes('<p class="one" otherrandomattribute="two">');

Or, if you want to keep the "class" attribute, for example:

echo stripAttributes('<p class="one" otherrandomattribute="two">', array('class'));

Or:

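Suppose you are sending a message to an inbox and composing it with CKEDITOR. You can pass the message through the function as follows before sending it, assigning the result to the $message variable. Note that stripAttributes() removes every attribute that is not allowed; the tags themselves are kept. I tried it and it works: the only thing left was the formatting I had added, such as bold.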

$message = stripAttributes($_POST['message']);

You can then preview the result with echo $message;.
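Because stripAttributes() only filters attributes, limiting which tags survive is a separate step. PHP's built-in strip_tags() can do that part, but it leaves every attribute on the tags it allows (the PHP manual warns about this), so the two functions complement each other rather than substitute for one another. The sample message below is made up for illustration.

```php
<?php
// strip_tags() removes tags that are not in the allow list, but it does not
// touch the attributes of the tags it keeps; the text inside removed tags stays.
$message = '<p class="x" onclick="evil()">Hi <b style="color:red">there</b><script>alert(1)</script></p>';

$tagsOnly = strip_tags($message, '<p><b>');

echo $tagsOnly;
// prints <p class="x" onclick="evil()">Hi <b style="color:red">there</b>alert(1)</p>
```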

score 5 · Accepted Answer


HTML Purifier is one of the best tools for sanitizing HTML in PHP.

answered 2009-04-20T21:45:05.867

score 5 · Accepted Answer

Honestly, I think the only proper way to do this is to use the HTML Purifier library with a whitelist of tags and attributes. Example script:

<html><body>

<?php

require_once '../includes/htmlpurifier-4.5.0-lite/library/HTMLPurifier/Bootstrap.php';
spl_autoload_register(array('HTMLPurifier_Bootstrap', 'autoload'));

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,b,a[href],i,br,img[src]');
$config->set('URI.Base', 'http://www.example.com');
$config->set('URI.MakeAbsolute', true);

$purifier = new HTMLPurifier($config);

$dirty_html = "
  <a href=\"http://www.google.de\">broken a href link</a
  fnord

  <x>y</z>
  <b>c</p>
  <script>alert(\"foo!\");</script>

  <a href=\"javascript:alert(history.length)\">Anzahl besuchter Seiten</a>
  <img src=\"www.example.com/bla.gif\" />
  <a href=\"http://www.google.de\">missing end tag
 ende 
";

$clean_html = $purifier->purify($dirty_html);

print "<h1>dirty</h1>";
print "<pre>" . htmlentities($dirty_html) . "</pre>";

print "<h1>clean</h1>";
print "<pre>" . htmlentities($clean_html) . "</pre>";

?>

</body></html>

This produces the following clean, standards-compliant HTML fragment:

<a href="http://www.google.de">broken a href link</a>fnord

y
<b>c
<a>Anzahl besuchter Seiten</a>
<img src="http://www.example.com/www.example.com/bla.gif" alt="bla.gif" /><a href="http://www.google.de">missing end tag
ende 
</a></b>

In your case, the whitelist would look like this:

$config->set('HTML.Allowed', 'p');

score -1 · Accepted Answer

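You could also look into HTML Purifier. Granted, it is quite bloated and may be more than you need if this particular example is all you care about, but it provides more or less "bulletproof" purification of potentially hostile HTML. You can also choose to allow or disallow specific attributes (it is highly configurable).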

http://htmlpurifier.org/
