php - How to properly substr() a string containing HTML entities?

Question

I have something like this:

$mytext="that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo substr($mytext,0,6);

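The output in this case is that&# instead of that's.

What I want is for each HTML entity to be counted as one character before the substr, because I always end up with broken HTML or some obscure characters at the end of the text.

Please don't suggest that I HTML-decode it, then substr, then encode again. I want a clean method :)

Thanks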

score 5 · Accepted Answer

There are two ways of doing this:

1. You can decode the HTML entities, substr() and then encode; or
2. You can use a regular expression.

(1) uses html_entity_decode() and htmlentities():

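// ENT_QUOTES so that &#039; is decoded (and re-encoded) on every PHP version
$s = html_entity_decode($mytext, ENT_QUOTES, 'UTF-8');
$sub = substr($s, 0, 6);
echo htmlentities($sub, ENT_QUOTES, 'UTF-8');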

(2) might be something like:

if (preg_match('!^([^&]|&(?:.*?;)){0,5}!s', $mytext, $match)) {
  echo $match[0];
}

What this is saying is: find me up to 5 occurrences of the preceding expression from the beginning of the string. The preceding expression is either:

any character that isn't an ampersand; or
an ampersand, followed by anything up to and including a semi-colon (ie an HTML entity).

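This isn't perfect (for example, a bare & that is not part of an entity makes the pattern swallow everything up to the next semi-colon as a single "character"), so I would favour (1).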
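As a hedged sketch of option (2), the pattern can be tightened so that only well-formed entities count as one unit and a stray & is treated as an ordinary character. The function name entity_prefix() is hypothetical and not part of the answer above, and the block assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: return the first $length "characters" of $text,
// counting each well-formed HTML entity as a single character.
function entity_prefix(string $text, int $length): string
{
    // One unit is a well-formed entity (&name; &#123; &#x1F;) or any single character.
    $unit = '&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);|.';
    $pattern = '/^(?:' . $unit . '){0,' . max(0, $length) . '}/su';

    // preg_match() returns false on invalid UTF-8; fall back to an empty string.
    return preg_match($pattern, $text, $match) ? $match[0] : '';
}

$mytext = "that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo entity_prefix($mytext, 6), "\n"; // that&#039;s
```

Unlike substr(), this only cuts from the start of the string; an offset would need a second quantified group to skip units first.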

score 3 · Accepted Answer


function encoded_substr($string, $param, $param2){
  $s = html_entity_decode($string);
  $sub = substr($s, $param, $param2);
  return htmlentities($sub);
}

There, I copied cletus's code and pasted it into a function. Now you can call a very simple three-line function with a single line of code. If this isn't "clean", I don't know what "clean" means.

score 3 · Accepted Answer

Note that some characters break the suggested decode + encode approach when you use substr().

Example

$string=html_entity_decode("Workin&#8217; on my Fitness&#8230;In the Backyard.");

echo $string;
echo substr($string,0,25);
echo htmlentities(substr($string,0,25));

Outputs:

Workin’ on my Fitness…In the Backyard.
Workin’ on my Fitness�
(empty string)

Solution
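The reason is that substr() counts bytes, not characters. A minimal sketch, assuming UTF-8 and the mbstring extension, showing that the decoded ellipsis occupies three bytes and that cutting it after two leaves an invalid sequence:

```php
<?php
// Decode the numeric entity for the horizontal ellipsis (U+2026).
$ellipsis = html_entity_decode('&#8230;', ENT_QUOTES, 'UTF-8');

echo strlen($ellipsis), "\n";             // 3 (bytes)
echo mb_strlen($ellipsis, 'UTF-8'), "\n"; // 1 (character)

// Cutting in the middle of the character leaves invalid UTF-8,
// which is why htmlentities() then returns an empty string.
$broken = substr($ellipsis, 0, 2);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
```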

Use mb_substr().

echo mb_substr($string,0,25);
echo htmlentities(mb_substr($string,0,25));

Outputs:

Workin’ on my Fitness…In
Workin&rsquo; on my Fitness&hellip;In
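Putting the decode, mb_substr() and encode steps together might look like the sketch below. The name encoded_mb_substr() is hypothetical, and the block assumes UTF-8 input and the mbstring extension; passing the flags and charset explicitly avoids depending on defaults that differ between PHP versions.

```php
<?php
// Hypothetical helper: multibyte-safe variant of decode + substr + encode.
function encoded_mb_substr(string $html, int $start, ?int $length = null): string
{
    // ENT_QUOTES makes &#039; decode and re-encode consistently.
    $decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

    // mb_substr() counts characters, so multibyte characters are never split.
    $slice = mb_substr($decoded, $start, $length, 'UTF-8');

    return htmlentities($slice, ENT_QUOTES, 'UTF-8');
}

echo encoded_mb_substr("that&#039;s really &quot;confusing&quot;", 0, 6), "\n";
// that&#039;s
```

Note that the result is re-encoded with htmlentities(), so characters that were literal in the input (such as ’) may come back as named entities.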

score 1 · Accepted Answer

Try the following function.

<?php

$mytext="that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";

echo limit_text($mytext,6);

function limit_text($text,$limit){
   preg_match_all("/&(.*)\;/U", $text, $pat_array);
   $additional=0;

   foreach ($pat_array[0] as $key => $value) {
     if($key <$limit){$additional += (strlen($value)-1);}
   }
   $limit+=$additional;

   if(strlen($text)>$limit){
     $text = substr( $text,0,$limit );
     $text = substr( $text,0,-(strlen(strrchr($text,' '))) );
   }
   return $text;

}

?>

score 0 · Accepted Answer

Well, there is only one clean way: don't use entities at all.
There is not a single reason to take a substring of entity-encoded text. Entities should be used for output only.
So substr first, then encode.
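A minimal sketch of that order of operations, assuming the text is stored raw (unencoded) as UTF-8 and is only escaped at output time. The function name truncate_then_encode() is hypothetical.

```php
<?php
// Hypothetical helper: truncate the raw text first, escape it last.
function truncate_then_encode(string $raw, int $length): string
{
    // Work on real characters; no entity can be cut in half because none exist yet.
    $short = mb_substr($raw, 0, $length, 'UTF-8');

    // Escape only at the point of output.
    return htmlspecialchars($short, ENT_QUOTES, 'UTF-8');
}

$raw = 'that\'s really "confusing" and <absolutly> silly';
echo truncate_then_encode($raw, 6), "\n"; // that&#039;s
```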
