php - PHP regular expressions and a preg_replace question

Question

I was going through someone else's old code and had trouble understanding it.

He has:

explode(' ', strtolower(preg_replace('/[^a-z0-9-]+/i', ' ', preg_replace('/\&#?[a-z0-9]{2,4}\;/', ' ', preg_replace('/<[^>]+>/', ' ', $texts)))));

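I think the first regex excludes a-z and 0-9, and I'm not sure what the second regex does. The third one matches anything inside '< >' except '>'.

The result is an array containing every word in the $texts variable, but I don't know how the code produces it. I understand what preg_replace and the other functions do; I just don't see how the process works as a whole.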

score 4 · Accepted Answer

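The expression /[^a-z0-9-]+/i matches (and subsequently replaces with a blank space) any run of characters other than a-z, 0-9, and the hyphen. The ^ in [^...] negates the character set contained in the brackets.

[^a-z0-9-] matches any character that is not alphanumeric or a hyphen
+ means one or more of the preceding
/i makes the match case-insensitive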
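
As a quick illustration of that character class, here is a minimal sketch. The sample string is made up for this example and does not come from the original code.

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$sample = "Hello, World! It's a well-known fact (2024).";

// Each run of characters outside a-z, 0-9 and "-" (in either case)
// is replaced by a single space.
$cleaned = preg_replace('/[^a-z0-9-]+/i', ' ', $sample);

var_dump($cleaned);
// string(40) "Hello World It s a well-known fact 2024 "
```

Note that the hyphen in "well-known" survives, while the apostrophe in "It's" splits the word in two.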

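The expression /\&#?[a-z0-9]{2,4}\;/ matches an & followed optionally by #, then two to four letters or digits, and finally a ;. In other words, it matches HTML entities such as &nbsp; or &#39;.

&#? matches either & or &#, because the ? makes the preceding # optional. The & does not actually need to be escaped.
[a-z0-9]{2,4} matches two to four alphanumeric characters
; is a literal semicolon. It does not actually need to be escaped.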
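
The sketch below runs that pattern against a few inputs to show what it does and does not catch. The sample strings are hypothetical, picked only to exercise the {2,4} limit and the missing /i flag.

```php
<?php
// The pattern from the question, written without the unnecessary backslashes.
$pattern = '/&#?[a-z0-9]{2,4};/';

// Hypothetical inputs, chosen only to show what does and does not match.
$samples = ['a&nbsp;b', 'it&#39;s', '&#8217;', 'wait&hellip;', '&AMP;'];

$results = [];
foreach ($samples as $sample) {
    // Each matched entity is replaced by a single space.
    $results[$sample] = preg_replace($pattern, ' ', $sample);
    printf("%-14s => \"%s\"\n", $sample, $results[$sample]);
}
```

The first three inputs have their entity replaced by a space. "&hellip;" is left alone because its name is six letters long, and "&AMP;" is left alone because this pattern has no /i flag.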

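As you suspected, the last one replaces tags such as <tagname>, <tagname attr='value'>, or </tagname> with a blank space. Note that it matches the whole tag, not just the contents inside the <>.

< is a literal character
[^>]+ is every character up to, but not including, the next >
> is a literal character

I would strongly suggest rewriting this as three separate calls to preg_replace() rather than nesting them.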
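
A small sketch of the tag pattern in action follows. The inputs are invented for this example; the last one shows that the pattern has no notion of HTML and will treat any "<" ... ">" pair as a tag.

```php
<?php
$pattern = '/<[^>]+>/';

// Hypothetical markup: every tag, including its attributes, becomes one space.
$stripped = preg_replace($pattern, ' ', '<p class="x">Hi <b>there</b></p>');
var_dump($stripped);

// "<>" is not matched, because + requires at least one character inside.
$empty = preg_replace($pattern, ' ', 'a <> b');
var_dump($empty);

// Plain text that happens to contain "<" and ">" is swallowed as well.
$math = preg_replace($pattern, ' ', '1 < 2 and 3 > 2');
var_dump($math);
```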

// Strip tags.
// This would be better done with strip_tags()!
$texts = preg_replace('/<[^>]+>/', ' ', $texts);
// Remove HTML entities.
$texts = preg_replace('/&#?[a-z0-9]{2,4};/', ' ', $texts);
// Replace remaining non-alphanumerics (other than "-") with a space.
$texts = preg_replace('/[^a-z0-9-]+/i', ' ', $texts);
// Lowercase and split on spaces, as the original one-liner does.
$array = explode(' ', strtolower($texts));
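
One possible variation, sketched here under assumptions: the function name extractWords is hypothetical, and the final step uses preg_split() with PREG_SPLIT_NO_EMPTY instead of preg_replace() plus explode(). That changes the behavior slightly, since the empty strings that explode(' ', ...) produces for leading or trailing spaces are dropped.

```php
<?php
// Hypothetical helper, not part of the original code.
function extractWords(string $texts): array
{
    // Replace tags with a space.
    $texts = preg_replace('/<[^>]+>/', ' ', $texts);
    // Replace short HTML entities with a space.
    $texts = preg_replace('/&#?[a-z0-9]{2,4};/', ' ', $texts);
    // Lowercase first, so the split pattern does not need the /i flag.
    $texts = strtolower($texts);
    // Split on runs of anything that is not a-z, 0-9 or "-",
    // discarding empty pieces.
    return preg_split('/[^a-z0-9-]+/', $texts, -1, PREG_SPLIT_NO_EMPTY);
}

$words = extractWords('<p>Hello&nbsp;World, it&#39;s <b>well-known</b>!</p>');
print_r($words);
```

For this sample input the result is the five words hello, world, it, s, and well-known.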

score 2 · Accepted Answer

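This code appears to do the following:

Strip HTML/XML tags (everything between < and >)
Then remove anything that starts with & or &#, followed by 2-4 alphanumeric characters and a semicolon
Then replace anything that is not alphanumeric or a dash with a space

In the order the nested calls are processed: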
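
The three steps above can be traced on a small input. This is only a sketch: the sample string and the $step variable names are made up for illustration, while the patterns are the ones from the question.

```php
<?php
// Hypothetical input, chosen only for illustration.
$texts = '<b>Tom &amp; Jerry</b>';

// Innermost call runs first: tags become spaces.
$step1 = preg_replace('/<[^>]+>/', ' ', $texts);
// Next: short HTML entities become spaces.
$step2 = preg_replace('/\&#?[a-z0-9]{2,4}\;/', ' ', $step1);
// Next: each run of other characters collapses to one space.
$step3 = preg_replace('/[^a-z0-9-]+/i', ' ', $step2);
// Finally: lowercase and split on single spaces.
$words = explode(' ', strtolower($step3));

var_dump($step1, $step2, $step3, $words);
```

Because $step3 still starts and ends with a space, explode() returns an empty string as the first and last element, around "tom" and "jerry".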

/<[^>]+>/

Match the character “<” literally «<»
Match any character that is NOT a “>” «[^>]+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “>” literally «>»


/\&#?[a-z0-9]{2,4}\;/

Match the character “&” literally «\&»
Match the character “#” literally «#?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match a single character present in the list below «[a-z0-9]{2,4}»
   Between 2 and 4 times, as many times as possible, giving back as needed (greedy) «{2,4}»
   A character in the range between “a” and “z” «a-z»
   A character in the range between “0” and “9” «0-9»
Match the character “;” literally «\;»


/[^a-z0-9-]+/i

Options: case insensitive

Match a single character NOT present in the list below «[^a-z0-9-]+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   A character in the range between “a” and “z” «a-z»
   A character in the range between “0” and “9” «0-9»
   The character “-” «-»
