php - Merge two regexes to truncate words in a string

Question

I'm trying to come up with the following function, which truncates a string to whole words (if possible; otherwise it should truncate to characters):

function Text_Truncate($string, $limit, $more = '...')
{
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));

    if (strlen(utf8_decode($string)) > $limit)
    {
        $string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)~su', '$1', $string);

        if (strlen(utf8_decode($string)) > $limit)
        {
            $string = preg_replace('~^(.{' . intval($limit) . '}).*~su', '$1', $string);
        }

        $string .= $more;
    }

    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}

Here are some tests:

// Iñtërnâtiônàlizætiøn and then the quick brown fox... (49 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

// Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_...  (50 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

Both work as they are, but if I drop the second preg_replace() I get the following:

Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.

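I can't use substr() because it only works at the byte level, and I don't have access to mb_substr() at the moment. I've made several attempts to join the second regex into the first one, but without success.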
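To illustrate the byte-level problem with substr() described above, here is a minimal sketch. The sample word is taken from the tests; the byte count assumes the file is saved as UTF-8, and the variable names are only for illustration.

```php
<?php
// Sample word from the tests; the byte count assumes UTF-8 source encoding.
$word = 'Iñtërnâtiônàlizætiøn';

// strlen() counts bytes: each of the 7 accented letters takes 2 bytes in UTF-8.
$bytes = strlen($word);

// With the /u modifier, "." matches one code point, so this counts characters.
$chars = preg_match_all('~.~su', $word);

// substr() cuts at a byte offset and can split a multi-byte character in half:
// this keeps "I" plus only the first byte of "ñ".
$cut = substr($word, 0, 2);

// preg_match() with /u fails on malformed UTF-8, so this is a cheap validity check.
$isValidUtf8 = preg_match('~~u', $cut) === 1;

echo $bytes, ' bytes, ', $chars, ' chars, cut is valid UTF-8: ';
echo $isValidUtf8 ? 'yes' : 'no', "\n";
```

This is why the function measures length with a character-aware count and truncates with a /u regex instead of substr().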

Please help, I've been struggling with this for almost an hour.

EDIT: Sorry, I've been awake for 40 hours and I shamelessly missed this:

$string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)?~su', '$1', $string);

Still, if someone has a more optimized regex (or one that ignores the trailing space), please share it:

"Iñtërnâtiônàlizætiøn and then "
"Iñtërnâtiônàlizætiøn_and_then_"

EDIT 2: I still can't get rid of the trailing whitespace. Can someone help me out?

EDIT 3: Okay, none of my edits actually worked; I was being fooled by RegexBuddy. I should probably leave this for another day and get some sleep now. Off for today.

score 3 · Accepted Answer

Perhaps I can give you a happy morning after a long night of RegExp nightmares:

'~^(.{1,' . intval($limit) . '}(?<=\S)(?=\s)|.{'.intval($limit).'}).*~su'

Boiling it down:

^      # Start of String
(       # begin capture group 1
 .{1,x} # match 1 - x characters
 (?<=\S)# lookbehind, match must end with non-whitespace 
 (?=\s) # lookahead, if the next char is whitespace, match
 |      # otherwise test this:
 .{x}   # got to x chars anyway.
)       # end cap group
.*     # match the rest of the string (since you were using replace)

You can always add |$ to the end of (?=\s), but since your code already checks that the string length is longer than $limit, I didn't feel it was necessary in this case.
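As a rough sketch of how the merged pattern could slot into the question's function: the name Text_Truncate_Merged is hypothetical, and the length check uses preg_match_all() as a stand-in for strlen(utf8_decode()), which is an assumption on my part and not something from the question.

```php
<?php
// Hypothetical name; the body follows the question's function, with the two
// preg_replace() calls replaced by the single pattern from this answer.
function Text_Truncate_Merged($string, $limit, $more = '...')
{
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));
    $limit = intval($limit);

    // Count characters rather than bytes (assumed stand-in for strlen(utf8_decode())).
    if (preg_match_all('~.~su', $string) > $limit) {
        // First alternative: up to $limit chars ending on a non-space that is
        // followed by whitespace. Second alternative: a hard cut at $limit chars.
        $pattern = '~^(.{1,' . $limit . '}(?<=\S)(?=\s)|.{' . $limit . '}).*~su';
        $string = preg_replace($pattern, '$1', $string) . $more;
    }

    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}

// The function returns HTML entities, so decode before printing as plain text.
$spaced = Text_Truncate_Merged('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog', 50);
echo html_entity_decode($spaced, ENT_QUOTES, 'UTF-8'), "\n";
// Iñtërnâtiônàlizætiøn and then the quick brown fox...

$joined = Text_Truncate_Merged('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day', 50);
echo html_entity_decode($joined, ENT_QUOTES, 'UTF-8'), "\n";
// Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_...
```

Because of the (?<=\S) lookbehind, the word-boundary case never ends in whitespace, which should also cover the trailing-space problem from the question's edits.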

score 0 · Answer


Have you considered using wordwrap? (http://us3.php.net/wordwrap)
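For what it's worth, a wordwrap()-based version might look roughly like the sketch below. The helper name is hypothetical, and it assumes single-line, single-byte (ASCII) input: wordwrap() counts bytes, so it would not handle the accented test strings from the question correctly.

```php
<?php
// Hypothetical helper, not from the thread. Assumes single-line ASCII input,
// because wordwrap() and strlen() both work on bytes.
function Truncate_With_Wordwrap($string, $limit, $more = '...')
{
    if (strlen($string) <= $limit) {
        return $string;
    }

    // Wrap at word boundaries; the final `true` also cuts words longer than $limit.
    $lines = explode("\n", wordwrap($string, $limit, "\n", true));

    // The first wrapped line is the truncated text.
    return $lines[0] . $more;
}

echo Truncate_With_Wordwrap('The quick brown fox jumped over the lazy dog', 20), "\n";
// The quick brown fox...
echo Truncate_With_Wordwrap('Supercalifragilistic word', 10), "\n";
// Supercalif...
```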

answered 2010-04-22T08:09:49.500
