c# - Character count minus HTML characters in C#

Question

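I am trying to figure out a way to count the number of characters in a string, truncate the string, and then return it. However, I need this function to NOT count HTML tags. The problem is that if HTML tags are counted and the truncation point falls in the middle of a tag, the page renders broken.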

This is what I have so far...

public string Truncate(string input, int characterLimit, string currID) {
    string output = input;

    // Check if the string is longer than the allowed amount
    // otherwise do nothing
    if (output.Length > characterLimit && characterLimit > 0) {

        // cut the string down to the maximum number of characters
        output = output.Substring(0, characterLimit);

        // Check if the character right after the truncate point was a space
        // if not, we are in the middle of a word and need to remove the rest of it
        if (input.Substring(output.Length, 1) != " ") {
            int LastSpace = output.LastIndexOf(" ");

            // if we found a space then, cut back to that space
            if (LastSpace != -1)
            {
                output = output.Substring(0, LastSpace);
            }
        }
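        // close an anchor only if the truncated text ends inside an unclosed one
        // (LastIndexOf returns -1 when the substring is absent)
        if (output.LastIndexOf("<a href") > output.LastIndexOf("</a>")) {
            output += "</a>";
        }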
        // Finally, add the "..." and end the paragraph
        output += "<br /><br />...<a href='Announcements.aspx?ID=" + currID + "'>see more</a></p>";
    }
    return output;
}

But I'm not happy with this. Is there a better way to do it? A new solution, or suggestions on what to add to what I have so far, would be great.

Disclaimer: I have never worked with C#, so I am not familiar with the concepts of the language... I am doing this because I have to, not by choice.

Thanks, Hristo

score 3 · Accepted Answer

Use the right tool for the problem.

HTML is not a simple format to parse. Rather than rolling your own parser, I suggest using an existing, proven one. If you know you will only ever be parsing XHTML, you can use an XML parser instead.

These are the only reliable ways to perform operations on HTML that preserve its semantic representation.

Do not try to use regular expressions. HTML is not a regular language, and going in that direction will only cause you grief and misery.
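
To illustrate the XML-parser route, here is a minimal sketch, assuming the input is well-formed XHTML (every tag closed, no HTML-only entities such as &nbsp;, which an XML parser would reject). It uses System.Xml.Linq from the standard library. The class name XhtmlTruncator and the dummy <root> wrapper are made up for the example, and the "see more" link from the question is left out. Only text nodes count toward the limit, so a cut can never land inside a tag, and the parser closes any open elements when the result is written back out.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class XhtmlTruncator
{
    // Truncates well-formed XHTML to at most `limit` visible (text) characters.
    public static string Truncate(string xhtml, int limit)
    {
        // Wrap in a dummy root so fragments with several top-level nodes parse.
        XElement root = XElement.Parse("<root>" + xhtml + "</root>",
                                       LoadOptions.PreserveWhitespace);
        int remaining = limit;
        Trim(root, ref remaining);

        // Serialize only the children so the dummy root is not emitted.
        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    private static void Trim(XElement element, ref int remaining)
    {
        // Iterate over a copy because nodes are removed during the walk.
        foreach (XNode node in element.Nodes().ToList())
        {
            if (remaining <= 0)
            {
                // Budget used up: drop everything that follows.
                node.Remove();
            }
            else if (node is XText text)
            {
                if (text.Value.Length > remaining)
                {
                    string cut = text.Value.Substring(0, remaining);
                    // If the cut lands mid-word, back up to the previous space.
                    if (text.Value[remaining] != ' ')
                    {
                        int lastSpace = cut.LastIndexOf(' ');
                        if (lastSpace > 0) cut = cut.Substring(0, lastSpace);
                    }
                    text.Value = cut;
                    remaining = 0;
                }
                else
                {
                    remaining -= text.Value.Length;
                }
            }
            else if (node is XElement child)
            {
                // Tags themselves cost nothing; only their text content does.
                Trim(child, ref remaining);
            }
        }
    }
}
```

For example, XhtmlTruncator.Truncate("<p>Hello <a href=\"x.html\">big world</a> again</p>", 9) counts "Hello " (6 characters) plus "big" (3 characters) and drops the rest, leaving the anchor and paragraph properly closed. For real-world HTML that is not valid XML, the same walk would have to be done over the tree produced by a dedicated HTML parser instead.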
