C# - Removing control characters from a UTF-8 string

Question

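I found this question, but its solution also removes all the valid characters: it returns an empty string even though the input contains both valid characters and UTF-8 control characters. From what I have read about UTF-8, there is no specific range for control characters, and each character set has its own control characters.

How can I change the above solution so that it removes only the control characters?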

score 23 · Accepted Answer

This is how I roll:

Regex.Replace(evilWeirdoText, @"[\u0000-\u001F]", string.Empty)

This strips out the first 32 control characters (U+0000 through U+001F). The next hex value up from \u001F is \u0020, AKA the space. Everything before the space is all the line feed and null nonsense.

If you don't believe me about the characters, see: http://donsnotes.com/tech/charsets/ascii.html
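For comparison, here is a minimal sketch of the same regex approach. The range in the answer leaves U+007F (DEL) and the C1 block U+0080 through U+009F in place; if those should go too, the Unicode category class `\p{Cc}` covers them. The class name `ControlCharRegex` and both method names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ControlCharRegex
{
    // Same pattern as the answer: removes U+0000 through U+001F only.
    public static string StripC0(string input) =>
        Regex.Replace(input, @"[\u0000-\u001F]", string.Empty);

    // Assumption: DEL (U+007F) and the C1 controls (U+0080-U+009F) are unwanted too.
    // \p{Cc} matches every character in the Unicode "Control" general category.
    public static string StripAllControls(string input) =>
        Regex.Replace(input, @"\p{Cc}", string.Empty);
}
```

Note that both variants also remove tab, carriage return, and line feed, since those sit below U+0020.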

score 22 · Accepted Answer

I think the following code will work for you:

// Requires: using System.Text;
public static string RemoveControlCharacters(string inString)
{
    if (inString == null) return null;
    // Pre-size the builder: the result is never longer than the input.
    StringBuilder newString = new StringBuilder(inString.Length);
    foreach (char ch in inString)
    {
        // char.IsControl is true for U+0000-U+001F and U+007F-U+009F.
        if (!char.IsControl(ch))
        {
            newString.Append(ch);
        }
    }
    return newString.ToString();
}
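The loop above also drops tab, carriage return, and line feed, because char.IsControl() reports those as control characters. If the text should keep its line breaks, one possible variant is sketched below. The class name `ControlCharFilter` and the method name are hypothetical, and the choice of which characters to keep is an assumption about your input.

```csharp
using System;
using System.Linq;

public static class ControlCharFilter
{
    // Hypothetical variant: drops control characters but keeps tab, CR, and LF.
    public static string RemoveControlCharactersKeepWhitespace(string input)
    {
        if (input == null) return null;
        return new string(input
            .Where(c => !char.IsControl(c) || c == '\t' || c == '\r' || c == '\n')
            .ToArray());
    }
}
```

Non-ASCII letters are untouched either way, since char.IsControl() is false for them.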

score 0 · Accepted Answer

If you plan to use the string as a query string, consider calling Uri.EscapeUriString() or Uri.EscapeDataString() before sending it out. Note: you may still need to strip the characters flagged by char.IsControl() first.
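A rough sketch of that two-step idea, stripping first and escaping second. It uses only Uri.EscapeDataString(), since Uri.EscapeUriString() is marked obsolete in .NET 6 and later. The class name `QueryValueEncoder` and the method `ToQueryValue` are hypothetical, and the sketch assumes the input is a single query value rather than a whole URL.

```csharp
using System;
using System.Linq;

public static class QueryValueEncoder
{
    // Hypothetical helper: strip control characters, then percent-encode the rest.
    public static string ToQueryValue(string raw)
    {
        string cleaned = new string(raw.Where(c => !char.IsControl(c)).ToArray());
        return Uri.EscapeDataString(cleaned);
    }
}
```

Escaping alone would percent-encode a control character rather than remove it, which is why the strip step comes first here.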
