c# - c＃余分な空白を削除する最速の方法

Question

余分な空白を1つの空白に置き換える最速の方法は何ですか？
例えば

から

foo      bar

に

foo bar

score 54 · Accepted Answer

最速の方法は？文字列を繰り返し処理し、StringBuilderスペースのグループごとに1つのスペースのみをコピーして、文字ごとに2番目のコピーを作成します。

バリアントを入力しやすいとReplace、余分な文字列のバケットロードが作成されます（または正規表現DFAの構築に時間を浪費します）。

比較結果で編集：

http://ideone.com/NV6EzUをn=50で使用すると（プロセスを強制終了するのに非常に時間がかかったため、ideoneで削減する必要がありました）、次のようになります。

正規表現：7771ms。

Stringbuilder：894ms。

これは確かに予想通りですが、Regexこれほど単純なものにはひどく非効率的です。

score 50 · Accepted Answer

正規表現を使用できます：

static readonly Regex trimmer = new Regex(@"\s\s+");

s = trimmer.Replace(s, " ");

パフォーマンスを向上させるには、を渡しRegexOptions.Compiledます。

score 33 · Accepted Answer

少し遅れましたが、余分な空白を削除する最速の方法を取得するために、いくつかのベンチマークを実行しました。もっと速い答えがあれば、追加したいと思います。

結果：

NormalizeWhiteSpaceForLoop：156ミリ秒（私による-すべての空白の削除に関する私の回答から）
NormalizeWhiteSpace：267ミリ秒（Alex Kによる）
RegexCompiled：1950ミリ秒（SLaksによる）
正規表現：2261ミリ秒（SLaksによる）

コード：

public class RemoveExtraWhitespaces
{
    public static string WithRegex(string text)
    {
        return Regex.Replace(text, @"\s+", " ");
    }

    public static string WithRegexCompiled(Regex compiledRegex, string text)
    {
        return compiledRegex.Replace(text, " ");
    }

    public static string NormalizeWhiteSpace(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        int current = 0;
        char[] output = new char[input.Length];
        bool skipped = false;

        foreach (char c in input.ToCharArray())
        {
            if (char.IsWhiteSpace(c))
            {
                if (!skipped)
                {
                    if (current > 0)
                        output[current++] = ' ';

                    skipped = true;
                }
            }
            else
            {
                skipped = false;
                output[current++] = c;
            }
        }

        return new string(output, 0, current);
    }

    public static string NormalizeWhiteSpaceForLoop(string input)
    {
        int len = input.Length,
            index = 0,
            i = 0;
        var src = input.ToCharArray();
        bool skip = false;
        char ch;
        for (; i < len; i++)
        {
            ch = src[i];
            switch (ch)
            {
                case '\u0020':
                case '\u00A0':
                case '\u1680':
                case '\u2000':
                case '\u2001':
                case '\u2002':
                case '\u2003':
                case '\u2004':
                case '\u2005':
                case '\u2006':
                case '\u2007':
                case '\u2008':
                case '\u2009':
                case '\u200A':
                case '\u202F':
                case '\u205F':
                case '\u3000':
                case '\u2028':
                case '\u2029':
                case '\u0009':
                case '\u000A':
                case '\u000B':
                case '\u000C':
                case '\u000D':
                case '\u0085':
                    if (skip) continue;
                    src[index++] = ch;
                    skip = true;
                    continue;
                default:
                    skip = false;
                    src[index++] = ch;
                continue;
            }
        }

        return new string(src, 0, index);
    }
}

テスト：

[TestFixture]
public class RemoveExtraWhitespacesTest
{
    private const string _text = "foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo ";
    private const string _expected = "foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo ";

    private const int _iterations = 10000;

    [Test]
    public void Regex()
    {
        var result = TimeAction("Regex", () => RemoveExtraWhitespaces.WithRegex(_text));
        Assert.AreEqual(_expected, result);
    }

    [Test]
    public void RegexCompiled()
    {
        var compiledRegex = new Regex(@"\s+", RegexOptions.Compiled);
        var result = TimeAction("RegexCompiled", () => RemoveExtraWhitespaces.WithRegexCompiled(compiledRegex, _text));
        Assert.AreEqual(_expected, result);
    }

    [Test]
    public void NormalizeWhiteSpace()
    {
        var result = TimeAction("NormalizeWhiteSpace", () => RemoveExtraWhitespaces.NormalizeWhiteSpace(_text));
        Assert.AreEqual(_expected, result);
    }

    [Test]
    public void NormalizeWhiteSpaceForLoop()
    {
        var result = TimeAction("NormalizeWhiteSpaceForLoop", () => RemoveExtraWhitespaces.NormalizeWhiteSpaceForLoop(_text));
        Assert.AreEqual(_expected, result);
    }

    public string TimeAction(string name, Func<string> func)
    {
        var timer = Stopwatch.StartNew();
        string result = string.Empty; ;
        for (int i = 0; i < _iterations; i++)
        {
            result = func();
        }

        timer.Stop();
        Console.WriteLine(string.Format("{0}: {1} ms", name, timer.ElapsedMilliseconds));
        return result;
    }
}

score 12 · Accepted Answer

私は以下のメソッドを使用します-スペースだけでなくすべての空白文字を処理し、先頭と末尾の空白の両方をトリミングし、余分な空白を削除し、すべての空白をスペース文字に置き換えます（したがって、均一なスペース区切り文字があります）。そして、これらのメソッドは高速です。

public static String CompactWhitespaces( String s )
{
    StringBuilder sb = new StringBuilder( s );

    CompactWhitespaces( sb );

    return sb.ToString();
}

public static void CompactWhitespaces( StringBuilder sb )
{
    if( sb.Length == 0 )
        return;

    // set [start] to first not-whitespace char or to sb.Length

    int start = 0;

    while( start < sb.Length )
    {
        if( Char.IsWhiteSpace( sb[ start ] ) )
            start++;
        else 
            break;
    }

    // if [sb] has only whitespaces, then return empty string

    if( start == sb.Length )
    {
        sb.Length = 0;
        return;
    }

    // set [end] to last not-whitespace char

    int end = sb.Length - 1;

    while( end >= 0 )
    {
        if( Char.IsWhiteSpace( sb[ end ] ) )
            end--;
        else 
            break;
    }

    // compact string

    int dest = 0;
    bool previousIsWhitespace = false;

    for( int i = start; i <= end; i++ )
    {
        if( Char.IsWhiteSpace( sb[ i ] ) )
        {
            if( !previousIsWhitespace )
            {
                previousIsWhitespace = true;
                sb[ dest ] = ' ';
                dest++;
            }
        }
        else
        {
            previousIsWhitespace = false;
            sb[ dest ] = sb[ i ];
            dest++;
        }
    }

    sb.Length = dest;
}

score 11 · Accepted Answer

string q = " Hello     how are   you           doing?";
string a = String.Join(" ", q.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));

score 9 · Accepted Answer

string text = "foo       bar";
text = Regex.Replace(text, @"\s+", " ");
// text = "foo bar"

このソリューションは、スペース、タブ、および改行で機能します。スペースだけが必要な場合は、「\s」を「」に置き換えます。

score 7 · Accepted Answer

私はより大きな文字列のためにこれらの1つを必要とし、以下のルーチンを思いつきました。

連続する空白（タブ、改行を含む）は、にあるものに置き換えられnormalizeToます。先頭/末尾の空白が削除されます。

5k->5milのchar文字列を使用したRegExよりも約8倍高速です。

internal static string NormalizeWhiteSpace(string input, char normalizeTo = ' ')
{
    if (string.IsNullOrEmpty(input))
        return string.Empty;

    int current = 0;
    char[] output = new char[input.Length];
    bool skipped = false;

    foreach (char c in input.ToCharArray())
    {
        if (char.IsWhiteSpace(c))
        {
            if (!skipped)
            {
                if (current > 0)
                    output[current++] = normalizeTo;

                skipped = true;
            }
        }
        else
        {
            skipped = false;
            output[current++] = c;
        }
    }

    return new string(output, 0, skipped ? current - 1 : current);
}

score 6 · Accepted Answer

string yourWord = "beep boop    baap beep   boop    baap             beep";

yourWord = yourWord .Replace("  ", " |").Replace("| ", "").Replace("|", "");

score 5 · Accepted Answer

StringBuilderを使用して次のことを試みました。

余分な空白の部分文字列を削除します
Blindyが示唆するように、元の文字列をループすることから文字を受け入れる

これが私が見つけたパフォーマンスと読みやすさの最良のバランスです（100,000回の反復タイミング実行を使用）。時々、これは読みにくいバージョンより速く、せいぜい5％遅くテストします。私の小さなテスト文字列では、正規表現には4.24倍の時間がかかります。

public static string RemoveExtraWhitespace(string str)
    {
        var sb = new StringBuilder();
        var prevIsWhitespace = false;
        foreach (var ch in str)
        {
            var isWhitespace = char.IsWhiteSpace(ch);
            if (prevIsWhitespace && isWhitespace)
            {
                continue;
            }
            sb.Append(ch);
            prevIsWhitespace = isWhitespace;
        }
        return sb.ToString();
    }

score 3 · Accepted Answer

高速ではありませんが、単純さが役立つ場合、これは機能します。

while (text.Contains("  ")) text=text.Replace("  ", " ");

score 3 · Accepted Answer

このコードはうまく機能します。私はパフォーマンスを測定していません。

string text = "   hello    -  world,  here   we go  !!!    a  bc    ";
string.Join(" ", text.Split().Where(x => x != ""));
// Output
// "hello - world, here we go !!! a bc"

score 3 · Accepted Answer

私は配列を使ってみましたが、はありませんif。

結果

PS C:\dev\Spaces> dotnet run -c release
// .NETCoreApp,Version=v3.0
Seed=7, n=20, s.Length=2828670
Regex by SLaks            1407ms, len=996757
StringBuilder by Blindy    154ms, len=996757
Array                      130ms, len=996757
NoIf                        91ms, len=996757
All match!

メソッド

private static string WithNoIf(string s)
{
    var dst = new char[s.Length];
    uint end = 0;
    char prev = char.MinValue;
    for (int k = 0; k < s.Length; ++k)
    {
        var c = s[k];
        dst[end] = c;

        // We'll move forward if the current character is not ' ' or if prev char is not ' '
        // To avoid 'if' let's get diffs for c and prev and then use bitwise operatios to get 
        // 0 if n is 0 or 1 if n is non-zero
        uint x = (uint)(' ' - c) + (uint)(' ' - prev); // non zero if any non-zero

        end += ((x | (~x + 1)) >> 31) & 1; // https://stackoverflow.com/questions/3912112/check-if-a-number-is-non-zero-using-bitwise-operators-in-c by ruslik
        prev = c;
    }
    return new string(dst, 0, (int)end);
}

private static string WithArray(string s)
{
    var dst = new char[s.Length];
    int end = 0;
    char prev = char.MinValue;
    for (int k = 0; k < s.Length; ++k)
    {
        char c = s[k];
        if (c != ' ' || prev != ' ') dst[end++] = c;
        prev = c;
    }
    return new string(dst, 0, end);
}

テストコード

public static void Main()
{
    const int n = 20;
    const int seed = 7;
    string s = GetTestString(seed);

    var fs = new (string Name, Func<string, string> Func)[]{
        ("Regex by SLaks", WithRegex),
        ("StringBuilder by Blindy", WithSb),
        ("Array", WithArray),
        ("NoIf", WithNoIf),
    };

    Console.WriteLine($"Seed={seed}, n={n}, s.Length={s.Length}");
    var d = new Dictionary<string, string>(); // method, result
    var sw = new Stopwatch();
    foreach (var f in fs)
    {
        sw.Restart();
        var r = "";
        for( int i = 0; i < n; i++) r = f.Func(s);
        sw.Stop();
        d[f.Name] = r;
        Console.WriteLine($"{f.Name,-25} {sw.ElapsedMilliseconds,4}ms, len={r.Length}");
    }
    Console.WriteLine(d.Values.All( v => v == d.Values.First()) ? "All match!" : "Not all match! BAD");
}

private static string GetTestString(int seed)
{
    // by blindy from https://stackoverflow.com/questions/6442421/c-sharp-fastest-way-to-remove-extra-white-spaces
    var rng = new Random(seed);
    // random 1mb+ string (it's slow enough...)
    StringBuilder ssb = new StringBuilder(1 * 1024 * 1024);
    for (int i = 0; i < 1 * 1024 * 1024; ++i)
        if (rng.Next(5) == 0)
            ssb.Append(new string(' ', rng.Next(20)));
        else
            ssb.Append((char)(rng.Next(128 - 32) + 32));
    string s = ssb.ToString();
    return s;
}

score 2 · Accepted Answer

2

これを試して：

System.Text.RegularExpressions.Regex.Replace(input, @"\s+", " ");

于 2011-06-22T15:30:01.250 に答える

score 2 · Accepted Answer

この質問では、いくつかの要件が明確ではなく、いくつかの検討に値します。

先頭または末尾に空白文字を1つ入れますか？
すべての空白を1つの文字に置き換える場合、その文字に一貫性を持たせますか？（つまり、これらのソリューションの多くは、\ t \tを\tに置き換え、''を''に置き換えます。

これは非常に効率的なバージョンであり、すべての空白を1つのスペースに置き換え、forループの前に先頭と末尾の空白を削除します。

  public static string WhiteSpaceToSingleSpaces(string input)
  {
    if (input.Length < 2) 
        return input;

    StringBuilder sb = new StringBuilder();

    input = input.Trim();
    char lastChar = input[0];
    bool lastCharWhiteSpace = false;

    for (int i = 1; i < input.Length; i++)
    {
        bool whiteSpace = char.IsWhiteSpace(input[i]);

        //Skip duplicate whitespace characters
        if (whiteSpace && lastCharWhiteSpace)
            continue;

        //Replace all whitespace with a single space.
        if (whiteSpace)
            sb.Append(' ');
        else
            sb.Append(input[i]);

        //Keep track of the last character's whitespace status
        lastCharWhiteSpace = whiteSpace;
    }

    return sb.ToString();
  }

score 2 · Accepted Answer

それが最速の方法かどうかはわかりませんが、私はこれを使用し、これは私のために働いています：

    /// <summary>
    /// Remove all extra spaces and tabs between words in the specified string!
    /// </summary>
    /// <param name="str">The specified string.</param>
    public static string RemoveExtraSpaces(string str)
    {
        str = str.Trim();
        StringBuilder sb = new StringBuilder();
        bool space = false;
        foreach (char c in str)
        {
            if (char.IsWhiteSpace(c) || c == (char)9) { space = true; }
            else { if (space) { sb.Append(' '); }; sb.Append(c); space = false; };
        }
        return sb.ToString();
    }

score 1 · Accepted Answer

これは面白いですが、私のPCでは、以下の方法はSergey PovalyaevのStringBulderアプローチと同じくらい高速です-（1000回の繰り返しで約282ms、10kのsrc文字列）。ただし、メモリ使用量についてはわかりません。

string RemoveExtraWhiteSpace(string src, char[] wsChars){
   return string.Join(" ",src.Split(wsChars, StringSplitOptions.RemoveEmptyEntries));
}

明らかに、スペースだけでなく、どの文字でも問題なく動作します。

これはOPが要求したものではありませんが、文字列内の特定の連続する文字を1つのインスタンスのみに置き換えることが本当に必要な場合は、この比較的効率的な方法を使用できます。

    string RemoveDuplicateChars(string src, char[] dupes){  
        var sd = (char[])dupes.Clone();  
        Array.Sort(sd);

        var res = new StringBuilder(src.Length);

        for(int i = 0; i<src.Length; i++){
            if( i==0 || src[i]!=src[i-1] || Array.BinarySearch(sd,src[i])<0){
                res.Append(src[i]); 
            }
        }
        return res.ToString();
    }

score 1 · Accepted Answer

public string GetCorrectString(string IncorrectString)
    {
        string[] strarray = IncorrectString.Split(' ');
        var sb = new StringBuilder();
        foreach (var str in strarray)
        {
            if (str != string.Empty)
            {
                sb.Append(str).Append(' ');
            }
        }
        return sb.ToString().Trim();
    }

score 1 · Accepted Answer

私はこれを泡立てただけですが、まだテストしていません。しかし、これはエレガントであり、正規表現を避けていると感じました。

    /// <summary>
    /// Removes extra white space.
    /// </summary>
    /// <param name="s">
    /// The string
    /// </param>
    /// <returns>
    /// The string, with only single white-space groupings. 
    /// </returns>
    public static string RemoveExtraWhiteSpace(this string s)
    {
        if (s.Length == 0)
        {
            return string.Empty;
        }

        var stringBuilder = new StringBuilder();
        var whiteSpaceCount = 0;
        foreach (var character in s)
        {
            if (char.IsWhiteSpace(character))
            {
                whiteSpaceCount++;
            }
            else
            {
                whiteSpaceCount = 0;
            }

            if (whiteSpaceCount > 1)
            {
                continue;
            }

            stringBuilder.Append(character);
        }

        return stringBuilder.ToString();
    }

score 1 · Accepted Answer

ここで何かが足りませんか？私はこれを思いついた：

// Input: "HELLO     BEAUTIFUL       WORLD!"
private string NormalizeWhitespace(string inputStr)
{
    // First split the string on the spaces but exclude the spaces themselves
    // Using the input string the length of the array will be 3. If the spaces
    // were not filtered out they would be included in the array
    var splitParts = inputStr.Split(' ').Where(x => x != "").ToArray();

   // Now iterate over the parts in the array and add them to the return
   // string. If the current part is not the last part, add a space after.
   for (int i = 0; i < splitParts.Count(); i++)
   {
        retVal += splitParts[i];
        if (i != splitParts.Count() - 1)
        {
            retVal += " ";
        }
   }
    return retVal;
}
// Would return "HELLO BEAUTIFUL WORLD!"

ここで2番目の文字列を作成して返し、splitParts配列を作成していることはわかっています。これは非常に簡単だと考えました。多分私は潜在的なシナリオのいくつかを考慮に入れていません。

score 1 · Accepted Answer

これは本当に古いことですが、空白を圧縮する（繰り返し発生する空白文字を単一の「スペース」文字に置き換える）最も簡単な方法は次のとおりです。

    public static string CompactWhitespace(string astring)
    {
        if (!string.IsNullOrEmpty(astring))
        {
            bool found = false;
            StringBuilder buff = new StringBuilder();

            foreach (char chr in astring.Trim())
            {
                if (char.IsWhiteSpace(chr))
                {
                    if (found)
                    {
                        continue;
                    }

                    found = true;
                    buff.Append(' ');
                }
                else
                {
                    if (found)
                    {
                        found = false;
                    }

                    buff.Append(chr);
                }
            }

            return buff.ToString();
        }

        return string.Empty;
    }

score 1 · Accepted Answer

私はC＃にあまり詳しくないので、私のコードはエレガントで最も効率的なコードではありません。私は自分のユースケースに合った答えを見つけるためにここに来ましたが、それを見つけることができませんでした（または私はそれを理解できませんでした）。

私のユースケースでは、次の条件ですべてのホワイトスペース（WS：{、、}）をspace正規化する必要がありました。tabcr lf

WSは任意の組み合わせで提供できます
WSのシーケンスを最も重要なWSに置き換えます
tab場合によっては保持する必要があります（たとえば、タブで区切られたファイル。その場合は、繰り返されるタブも保持する必要があります）。しかし、ほとんどの場合、それらはスペースに変換する必要があります。

したがって、ここにサンプル入力と期待される出力があります（免責事項：私のコードはこの例でのみテストされています）



        Every night    in my            dreams  I see you, I feel you
    That's how    I know you go on

Far across the  distance and            places between us   



You            have                 come                    to show you go on

に変換されます

Every night in my dreams I see you, I feel you
That's how I know you go on
Far across the distance and places between us
You have come to show you go on

これが私のコードです

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main(string text)
    {
        bool preserveTabs = false;

        //[Step 1]: Clean up white spaces around the text
        text = text.Trim();
        //Console.Write("\nTrim\n======\n" + text);

        //[Step 2]: Reduce repeated spaces to single space. 
        text = Regex.Replace(text, @" +", " ");
        // Console.Write("\nNo repeated spaces\n======\n" + text);

        //[Step 3]: Hande Tab spaces. Tabs needs to treated with care because 
        //in some files tabs have special meaning (for eg Tab seperated files)
        if(preserveTabs)
        {
            text = Regex.Replace(text, @" *\t *", "\t");
        }
        else
        {
            text = Regex.Replace(text, @"[ \t]+", " ");
        }
        //Console.Write("\nTabs preserved\n======\n" + text);

        //[Step 4]: Reduce repeated new lines (and other white spaces around them)
                  //into a single new line.
        text = Regex.Replace(text, @"([\t ]*(\n)+[\t ]*)+", "\n");
        Console.Write("\nClean New Lines\n======\n" + text);    
    }
}

ここで実際のこのコードを参照してください：https ：//dotnetfiddle.net/eupjIU

score 0 · Accepted Answer

indexOfを使用して、最初に空白シーケンスが開始する場所を取得し、次にreplaceメソッドを使用して空白を""に変更できます。そこから、取得したインデックスを使用して、その場所に1つの空白文字を配置できます。

score 0 · Accepted Answer

public static string RemoveExtraSpaces(string input)
{
    input = input.Trim();
    string output = "";
    bool WasLastCharSpace = false;
    for (int i = 0; i < input.Length; i++)
    {
        if (input[i] == ' ' && WasLastCharSpace)
            continue;
        WasLastCharSpace = input[i] == ' ';
        output += input[i];
    }
    return output;
}

score 0 · Accepted Answer

コピーして続行したいだけの人のために：

    private string RemoveExcessiveWhitespace(string value)
    {
        if (value == null) { return null; }

        var builder = new StringBuilder();
        var ignoreWhitespace = false;
        foreach (var c in value)
        {
            if (!ignoreWhitespace || c != ' ')
            {
                builder.Append(c);
            }
            ignoreWhitespace = c == ' ';
        }
        return builder.ToString();
    }

score 0 · Accepted Answer

famos algoを調整すると（この場合は「類似した」文字列を比較するために）、大文字と小文字は区別されず、マルチスペースを気にせず、NULLにも耐えることができます。ベンチマークを信用しないでください-これはデータ比較の集中的なタスク、約に入れられました。1 / 4GBのデータとスピードアップは、アクション全体で約100％（コメント部分とこのアルゴ5 / 10分）です。ここでのこれらのいくつかは、約30％の違いがありませんでした。ビルドに最適なアルゴリズムは、逆アセンブルに進み、リリースビルドとデバッグビルドの両方でコンパイラが何をするかを確認する必要があることを教えてくれます。ここでも、同様の（Cの質問）に対する回答として、完全なトリムが半分単純になっていますが、大文字と小文字は区別されます。

public static bool Differs(string srcA, string srcB)
{
    //return string.Join(" ", (a?.ToString()??String.Empty).ToUpperInvariant().Split(new char[0], StringSplitOptions.RemoveEmptyEntries).ToList().Select(x => x.Trim()))
    //    != string.Join(" ", (b?.ToString()??String.Empty).ToUpperInvariant().Split(new char[0], StringSplitOptions.RemoveEmptyEntries).ToList().Select(x => x.Trim()));

    if (srcA == null) { if (srcB == null) return false; else srcA = String.Empty; } // A == null + B == null same or change A to empty string
    if (srcB == null) { if (srcA == null) return false; else srcB = String.Empty; }
    int dstIdxA = srcA.Length, dstIdxB = srcB.Length; // are there any remaining (front) chars in a string ?
    int planSpaceA = 0, planSpaceB = 0; // state automaton 1 after non-WS, 2 after WS
    bool validA, validB; // are there any remaining (front) chars in a array ?
    char chA = '\0', chB = '\0';

spaceLoopA:
        if (validA = (dstIdxA > 0)) {
            chA = srcA[--dstIdxA];
            switch (chA) {
                case '!': case '"': case '#': case '$': case '%': case '&': case '\'': case '(': case ')': case '*': case '+': case ',': case '-':
                case '.': case '/': case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': case ':':
                case ';': case '<': case '=': case '>': case '?': case '@': case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
                case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T':
                case 'U': case 'V': case 'W': case 'X': case 'Y': case 'Z': case '[': case '\\': case ']': case '^': case '_': case '`': // a-z will be | 32 to Upper
                case '{': case '|': case '}': case '~':
                    break; // ASCII except lowercase
                case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g': case 'h': case 'i':
                case 'j': case 'k': case 'l': case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
                case 's': case 't': case 'u': case 'v': case 'w': case 'x': case 'y': case 'z':
                    chA = (Char)(chA & ~0x20);
                    break;
                case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
                case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
                case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
                case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
                case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                    if (planSpaceA == 1) planSpaceA = 2; // cycle here to address multiple WS before non-WS part
                    goto spaceLoopA;
                default:
                    chA = Char.ToUpper(chA);
                    break;
        }}
spaceLoopB:
        if (validB = (dstIdxB > 0)) { // 2nd string / same logic
            chB = srcB[--dstIdxB];
            switch (chB) {
                case '!': case '"': case '#': case '$': case '%': case '&': case '\'': case '(': case ')': case '*': case '+': case ',': case '-':
                case '.': case '/': case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': case ':':
                case ';': case '<': case '=': case '>': case '?': case '@': case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
                case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T':
                case 'U': case 'V': case 'W': case 'X': case 'Y': case 'Z': case '[': case '\\': case ']': case '^': case '_': case '`': // a-z will be | 32 to Upper
                    break;
                case '{': case '|': case '}': case '~':
                    break; // ASCII except lowercase
                case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g': case 'h': case 'i':
                case 'j': case 'k': case 'l': case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
                case 's': case 't': case 'u': case 'v': case 'w': case 'x': case 'y': case 'z':
                    chB = (Char)(chB & ~0x20);
                    break;
                case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
                case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
                case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
                case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
                case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                    if (planSpaceB == 1) planSpaceB = 2;
                goto spaceLoopB;
                default:
                    chB = Char.ToUpper(chB);
                    break;
        }}
        if (planSpaceA != planSpaceB) return true; // both should/not have space now (0 init / 1 last non-WS / 2 last was WS)
        if (validA) { // some (non-WS) in A still
            if (validB) {
            if (chA != chB) return true; // both have another char to compare, are they different ?
            } else return true; // not in B not - they are different
        } else { // A done, current last pair equal => continue 2 never ending loop till B end (by WS only to be same)
            if (!validB) return false; // done and end-up here without leaving by difference => both are same except some WSs arround
            else return true; // A done, but non-WS remains in B - different
        }  // A done, B had no non-WS or non + WS last follow - never ending loop continue
        planSpaceA = 1; planSpaceB = 1;
        goto spaceLoopA; // performs better
    }
}

score 0 · Accepted Answer

私のバージョン（Stianの回答から改善）。非常に高速である必要があります。

public static string TrimAllExtraWhiteSpaces(this string input)
{
    if (string.IsNullOrEmpty(input))
    {
        return input;
    }

    var current = 0;
    char[] output = new char[input.Length];
    var charArray = input.ToCharArray();

    for (var i = 0; i < charArray.Length; i++)
    {
        if (!char.IsWhiteSpace(charArray[i]))
        {
            if (current > 0 && i > 0 && char.IsWhiteSpace(charArray[i - 1]))
            {
                output[current++] = ' ';
            }
            output[current++] = charArray[i];
        }
    }

    return new string(output, 0, current);
}

score -1 · Accepted Answer

複雑なコードは必要ありません！重複を削除する簡単なコードは次のとおりです。

public static String RemoveCharOccurence(String s, char[] remove)
{
    String s1 = s;
    foreach(char c in remove)
    {
        s1 = RemoveCharOccurence(s1, c);
    }

    return s1;
}

public static String RemoveCharOccurence(String s, char remove)
{
    StringBuilder sb = new StringBuilder(s.Length);

    Boolean removeNextIfMatch = false;
    foreach(char c in s)
    {
        if(c == remove)
        {
            if(removeNextIfMatch)
                continue;
            else
                removeNextIfMatch = true;
        }
        else
            removeNextIfMatch = false;

        sb.Append(c);
    }

    return sb.ToString();
}

score -1 · Accepted Answer

非常に簡単です。次の.Replace()方法を使用してください。

string words = "Hello     world!";
words = words.Replace("\\s+", " ");

出力>>>「Helloworld！」

score -2 · Accepted Answer

私が考えることができる最も簡単な方法：

Text = Text.Replace("\<Space>\<Space>", "\<Space>").Replace("\<Space>\<Space>", "\<Space>");
// Replace 2 \<Space>s with 1 space, twice

c# - c＃余分な空白を削除する最速の方法

29 に答える 29

結果

メソッド

テストコード

Related

Reference