c# - What is the fastest way to convert a (possibly) null-terminated ASCII byte[] to a string?

Question

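I need to convert a (possibly) null-terminated array of ASCII bytes to a string in C#. The fastest way I have found to do this is the UnsafeAsciiBytesToString method shown below. This method uses the String.String(sbyte*) constructor, whose remarks contain a warning:

"The value parameter is assumed to point to an array representing a string encoded using the default ANSI code page (that is, the encoding method specified by Encoding.Default).

Note: * Because the default ANSI code page is system-dependent, the string created by this constructor from identical signed byte arrays may differ on different systems. * ...

* If the specified array is not null-terminated, the behavior of this constructor is system dependent. For example, such a situation might cause an access violation. *"

Now, I am confident that the way the string is encoded will never change, but the default code page on the system my app runs on might change. So, is there any reason I shouldn't run screaming from using String.String(sbyte*) for this purpose?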

using System;
using System.Text;

namespace FastAsciiBytesToString
{
    static class StringEx
    {
        public static string AsciiBytesToString(this byte[] buffer, int offset, int maxLength)
        {
            int maxIndex = offset + maxLength;

            for( int i = offset; i < maxIndex; i++ )
            {
                // Skip non-nulls.
                if( buffer[i] != 0 ) continue;
                // First null we find, return the string.
                return Encoding.ASCII.GetString(buffer, offset, i - offset);
            }
            // Terminating null not found. Convert the entire section from offset to maxLength.
            return Encoding.ASCII.GetString(buffer, offset, maxLength);
        }

        public static string UnsafeAsciiBytesToString(this byte[] buffer, int offset)
        {
            string result = null;

            unsafe
            {
                fixed( byte* pAscii = &buffer[offset] )
                { 
                    result = new String((sbyte*)pAscii);
                }
            }

            return result;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            byte[] asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c', 0, 0, 0 };

            string result = asciiBytes.AsciiBytesToString(3, 6);

            Console.WriteLine("AsciiBytesToString Result: \"{0}\"", result);

            result = asciiBytes.UnsafeAsciiBytesToString(3);

            Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

            // Non-null terminated test: the unsafe version has no length limit and may read past the end of the array.
            asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c' };

            result = asciiBytes.UnsafeAsciiBytesToString(3);

            Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

            Console.ReadLine();
        }
    }
}

score 15 · Accepted Answer

Is there any reason not to use the String(sbyte*, int, int) constructor? Once you have worked out which portion of the buffer you need, the rest should be simple:

public static string UnsafeAsciiBytesToString(byte[] buffer, int offset, int length)
{
    unsafe
    {
       fixed (byte* pAscii = buffer)
       { 
           return new String((sbyte*)pAscii, offset, length);
       }
    }
}

If you need to look for the terminator first:

public static string UnsafeAsciiBytesToString(byte[] buffer, int offset)
{
    int end = offset;
    while (end < buffer.Length && buffer[end] != 0)
    {
        end++;
    }
    unsafe
    {
       fixed (byte* pAscii = buffer)
       { 
           return new String((sbyte*)pAscii, offset, end - offset);
       }
    }
}

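If this truly is an ASCII string (i.e. all bytes are less than 128), then the code page should not be a problem unless you have a particularly strange default code page that is not based on ASCII.

Out of interest, have you actually profiled your application to make sure that this really is the bottleneck? Do you definitely need the absolute fastest conversion, rather than one that is more readable (e.g. using Encoding.GetString with the appropriate encoding)?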
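For comparison, the more readable route could look like the minimal sketch below. It assumes plain 7-bit ASCII data and a caller-supplied window inside the buffer. The names SafeAsciiExample and SafeAsciiBytesToString are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text;

static class SafeAsciiExample
{
    // Hypothetical helper (not from the original answer): decodes from offset
    // up to the first 0 byte, or the whole window if no terminator is present.
    public static string SafeAsciiBytesToString(byte[] buffer, int offset, int maxLength)
    {
        // Array.IndexOf searches only [offset, offset + maxLength) and
        // returns -1 when the value is not found.
        int end = Array.IndexOf(buffer, (byte)0, offset, maxLength);
        int length = end < 0 ? maxLength : end - offset;
        return Encoding.ASCII.GetString(buffer, offset, length);
    }
}
```

No unsafe code is involved and the result does not depend on Encoding.Default, so this is a reasonable baseline to profile against before reaching for the pointer-based constructor.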

score 7 · Accepted Answer

using System;
using System.Text;

namespace TestProject1
{
    class Class1
    {
        // Decodes data up to (not including) the first 0 byte using the given code page.
        public static string cstr_to_string(byte[] data, int code_page)
        {
            Encoding enc = Encoding.GetEncoding(code_page);
            int inx = Array.FindIndex(data, 0, x => x == 0); // index of the first 0 byte, or -1
            if (inx >= 0)
                return enc.GetString(data, 0, inx);
            else
                return enc.GetString(data);
        }
    }
}

score 7 · Accepted Answer

I'm not sure about the speed, but I found it easiest to use LINQ to cut the bytes off at the first null before decoding:

string s = myEncoding.GetString(bytes.TakeWhile(b => !b.Equals(0)).ToArray());
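The TakeWhile(...).ToArray() call copies the bytes into a temporary array before decoding. On runtimes that provide Span<T> (an assumption here: .NET Core 2.1 or later), the copy can be avoided. The sketch below is one way to do that; the names SpanAsciiExample and NullTerminatedToString are hypothetical. Like the LINQ version, it only makes sense for encodings in which a single 0 byte marks the end of the text.

```csharp
using System;
using System.Text;

static class SpanAsciiExample
{
    // Hypothetical helper: decodes everything before the first 0 byte,
    // or the whole span if there is no 0 byte.
    public static string NullTerminatedToString(ReadOnlySpan<byte> bytes, Encoding encoding)
    {
        int end = bytes.IndexOf((byte)0);      // -1 when no terminator exists
        if (end >= 0)
        {
            bytes = bytes.Slice(0, end);       // a view over the same memory, no copy
        }
        return encoding.GetString(bytes);
    }
}
```

A byte[] converts implicitly to ReadOnlySpan<byte>, so a call such as SpanAsciiExample.NullTerminatedToString(bytes, myEncoding) works with the same inputs as the one-liner above.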

score 4 · Accepted Answer

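int nul = s.IndexOf('\0');              // -1 when the string contains no null character
if (nul >= 0) s = s.Substring(0, nul);  // guard: Substring throws on a negative length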

answered 2012-09-25T14:09:37.353

score 1 · Accepted Answer

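One possibility to consider: check that the default code page is acceptable, and use that information to select the conversion mechanism at run time.

This could also take into account whether the string is in fact null-terminated, but doing that will, of course, cost speed.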
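One way to read "acceptable" is "decodes bytes 0 to 127 exactly as ASCII does". The sketch below probes an encoding for that property; the names CodePageCheckExample and IsAsciiCompatible are hypothetical, and the definition of "acceptable" is an assumption rather than something this answer spells out.

```csharp
using System;
using System.Text;

static class CodePageCheckExample
{
    // Hypothetical check: true when the encoding maps each byte 0..127
    // to the character with the same code, as ASCII does.
    public static bool IsAsciiCompatible(Encoding encoding)
    {
        byte[] probe = new byte[128];
        for (int i = 0; i < probe.Length; i++)
        {
            probe[i] = (byte)i;
        }

        string decoded = encoding.GetString(probe);
        if (decoded.Length != probe.Length)
        {
            return false;   // e.g. UTF-16 pairs the bytes up into 64 characters
        }

        for (int i = 0; i < probe.Length; i++)
        {
            if (decoded[i] != (char)i)
            {
                return false;
            }
        }
        return true;
    }
}
```

The result of IsAsciiCompatible(Encoding.Default) could be computed once at startup and then used to pick between the String(sbyte*) path and Encoding.ASCII.GetString.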

score 0 · Accepted Answer

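A simple, safe, and fast way to convert byte[] objects to strings containing their ASCII equivalent, and vice versa, is to use the .NET class System.Text.Encoding. The class has a static property (Encoding.ASCII) that returns an ASCII encoder.

string to byte[]:

string s = "Hello World!";
byte[] b = System.Text.Encoding.ASCII.GetBytes(s);

byte[] to string:

// 0xFF is not valid ASCII and decodes as '?'. GetString does not stop at the 0x00 byte.
byte[] byteArray = new byte[] {0x41, 0x42, 0x09, 0x00, 0xFF};
string s = System.Text.Encoding.ASCII.GetString(byteArray);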

score -2 · Accepted Answer

This is a little ugly, but you don't have to use unsafe code:

string result = "";
for (int i = 0; i < data.Length && data[i] != 0; i++)
   result += (char)data[i];
