c# - Is StreamReader.ReadLine() the fastest way to count lines in a file?

Question

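After looking around for a while, I found quite a few discussions on how to figure out the number of lines in a file.

For example, these three:
c# how do I count lines in a textfile
Determine the number of lines within a text file
How to count lines fast?

So I went ahead and ended up using what seems to be the most efficient (at least memory-wise?) method I could find: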

private static int countFileLines(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
        int i = 0;
        while (r.ReadLine() != null) 
        { 
            i++; 
        }
        return i;
    }
}

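But this takes forever when the lines themselves are very long. Is there really no faster solution?

I tried using StreamReader.Read() and StreamReader.Peek(), but I can't (or don't know how to) make either of them move on to the next line as soon as there is 'stuff' (chars? text?).

Any ideas?

CONCLUSION/RESULTS (after running some tests based on the answers provided):

I tested the 5 methods below on two different files and got consistent results, which seem to indicate that plain old StreamReader.ReadLine() is still one of the fastest ways.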

File #1:
Size: 3,631 KB
Lines: 56,870

Results in seconds for File #1:
0.02 --> ReadLine method.
0.04 --> Read method.
0.29 --> ReadByte method.
0.25 --> ReadLines.Count method.
0.04 --> ReadWithBufferSize method.

File #2:
Size: 14,499 KB
Lines: 213,424

Results in seconds for File #2:
0.08 --> ReadLine method.
0.19 --> Read method.
1.15 --> ReadByte method.
1.02 --> ReadLines.Count method.
0.08 --> ReadWithBufferSize method.

Here are the 5 methods I tested, based on all the feedback I received:

private static int countWithReadLine(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
        int i = 0;
        while (r.ReadLine() != null)
        {
            i++;
        }
        return i;
    }
}

private static int countWithRead(string filePath)
{
    using (StreamReader _reader = new StreamReader(filePath))
    {
        int c = 0, count = 0;
        while ((c = _reader.Read()) != -1)
        {
            // 10 is the character code of '\n' (LF)
            if (c == 10)
            {
                count++;
            }
        }
        return count;
    }
}

private static int countWithReadByte(string filePath)
{
    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
        int i = 0;
        int b;

        b = s.ReadByte();
        while (b >= 0)
        {
            if (b == 10)
            {
                i++;
            }
            b = s.ReadByte();
        }
        return i;
    }
}

private static int countWithReadLinesCount(string filePath)
{
    return File.ReadLines(filePath).Count();
}

private static int countWithReadAndBufferSize(string filePath)
{
    int bufferSize = 512;

    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
        int i = 0;
        byte[] b = new byte[bufferSize];
        int n = 0;

        n = s.Read(b, 0, bufferSize);
        while (n > 0)
        {
            i += countByteLines(b, n);
            n = s.Read(b, 0, bufferSize);
        }
        return i;
    }
}

private static int countByteLines(byte[] b, int n)
{
    int i = 0;
    for (int j = 0; j < n; j++)
    {
        if (b[j] == 10)
        {
            i++;
        }
    }

    return i;
}
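
The question does not show how the timings were taken. A harness along the following lines is one plausible way to gather them. It is only a sketch: CountBenchmark and TimeCounter are hypothetical names, and a single Stopwatch measurement is sensitive to JIT warm-up and the OS file cache, so the sketch discards a first warm-up call.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class CountBenchmark
{
    // Hypothetical harness: times one line-counting function on one file
    // and returns the elapsed time in seconds.
    public static double TimeCounter(Func<string, int> counter, string filePath, out int lines)
    {
        counter(filePath); // warm-up run: JIT compilation and OS file cache

        Stopwatch sw = Stopwatch.StartNew();
        lines = counter(filePath);
        sw.Stop();
        return sw.Elapsed.TotalSeconds;
    }

    // Baseline counter equivalent to the ReadLine method above.
    public static int CountWithReadLine(string filePath)
    {
        using (StreamReader r = new StreamReader(filePath))
        {
            int i = 0;
            while (r.ReadLine() != null)
            {
                i++;
            }
            return i;
        }
    }
}
```

Calling CountBenchmark.TimeCounter(CountBenchmark.CountWithReadLine, path, out lines) once per method and per file would produce a table like the one above. Because of the warm-up call, the timed run mostly reads from the OS cache rather than from disk.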

score 9 · Accepted Answer

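No, it is not. The point is that it materializes the strings, which you do not need.

To count lines, you are much better off ignoring the "string" part and going for the "line" part.

A LINE is a series of bytes ending in \r\n (13, 10 - CR LF) or another marker.

Just run along the bytes in a buffered stream and count the number of occurrences of your end-of-line marker.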
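
A minimal sketch of that idea, under some assumptions: lines end in LF or CRLF (so counting byte 10 is enough), and the file uses an encoding such as ASCII or UTF-8 in which byte 10 only ever means a line feed (this would not hold for UTF-16). LineCounter, CountLinesByBytes, and the 64 KB buffer size are illustrative choices, not part of the answer. The final check adds one for a last line that has no trailing newline, so the result matches what a ReadLine loop reports in that case.

```csharp
using System;
using System.IO;

public static class LineCounter
{
    // Hypothetical helper: counts lines by scanning raw bytes for LF (10),
    // without ever building a string.
    public static int CountLinesByBytes(string filePath)
    {
        const int BufferSize = 64 * 1024; // assumed size; tune by measuring
        byte[] buffer = new byte[BufferSize];
        int count = 0;
        int lastByte = -1; // -1 means nothing has been read yet

        using (FileStream s = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            int n;
            while ((n = s.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int j = 0; j < n; j++)
                {
                    if (buffer[j] == 10)
                    {
                        count++;
                    }
                }
                lastByte = buffer[n - 1];
            }
        }

        // A non-empty file that does not end in LF has one more, unterminated line.
        if (lastByte != -1 && lastByte != 10)
        {
            count++;
        }
        return count;
    }
}
```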

score 5 · Accepted Answer

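The best way to know how to do this fast is to think about the fastest way to do it without using C/C++.

In assembly there is a CPU-level operation that scans memory for a character, so in assembly you would do the following:

Read a big part (or all) of the file into memory
Execute the SCASB instruction
Repeat as needed

So, in C#, you want the compiler to get as close to that as possible.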
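
One way to get close to this in C#, sketched here assuming .NET Core 2.1 or later, is to search each buffer with the span IndexOf extension method. Current .NET runtimes implement that search for bytes with vectorized code where the hardware supports it, which plays a role similar to a hardware scan instruction. SpanLineCounter and CountLf are hypothetical names, the buffer size is an assumption, and the sketch counts LF bytes only, with no adjustment for an unterminated last line.

```csharp
using System;
using System.IO;

public static class SpanLineCounter
{
    // Hypothetical helper: counts LF bytes by repeatedly searching the
    // remaining part of each buffer with IndexOf.
    public static int CountLf(string filePath)
    {
        byte[] buffer = new byte[64 * 1024]; // assumed buffer size
        int count = 0;

        using (FileStream s = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            int n;
            while ((n = s.Read(buffer, 0, buffer.Length)) > 0)
            {
                ReadOnlySpan<byte> rest = buffer.AsSpan(0, n);
                int idx;
                while ((idx = rest.IndexOf((byte)10)) >= 0)
                {
                    count++;
                    rest = rest.Slice(idx + 1); // continue after the LF just found
                }
            }
        }
        return count;
    }
}
```

Whether this beats a plain byte loop depends on line length and hardware, so it is worth measuring on your own files before adopting it.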

score 4 · Accepted Answer

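I tried multiple methods and tested their performance:

The one that reads a single byte at a time is about 50% slower than the other methods. The other methods all return in roughly the same amount of time. You could try creating threads and doing this asynchronously, so that while you are waiting for a read you can start processing the previous read. That sounds like a headache to me.

I would go with the one-liner: File.ReadLines(filePath).Count(); it performs as well as the other methods I tested.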

        private static int countFileLines(string filePath)
        {
            using (StreamReader r = new StreamReader(filePath))
            {
                int i = 0;
                while (r.ReadLine() != null)
                {
                    i++;
                }
                return i;
            }
        }

        private static int countFileLines2(string filePath)
        {
            using (Stream s = new FileStream(filePath, FileMode.Open))
            {
                int i = 0;
                int b;

                b = s.ReadByte();
                while (b >= 0)
                {
                    if (b == 10)
                    {
                        i++;
                    }
                    b = s.ReadByte();
                }
                return i + 1;
            }
        }

        private static int countFileLines3(string filePath)
        {
            // bufferSize was not defined in the original snippet; 512 is an assumed value.
            const int bufferSize = 512;

            using (Stream s = new FileStream(filePath, FileMode.Open))
            {
                int i = 0;
                byte[] b = new byte[bufferSize];
                int n = 0;

                n = s.Read(b, 0, bufferSize);
                while (n > 0)
                {
                    i += countByteLines(b, n);
                    n = s.Read(b, 0, bufferSize);
                }
                return i + 1;
            }
        }

        private static int countByteLines(byte[] b, int n)
        {
            int i = 0;
            for (int j = 0; j < n; j++)
            {
                if (b[j] == 10)
                {
                    i++;
                }
            }

            return i;
        }

        private static int countFileLines4(string filePath)
        {
            return File.ReadLines(filePath).Count();
        }

score 3 · Accepted Answer

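Yes, reading lines like that is the fastest and easiest way in any practical sense.

There are no shortcuts here. Files are not line-based, so you have to read every single byte of the file to determine how many lines it contains.

As TomTom pointed out, creating the strings is not strictly needed to count the lines, but the vast majority of the time will be spent waiting for the data to be read from disk. Writing a much more complicated algorithm might shave a percent off the execution time, while dramatically increasing the time it takes to write and test the code.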

score 3 · Accepted Answer

public static int CountLines(Stream stm)
{
    StreamReader _reader = new StreamReader(stm);
    int c = 0, count = 0;
    while ((c = _reader.Read()) != -1)
    {
        if (c == '\n')
        {
            count++;
        }
    }
    return count;
}
