c++ - Is there a fast way to write a large binary file in C++?

Question

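Goal

My goal is to quickly create a file from a large binary string (a string that contains only 1s and 0s).

Straight to the point

I need a function that can achieve my goal. If that is not clear, please read on.

Example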

Test.exe is running...
.
Inputted binary string:
        1111111110101010
Writing to: c:\users\admin\desktop\Test.txt
        Done!
File(Test.txt) In Byte(s):
        0xFF, 0xAA
.
Test.exe executed successfully!

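Explanation

First, Test.exe asked the user to input a binary string.
Next, it converted the inputted binary string to hexadecimal.
Finally, it wrote the converted values to a file called Test.txt.

What I have tried

As a failed attempt to achieve my goal, I wrote this simple (and probably horrible) function (hey, at least I tried):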

void BinaryStrToFile( __in const char* Destination,
                      __in std::string &BinaryStr )
{
    std::ofstream OutputFile( Destination, std::ofstream::binary );

    for( ::UINT Index1 = 0, Dec = 0;
         // 8-Bit binary.
         Index1 != BinaryStr.length( )/8;

         // Get the next set of binary value.
         // Write the decimal value as unsigned char to file.
         // Reset decimal value to 0.
         ++ Index1, OutputFile << ( ::BYTE )Dec, Dec = 0 )
    {
        // Convert the 8-bit binary to hexadecimal using the
        // positional notation method - this is how its done:
        // http://www.wikihow.com/Convert-from-Binary-to-Decimal
        for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
            if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
    }
    OutputFile.close( );
};

Example usage

#include "Global.h"

void BinaryStrToFile( __in const char* Destination,
                      __in std::string &BinaryStr );

int main( void )
{
    std::string Bin = "";

    // Create a binary string that is a size of 9.53674 mb
    // Note: The creation of this string will take awhile.
    // However, I only start to calculate the speed of writing
    // and converting after it is done generating the string.
    // This string is just created for an example.
    std::cout << "Generating...\n";
    while( Bin.length( ) != 80000000 )
        Bin += "10101010";

    std::cout << "Writing...\n";
    BinaryStrToFile( "c:\\users\\admin\\desktop\\Test.txt", Bin );

    std::cout << "Done!\n";
#ifdef IS_DEBUGGING
    std::cout << "Paused...\n";
    ::getchar( );
#endif

    return( 0 );
};

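Problem

Again, that was my failed attempt at achieving my goal. The problem is speed. It is too slow: it took more than 7 minutes. Is there a way to quickly create a file from a large binary string?

Thanks in advance,

CLearner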

score 4 · Accepted Answer

I suggest removing the substr call in the inner loop. You are allocating a new string and then destroying it for every character you process. Replace this code:

for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' )
        Dec += Inc;

with something like this:

for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr[Index1 * 8 + Index2 ] == '1' )
        Dec += Inc;
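
For reference, here is a minimal sketch of the whole function with that change applied. It is not the asker's or the answerer's exact code: the helper name PackBinaryString is hypothetical, the bool return value is an addition, and the bytes are collected in memory and written with a single write() call on the assumption that the packed data (one eighth of the input length) fits comfortably in RAM.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not from the original post): packs each group of
// eight '0'/'1' characters into one byte, most significant bit first.
// Trailing characters that do not fill a whole byte are ignored, as in
// the question's code.
std::string PackBinaryString(const std::string &BinaryStr)
{
    std::string Bytes;
    Bytes.reserve(BinaryStr.length() / 8);
    for (std::string::size_type Index1 = 0; Index1 != BinaryStr.length() / 8; ++Index1)
    {
        unsigned Dec = 0;
        for (unsigned Index2 = 0; Index2 != 8; ++Index2)
        {
            // Index the string directly; no temporary substring is created.
            Dec = (Dec << 1) | (BinaryStr[Index1 * 8 + Index2] == '1' ? 1u : 0u);
        }
        Bytes.push_back(static_cast<char>(Dec));
    }
    return Bytes;
}

// Writes the packed bytes with one unformatted write() call.
// Returns false if the stream reports an error.
bool BinaryStrToFile(const char *Destination, const std::string &BinaryStr)
{
    std::ofstream OutputFile(Destination, std::ofstream::binary);
    const std::string Bytes = PackBinaryString(BinaryStr);
    OutputFile.write(Bytes.data(), static_cast<std::streamsize>(Bytes.size()));
    return OutputFile.good();
}
```

With the input from the question's example, PackBinaryString("1111111110101010") yields the two bytes 0xFF and 0xAA.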

score 3 · Accepted Answer

Most of your time is spent here:

   for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
        if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;

If you comment it out, the file is written in a few seconds. I think you need to tweak the conversion.
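
One way to check this on your own machine is to time the conversion separately from the file write. The sketch below is an illustration only: ElapsedMs, ConvertWithSubstr and ConvertWithIndex are hypothetical names, and each conversion returns the sum of the byte values so that the compiler cannot discard the work.

```cpp
#include <chrono>
#include <string>

// Hypothetical helper (not from the thread): runs Work() once and returns
// the elapsed wall-clock time in milliseconds.
template <typename F>
double ElapsedMs(F Work)
{
    const auto Start = std::chrono::steady_clock::now();
    Work();
    const auto Stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(Stop - Start).count();
}

// The conversion from the question: one substr() temporary per bit tested.
unsigned ConvertWithSubstr(const std::string &BinaryStr)
{
    unsigned Sum = 0;
    for (std::string::size_type Index1 = 0; Index1 != BinaryStr.length() / 8; ++Index1)
        for (unsigned Index2 = 7, Inc = 1; Index2 + 1 != 0; --Index2, Inc += Inc)
            if (BinaryStr.substr(Index1 * 8, 8)[Index2] == '1') Sum += Inc;
    return Sum;
}

// The same conversion with direct indexing and no temporaries.
unsigned ConvertWithIndex(const std::string &BinaryStr)
{
    unsigned Sum = 0;
    for (std::string::size_type Index1 = 0; Index1 != BinaryStr.length() / 8; ++Index1)
        for (unsigned Index2 = 7, Inc = 1; Index2 + 1 != 0; --Index2, Inc += Inc)
            if (BinaryStr[Index1 * 8 + Index2] == '1') Sum += Inc;
    return Sum;
}
```

Calling ElapsedMs([&]{ ConvertWithSubstr(Bin); }) and ElapsedMs([&]{ ConvertWithIndex(Bin); }) on the same string shows how much of the run time belongs to the conversion rather than to the file output.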

score 2 · Accepted Answer

As a starting point, I think I'd consider something like this:

#include <bitset>
#include <fstream>
#include <algorithm>
#include <iterator>   // std::istream_iterator, std::ostream_iterator

int main() { 
    std::ifstream in("junk.txt", std::ios::binary | std::ios::in);
    std::ofstream out("junk.bin", std::ios::binary | std::ios::out);

    std::transform(std::istream_iterator<std::bitset<8> >(in),
                   std::istream_iterator<std::bitset<8> >(),
                   std::ostream_iterator<unsigned char>(out),
                   [](std::bitset<8> const &b) { return b.to_ulong();});
    return 0;
}

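In a quick test, this processes an 80-million-byte input file in about 6 seconds on my machine. Unless your files are much larger than what you mentioned in your question, I'd guess this is adequate speed, and the simplicity is hard to beat.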
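
To see what the stream extraction does without touching the file system, the same idea can be tried on an in-memory string. This is only a sketch: ParseOctets is a hypothetical name, and it relies on operator>> for std::bitset<8> reading up to eight '0'/'1' characters per extraction.

```cpp
#include <bitset>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the text eight binary digits at a time and
// returns the resulting byte values.
std::vector<unsigned char> ParseOctets(const std::string &Text)
{
    std::istringstream In(Text);
    std::vector<unsigned char> Out;
    std::bitset<8> Bits;
    // Each extraction consumes up to 8 digits; the loop ends when no
    // digit can be read any more.
    while (In >> Bits)
        Out.push_back(static_cast<unsigned char>(Bits.to_ulong()));
    return Out;
}
```

For the string "1111111110101010" this returns the two values 0xFF and 0xAA, matching the example in the question.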

score 1 · Accepted Answer

So instead of going back and forth between std::strings, why not use a bunch of machine-word-sized integers for fast access?

const size_t bufsz = 1000000;

uint32_t *buf = new uint32_t[bufsz];
memset(buf, 0xFA, sizeof(*buf) * bufsz);
std::ofstream ofile("foo.bin", std::ofstream::binary);

for (size_t i = 0; i < bufsz; i++) {
    // formatted hex (needs <iomanip>):
    // ofile << std::hex << std::setw(8) << std::setfill('0') << buf[i];
    // or if you want raw binary data instead of formatted hex:
    ofile.write(reinterpret_cast<char *>(&buf[i]), sizeof(buf[i]));
}

delete[] buf;

For me, this runs in a fraction of a second.

score 1 · Accepted Answer

Something not entirely unlike this should be significantly faster:

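#include <climits>   // CHAR_BIT
#include <cstdlib>   // abort
#include <cstring>   // memset
#include <fstream>   // std::filebuf
#include <string>

void
text_to_binary_file(const std::string& text, const char *fname)
{
    unsigned char wbuf[4096];  // 4k is a good size of "chunk to write to file"
    unsigned int i = 0, j = 0;
    std::filebuf fp;           // dropping down to filebufs may well be faster
                               // for this problem
    if (!fp.open(fname, std::ios::out|std::ios::trunc|std::ios::binary))
        abort();
    memset(wbuf, 0, 4096);

    for (std::string::const_iterator p = text.begin(); p != text.end(); p++) {
        if (*p == '1')         // set the bit only for a '1' character
            wbuf[i] |= (1u << (CHAR_BIT - (j+1)));
        j++;
        if (j == CHAR_BIT) {
            j = 0;
            i++;
        }
        if (i == 4096) {
            if (fp.sputn(reinterpret_cast<const char *>(wbuf), 4096) != 4096)
                abort();
            memset(wbuf, 0, 4096);
            i = 0;
            j = 0;
        }
    }
    // write what is left, counting a partially filled last byte
    const std::streamsize rest = i + (j != 0 ? 1 : 0);
    if (fp.sputn(reinterpret_cast<const char *>(wbuf), rest) != rest)
        abort();
    fp.close();
}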

I've left proper error handling as an exercise.

score 1 · Accepted Answer

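I'm late, but I'd like to offer an example for processing such strings. Architecture-specific optimizations might use unaligned loads of the characters into multiple registers in order to "squeeze out" the bits in parallel. This untested sample code does not check the characters, and it avoids alignment and endianness requirements. The characters of the binary string are assumed to represent consecutive octets (bytes) with the most significant bit first, not words, double words and so on, whose particular representation in memory (and in the string) would require special handling for portability.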

//THIS CODE HAS NEVER BEEN TESTED! But I hope you get the idea.

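//set up an ofstream with a 64KiB buffer
//(install the buffer before opening: some implementations ignore pubsetbuf on an already open file)
std::vector<char> buffer(65536);
std::ofstream ofs;
ofs.rdbuf()->pubsetbuf(&buffer[0],buffer.size());
ofs.open("out.bin", std::ofstream::binary|std::ofstream::out|std::ofstream::trunc);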

std::string::size_type bits = Bin.length();
std::string::const_iterator cIt = Bin.begin();

//You may treat cases, where (bits % 8 != 0) as error

//Process complete octets only: shift each character in, most significant bit first
uint8_t byte = 0;
for(std::string::size_type i = 0;i < (bits & (~std::string::size_type(0x7)));++i,++cIt)
{
    byte = uint8_t((byte << 1) | (uint8_t(*cIt) - uint8_t('0')));

    if((i & 0x7) == 0x7) //bit 0: the octet is complete, write it and start the next one
    {
        ofs.put(char(byte));
        byte = 0;
    }
}

ofs.flush();
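
The idea of squeezing out several bits in parallel can be sketched with plain 64-bit arithmetic. This is an illustration rather than the answerer's code: PackOctet is a hypothetical name, the input characters are assumed to be only '0' (0x30) or '1' (0x31) so that the lowest bit of each character is the wanted bit, and the 64-bit word is assembled with shifts so that the result does not depend on the machine's byte order. An architecture-specific version could replace that assembly loop with a single 8-byte load.

```cpp
#include <cstdint>

// Hypothetical helper: packs exactly eight '0'/'1' characters starting at p
// into one octet, most significant bit first.
inline std::uint8_t PackOctet(const char *p)
{
    // Put the low bit of character k into bit 8*k of a 64-bit word.
    std::uint64_t x = 0;
    for (int k = 0; k < 8; ++k)
        x |= std::uint64_t(static_cast<unsigned char>(p[k]) & 1u) << (8 * k);

    // One multiplication moves all eight bits into the top byte:
    // the multiplier has bits 63, 54, 45, ..., 9, 0 set, so the bit of
    // character k ends up at position 63 - k without any carries.
    return static_cast<std::uint8_t>((x * 0x8040201008040201ULL) >> 56);
}
```

For example, PackOctet("10101010") gives 0xAA and PackOctet("11111111") gives 0xFF.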

score -1 · Accepted Answer

This creates a 76.2 MB (80,000,000 bytes) file of 1010101010101......

#include <stdio.h>
#include <iostream>
#include <fstream>

using namespace std;

int main( void )
{
    char Bin=0;
    ofstream myfile;
    myfile.open (".\\example.bin", ios::out | ios::app | ios::binary);
    int c=0;
    Bin = 0xAA;
    while( c!= 80000000 ){
        myfile.write(&Bin,1);
        c++;
    }
    myfile.close();
    cout << "Done!\n";
    return( 0 );
};

Here are the first bytes of the file: [screenshot not preserved]
