c++ - Read a file into memory, loop through the data, then write it to a file

Question

I'm trying to ask a question similar to this post: C: read binary file to memory, alter buffer, write buffer to file. However, the answers didn't help me (I'm new to C++, so I couldn't understand all of them).

How can I loop over the data in memory, process it line by line, and write it to a file in a different format?

This is what I have:

#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>

using namespace std;

int main()
{
    char* buffer;
    char linearray[250];
    int lineposition;
    long filesize;          // ftell returns long, not double
    string linedata;
    string a;

    //obtain the file ("rb" so the size from ftell matches what fread reads)
    FILE *inputfile;
    inputfile = fopen("S050508-v3.txt", "rb");
    if (inputfile == NULL)
    {
        perror("fopen");
        return 1;
    }

    //find the filesize
    fseek(inputfile, 0, SEEK_END);
    filesize = ftell(inputfile);
    rewind(inputfile);

    //load the file into memory
    buffer = (char*) malloc (sizeof(char)*filesize);      //allocate mem
    if (buffer == NULL)
    {
        fclose(inputfile);
        return 1;
    }
    fread (buffer,filesize,1,inputfile);         //read the file to the memory
    fclose(inputfile);

    //Check to see if file is correct in Memory
    cout.write(buffer,filesize);

    free(buffer);
    return 0;
}

Thanks for any help!

Edit (data details):

My data is spread over several different files of 5 to 10 GB each. There are about 300 million lines of data. Each line looks like this:

M359

T359 3520 359

M400

A3592 zng 392

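The first element is a letter, and the remaining items are numbers or letters. I'm trying to read this into memory because looping over it line by line should be much faster than reading a line, processing it, and then writing it out. I'm compiling on 64-bit Linux. Let me know if anything needs more clarification. Thanks again.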

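Edit 2: I'm using a switch statement to process each line. The first character of each line determines how the rest of the line is formatted. For example, 'M' means milliseconds, and I put the next three numbers into a struct. Each line starts with a different character that requires different processing.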
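The switch-based dispatch from Edit 2 could look roughly like the following sketch. Only the idea of switching on the first character comes from the question; the Millis struct, the handling of 'T' lines, and all output formats are hypothetical, since the real struct and target format aren't shown.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for an 'M' line such as "M359".
struct Millis {
    long ms;
};

// Dispatch on the first character of a line. The output formats
// below are invented for illustration only.
std::string process_line(const std::string& line)
{
    if (line.empty())
        return "";
    std::istringstream fields(line.substr(1));  // everything after the tag letter
    std::ostringstream out;
    switch (line[0]) {
    case 'M': {
        Millis m{};
        fields >> m.ms;
        out << "ms=" << m.ms;
        break;
    }
    case 'T': {
        long a = 0, b = 0, c = 0;
        fields >> a >> b >> c;
        out << "T:" << a << ',' << b << ',' << c;
        break;
    }
    default:
        out << line;  // unknown tag: pass the line through unchanged
        break;
    }
    return out.str();
}
```

An istringstream per line is convenient but not fast; with 300 million lines, parsing the numbers directly (for example with std::from_chars in C++17) is likely to be noticeably quicker.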

score 3 · Accepted Answer

So pardon the potentially blatantly obvious, but if you want to process this line by line, then...

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    // read lines one at a time
    ifstream inf("S050508-v3.txt");
    string line;
    while (getline(inf, line))
    {
        // ... process line ...
    }
    inf.close();

    return 0;
}

And just fill in the body of the while loop? Maybe I'm not seeing the real problem (a forest for the trees kinda thing).

EDIT

The OP is open to using a custom streambuf, which may not be the most portable thing in the world, but he's more interested in avoiding flipping back and forth between input and output files. With enough RAM, this should do the trick.

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <memory>
#include <streambuf>
#include <string>
using namespace std;

struct membuf : public std::streambuf
{
    // allocate len bytes plus one for a trailing NUL; the get area
    //  (what the istream reads) covers only the len data bytes.
    explicit membuf(size_t len)
        : streambuf()
        , src(new char[ len + 1 ])
        , len(len)
    { 
        src[len] = 0;
        setg(src.get(), src.get(), src.get() + len);
    }

    // direct buffer access for file load.
    char * get() { return src.get(); }
    size_t size() const { return len; }

private:
    std::unique_ptr<char[]> src;   // char[] so that delete[] is used
    size_t len;
};

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        cerr << "usage: " << argv[0] << " <file>" << endl;
        return EXIT_FAILURE;
    }

    // open file in binary, retrieve length-by-end-seek
    ifstream inf(argv[1], ios::in|ios::binary);
    if (!inf)
        return EXIT_FAILURE;
    inf.seekg(0,inf.end);
    size_t len = inf.tellg();
    inf.seekg(0, inf.beg);

    // allocate a stream buffer with an internal block
    //  large enough to hold the entire file.
    membuf mb(len);

    // use our membuf buffer for our file read-op.
    inf.read(mb.get(), len);

    // use iss for your nefarious purposes
    std::istream iss(&mb);
    std::string s;
    while (iss >> s)
        cout << s << endl;

    return EXIT_SUCCESS;
}
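A related sketch, assuming the bytes are already in memory (for example the malloc'd buffer from the question): a non-owning streambuf that only points its get area at existing memory, read line by line with std::getline, which matches the OP's line-oriented data better than iss >> s. The names viewbuf and count_lines are hypothetical; counting stands in for real processing.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Non-owning, read-only streambuf: the get area simply points at memory
// owned by someone else. That memory must outlive the stream.
struct viewbuf : std::streambuf
{
    viewbuf(char* data, std::size_t len)
    {
        setg(data, data, data + len);
    }
};

// Hypothetical example: count the lines of an in-memory buffer by reading
// it through std::getline on a std::istream backed by viewbuf.
std::size_t count_lines(char* data, std::size_t len)
{
    viewbuf vb(data, len);
    std::istream is(&vb);
    std::string line;
    std::size_t n = 0;
    while (std::getline(is, line))
        ++n;   // process `line` here instead of just counting
    return n;
}
```

Because viewbuf never allocates, it avoids holding a second copy of a multi-GB file.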

score 0 · Accepted Answer

If I had to do this, I'd probably use code something like this:

std::ifstream in("S050508-v3.txt");

std::ostringstream buffer;

buffer << in.rdbuf();

std::string data = buffer.str();

if (check_for_good_data(data))
    std::cout << data;

This assumes you really need the entire contents of the input file in memory at once to determine whether it should be copied to output or not. If (for example) you can look at the data one byte at a time, and determine whether that byte should be copied without looking at the others, you could do something more like:

std::ifstream in(...);

std::copy_if(std::istreambuf_iterator<char>(in),
             std::istreambuf_iterator<char>(),
             std::ostream_iterator<char>(std::cout, ""),
             is_good_char);

...where is_good_char is a function that returns a bool saying whether that char should be included in the output or not.

Edit: the size of files you're dealing with mostly rules out the first possibility I've given above. You're also correct that reading and writing large chunks of data will almost certainly improve speed over working on one line at a time.

score 0 · Accepted Answer

You should look into fgets and sscanf. They let you pull out the pieces of data that match a format, which makes them easier to work with. Something like this:

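FILE *input = fopen("file.txt", "r");
FILE *output = fopen("out.txt","w");

const int bufferSize = 64;   // const: a non-const array size is not standard C++
char buffer[bufferSize];

// fgets returns NULL (not EOF) at end of file or on error
while(fgets(buffer,bufferSize,input) != NULL){
   char data[16];
   // sscanf takes a scanf-style format, not a regex; "%15s" is an example
   // that reads one whitespace-delimited word (at most 15 chars + NUL)
   if (sscanf(buffer,"%15s",data) == 1) {
      //manipulate data
      fprintf(output,"%s\n",data);
   }
}
fclose(output);
fclose(input);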

That's more of the C way of doing it. C++ handles things a bit more elegantly with istream: http://www.cplusplus.com/reference/istream/istream/
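For comparison with the sscanf version, a sketch of extracting the fields of one line with istream extraction (operator>>) instead. parse_numbers is a hypothetical helper: it reads the tag letter and up to three numbers and reports how many numbers it got, loosely mirroring the count that sscanf returns.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, a rough C++ counterpart of the sscanf call above:
// reads the tag letter, then up to three numbers, with operator>>.
// Returns -1 if there is no tag character, else how many numbers were read.
int parse_numbers(const std::string& line, char& tag, long nums[3])
{
    std::istringstream in(line);
    if (!(in >> tag))
        return -1;
    int count = 0;
    while (count < 3 && in >> nums[count])
        ++count;
    return count;
}
```

Unlike a fixed char data[16], extraction into typed variables cannot overflow a buffer, and a failed conversion simply stops the loop.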
