c++ - I read an entire file (a list of words separated by two spaces); how do I get the words individually, and fast?

Question

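I am reading a file of about 120,000 words, so I am trying to do it fast. I have seen this option:

int x = setvbuf(fp, (char *)NULL, _IOFBF, BSZ);
assert( x == 0 && fp != NULL );

but it takes more than a second (for a 1 MB file), so now I have tried this approach: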

fopen_s (&pFile,DICT,"rb");
if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

// obtain file size:
fseek (pFile , 0 , SEEK_END);
lSize = ftell (pFile);
rewind (pFile);

// allocate memory to contain the whole file:
buffer = (char*) malloc (sizeof(char)*lSize);

// copy the file into the buffer:
result = fread (buffer,1,lSize,pFile);

How do I continue from here? buffer now holds the list of words, and I am building a multimap from those words, so I want to get them one by one as fast as possible.

Thanks!

score 1 · Accepted Answer

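Your code is essentially implementing mmap(). The advantage of mmap() is that it loads the actual pages into memory only when they are needed. If your app reads them sequentially at a very fast rate, the OS will map the pages in as fast as it can.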

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Wrapped in do/while(0) so the macro behaves as a single statement. */
#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

int
main(void)
{
    int fd = open("english-words.10", O_RDONLY);
    if (fd == -1)
        handle_error("open");

    struct stat sb;
    if (fstat(fd, &sb) == -1)
        handle_error("fstat");
    size_t lSize = sb.st_size;

    /* mmap() returns void*; the explicit cast is required when compiling as C++. */
    char* buffer = (char*) mmap(NULL, lSize, PROT_READ, MAP_PRIVATE, fd, 0);
    if (buffer == MAP_FAILED)
        handle_error("mmap");

    // insert your mapping to a map here

    munmap(buffer, lSize);
    close(fd);

    return 0;
}

Note that I also use fstat() instead of your fseek()/ftell().
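One way to fill in the "insert your mapping to a map here" step is to walk the mapped bytes once and record where each word starts and ends. This is only a minimal sketch, assuming C++17 (for std::string_view) and whitespace-separated words; split_words is a hypothetical helper name, not something from the code above.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Split [data, data + len) on runs of whitespace. The returned views point
// into the original buffer, so the buffer (or mapping) must outlive them.
std::vector<std::string_view> split_words(const char* data, std::size_t len)
{
    std::vector<std::string_view> words;
    std::size_t i = 0;
    while (i < len) {
        // Skip any run of separators (spaces, tabs, newlines).
        while (i < len && std::isspace(static_cast<unsigned char>(data[i])))
            ++i;
        const std::size_t start = i;
        // Advance to the end of the current word.
        while (i < len && !std::isspace(static_cast<unsigned char>(data[i])))
            ++i;
        if (i > start)
            words.emplace_back(data + start, i - start);
    }
    return words;
}
```

With the mapping above, this would be called as split_words(buffer, lSize) before munmap(). No characters are copied, which is why the views must not be used after the mapping is released.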

score 0 · Accepted Answer

Splitting the words is not going to be the bottleneck. Any reasonable implementation will split words faster than an SSD can deliver them.
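If you want to check this on your own machine, you can time the split on data that is already in memory. The sketch below is an illustration under assumptions: the function names are made up for this example, and the input is a synthetic list of identical words separated by two spaces rather than your real dictionary.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Count words in an in-memory buffer; spaces and newlines are separators.
std::size_t count_words(const std::string& text)
{
    std::size_t count = 0;
    bool in_word = false;
    for (char c : text) {
        if (c == ' ' || c == '\n') {
            in_word = false;
        } else if (!in_word) {
            in_word = true;   // first character of a new word
            ++count;
        }
    }
    return count;
}

// Build a synthetic list: n copies of "word", each followed by two spaces.
std::string make_synthetic_list(std::size_t n)
{
    std::string text;
    text.reserve(n * 6);
    for (std::size_t i = 0; i < n; ++i)
        text += "word  ";
    return text;
}

// Time one pass of count_words(); returns elapsed microseconds and stores
// the word count in count_out.
long long time_split_us(const std::string& text, std::size_t& count_out)
{
    const auto t0 = std::chrono::steady_clock::now();
    count_out = count_words(text);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

Calling time_split_us(make_synthetic_list(120000), n) gives the cost of the split alone, which you can compare with the time your fread() or mmap() step takes.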

score 0 · Accepted Answer

I would read all the words like this:

#include <vector>
#include <string>
#include <fstream>

using namespace std;  // that's the way I like it... :-)

int main()
{
    vector<string> v;   // all the words
    string word;
    ifstream f("myfile.txt");  // open stream for input

    // operator>> skips whitespace; the loop ends at EOF or on a read error
    while (f >> word) {
        v.push_back(word);  // push word into vector
    }

    // now v holds all the words in the file, and you can iterate them

    return 0;
}
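From there, filling the multimap is one more loop over the words. The question does not say what the multimap is keyed on, so the sketch below assumes a key purely for illustration: each word's letters in sorted order, so that anagrams share a key. build_anagram_index is a hypothetical name.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed key for illustration: the word's letters in sorted order.
// Words that are anagrams of each other end up under the same key.
std::multimap<std::string, std::string>
build_anagram_index(const std::vector<std::string>& words)
{
    std::multimap<std::string, std::string> index;
    for (const std::string& w : words) {
        std::string key = w;
        std::sort(key.begin(), key.end());
        index.emplace(std::move(key), w);
    }
    return index;
}
```

After build_anagram_index(v), index.equal_range(key) returns every word stored under a given key. Swap in whatever key your multimap really uses; the loop stays the same.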
