c++ - Fastest way to sort and concatenate millions or billions of STL vectors

Question

What is the best way to sort and concatenate a million or a billion STL vectors into a single STL vector? Currently, I iterate over the vectors and perform each operation in turn.

Here is the pseudocode:

typedef unsigned long long int ULLInt;

ULLInt N = 1000000;

vector<vector<ULLInt> > vecVec( N, vector<ULLInt>() );
vector<ULLInt>          concatVec;

// ...
// ... fill vectors inside vecVec here 
// ..  we also get here the total number of values inserted in all vectors (count)
// ...

// reserve the space
concatVec.reserve( count);

// sort each vector and concatenate them in sequence
for( ULLInt i=0; i<N; i++)
{
  sort( vecVec[i].begin(), vecVec[i].end() );
  concatVec.insert( concatVec.end(), vecVec[i].begin(), vecVec[i].end() );
}

Note that concatVec itself does not need to be sorted. Thanks for any suggestions.

score 3 · Accepted Answer

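One thing I would do is ask whether you really need to concatenate a million std::vectors. What if you added each vector to a list and wrote your own iterator that traverses each element of each vector? To most algorithms this would be indistinguishable from one massive vector. Depending on the load, the extra work done by the custom iterator could also be far less than the work required to actually concatenate all the vectors.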
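A minimal sketch of such an iterator follows. The names FlatIterator, flatBegin and flatEnd are hypothetical, chosen only for illustration, and the sketch assumes read-only forward traversal over a vector of vectors that is not modified while the iterator is in use.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical read-only forward iterator that visits every element of a
// vector<vector<T>> in order, skipping empty inner vectors.
template <typename T>
class FlatIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type        = T;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const T*;
    using reference         = const T&;

    FlatIterator() = default;
    FlatIterator(const std::vector<std::vector<T>>* outer, std::size_t row)
        : outer_(outer), row_(row), col_(0) { skipEmpty(); }

    reference operator*() const {
        assert(outer_ && row_ < outer_->size());  // must not dereference end
        return (*outer_)[row_][col_];
    }
    pointer operator->() const { return &**this; }

    FlatIterator& operator++() {
        ++col_;
        skipEmpty();
        return *this;
    }
    FlatIterator operator++(int) {
        FlatIterator tmp = *this;
        ++*this;
        return tmp;
    }

    friend bool operator==(const FlatIterator& a, const FlatIterator& b) {
        return a.outer_ == b.outer_ && a.row_ == b.row_ && a.col_ == b.col_;
    }
    friend bool operator!=(const FlatIterator& a, const FlatIterator& b) {
        return !(a == b);
    }

private:
    // Move to the next row while the position is past the end of its row.
    void skipEmpty() {
        while (row_ < outer_->size() && col_ >= (*outer_)[row_].size()) {
            ++row_;
            col_ = 0;
        }
    }

    const std::vector<std::vector<T>>* outer_ = nullptr;
    std::size_t row_ = 0;
    std::size_t col_ = 0;
};

template <typename T>
FlatIterator<T> flatBegin(const std::vector<std::vector<T>>& v) {
    return FlatIterator<T>(&v, 0);
}

template <typename T>
FlatIterator<T> flatEnd(const std::vector<std::vector<T>>& v) {
    return FlatIterator<T>(&v, v.size());
}
```

Any algorithm that only needs forward iteration, such as std::accumulate or std::find, can then be called on flatBegin(vecVec) and flatEnd(vecVec) without copying anything. Algorithms that need random access, such as std::sort, would not work with this sketch.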

score 1 · Accepted Answer

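If the vectors in vecVec are filled in ascending order (which, as I understand from the chat, is your use case), you can use a single vector instead of many small ones and keep the begin index of each sub-vector in a separate index array. That avoids the costly concatenation, because the sub-vectors are "constructed" in place.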

#include <vector>
#include <algorithm>
#include <cstdlib>
#include <iterator>

int main(int argc,char *argv[])
{
    using namespace std;
    typedef int Payload;
    vector<Payload> vec;
    vector<size_t> indices;
    for(unsigned i=0;i!=100;++i)
    {
        indices.push_back(vec.size()); // begin index of current vector
        // filling current sub-vector
        generate_n(back_inserter(vec),777+i,rand);
    }
    indices.push_back(vec.size()); // end of last vector, or begin of
                                   // one-past last vector

    // sorting each sub vector
    for(size_t i=0;i+1!=indices.size();++i)
    {
        // can be done in parallel. be aware of possible false sharing
        sort(vec.begin()+indices[i],vec.begin()+indices[i+1]);
    }
    return 0;
}
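To read sub-vector i back out of this layout, the pair of neighbouring indices is enough. The helper below is a hypothetical sketch (the name subRange is not part of the answer) and assumes indices holds the begin offset of every sub-vector plus one final end offset, as in the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the [begin, end) iterator pair of sub-vector i
// inside the flat vector, using the offsets stored in indices.
template <typename T>
std::pair<typename std::vector<T>::const_iterator,
          typename std::vector<T>::const_iterator>
subRange(const std::vector<T>& vec,
         const std::vector<std::size_t>& indices,
         std::size_t i) {
    assert(i + 1 < indices.size());  // indices has one extra end entry
    return {vec.begin() + static_cast<std::ptrdiff_t>(indices[i]),
            vec.begin() + static_cast<std::ptrdiff_t>(indices[i + 1])};
}
```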

score 1 · Accepted Answer

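Every time the code inserts the contents of one of the vectors, it has to make sure the target vector has enough memory to hold the result. That means it will frequently reallocate the target vector's memory, which in turn means copying its contents, and the code ends up doing this over and over. It is much faster to pre-allocate the target vector's memory to its final full size. Read up on vector::reserve().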
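The effect can be observed by watching the target's capacity while inserting. The function below is an illustrative sketch (its name and return type are made up for this example): it concatenates the parts with or without an up-front reserve() and counts how often the capacity changed during the inserts.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Concatenates all parts into one vector and reports how many times the
// target's capacity changed while inserting (each change is a reallocation).
std::pair<std::vector<int>, std::size_t>
concatCountingReallocations(const std::vector<std::vector<int>>& parts,
                            bool reserveFirst) {
    std::size_t total = 0;
    for (const std::vector<int>& p : parts) total += p.size();

    std::vector<int> target;
    if (reserveFirst) target.reserve(total);  // one allocation up front

    std::size_t reallocations = 0;
    std::size_t lastCapacity = target.capacity();
    for (const std::vector<int>& p : parts) {
        target.insert(target.end(), p.begin(), p.end());
        if (target.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = target.capacity();
        }
    }
    assert(target.size() == total);
    return {std::move(target), reallocations};
}
```

With reserveFirst set to true the count is zero, because reserve() guarantees that no reallocation happens until the size exceeds the reserved capacity. Without it, the count depends on the growth policy of the standard library implementation.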

score 1 · Accepted Answer

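How about this:

Split the vectors into cores piles and calculate the size needed for each pile.
Reserve space in one vector for all the data.
Split this vector into cores parts.
Feed the parts and piles to threads for merging.

Some quick code (it probably won't compile, but you should get the point):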

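typedef unsigned long long int ULLInt;
typedef std::vector<std::vector<ULLInt> > ManyVectors;

// split_vector, calculate_sizes, sum_of_sizes and merge_vectors are
// helpers that are not defined here.
std::vector<ULLInt> merge(const ManyVectors& vector_of_vectors) {
  const int cores = 16;
  std::array<ManyVectors, cores> piles = split_vector(vector_of_vectors, cores);
  std::array<size_t, cores> sizes = calculate_sizes(piles, cores);
  // size the result (reserve alone is not enough) so threads write into existing elements
  std::vector<ULLInt> result(sum_of_sizes(sizes));
  std::vector<std::thread> threads;
  size_t used = 0;
  int core = 0;
  for (ManyVectors& pile : piles) {
    // each thread copies its pile into its own slice of result
    threads.emplace_back(merge_vectors, std::ref(pile), result.begin() + used);
    used += sizes[core];
    core += 1;
  }
  for (std::thread& t : threads) t.join();  // wait for all workers
  return result;
}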
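For comparison, here is a self-contained sketch of the same idea that does compile. It is not the answer's exact method: it is a simplified variant that splits the work by number of vectors rather than by pile size, and the function name sortAndConcat and the numThreads parameter are assumptions made for this example. Each worker sorts its vectors and copies them into a precomputed, non-overlapping slice of the result, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using ULLInt = unsigned long long;

// Sorts every inner vector in place and copies it to its slot in one flat
// result vector, spreading the inner vectors over numThreads workers.
std::vector<ULLInt> sortAndConcat(std::vector<std::vector<ULLInt>>& vecVec,
                                  unsigned numThreads) {
    assert(numThreads > 0);
    const std::size_t n = vecVec.size();

    // offsets[i] is where vecVec[i] starts in the result; offsets[n] is the total.
    std::vector<std::size_t> offsets(n + 1, 0);
    for (std::size_t i = 0; i < n; ++i)
        offsets[i + 1] = offsets[i] + vecVec[i].size();

    // Sized, not just reserved, so workers write into existing elements.
    std::vector<ULLInt> result(offsets[n]);

    auto worker = [&](std::size_t first, std::size_t last) {
        for (std::size_t i = first; i < last; ++i) {
            std::sort(vecVec[i].begin(), vecVec[i].end());
            std::copy(vecVec[i].begin(), vecVec[i].end(),
                      result.begin() + static_cast<std::ptrdiff_t>(offsets[i]));
        }
    };

    // Hand each worker a contiguous block of inner vectors.
    std::vector<std::thread> threads;
    const std::size_t chunk = (n + numThreads - 1) / numThreads;
    for (std::size_t first = 0; first < n; first += chunk)
        threads.emplace_back(worker, first, std::min(first + chunk, n));

    for (std::thread& t : threads) t.join();  // wait for all workers
    return result;
}
```

Splitting by vector count keeps the sketch short but can balance poorly when the inner vectors differ a lot in size. Splitting by cumulative element count, as the piles in the answer suggest, would even that out.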
