c++ - Parallelizing a loop using std::thread and good practices

Question

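Possible duplicate:
C++ 2011 : std::thread : simple example to parallelize a loop?

Consider the following program, which distributes a computation over the elements of a vector (I have never used std::thread before):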

// vectorop.cpp
// compilation: g++ -O3 -std=c++0x vectorop.cpp -o vectorop -lpthread
// execution: time ./vectorop 100 50000000 
// (100: number of threads, 50000000: vector size)
#include <iostream>
#include <iomanip>
#include <cstdio>
#include <cstdlib>     // std::atol
#include <functional>  // std::ref
#include <vector>
#include <thread>
#include <cmath>
#include <algorithm>
#include <numeric>

// Some calculation that takes some time
template<typename T> 
void f(std::vector<T>& v, unsigned int first, unsigned int last) {
    for (unsigned int i = first; i < last; ++i) {
        v[i] = std::sin(v[i])+std::exp(std::cos(v[i]))/std::exp(std::sin(v[i])); 
    }
}

// Main
int main(int argc, char* argv[]) {

    // Variables
    const int nthreads = (argc > 1) ? std::atol(argv[1]) : (1);
    const int n = (argc > 2) ? std::atol(argv[2]) : (100000000);
    double x = 0;
    std::vector<std::thread> t;
    std::vector<double> v(n);

    // Initialization
    std::iota(v.begin(), v.end(), 0);

    // Start threads
    for (unsigned int i = 0; i < n; i += std::max(1, n/nthreads)) {
        // question 1: 
        // how to compute the first/last indexes attributed to each thread 
        // with a more "elegant" formula ?
        // (std::min<std::size_t> is spelled out because the sum is unsigned int while v.size() is std::size_t)
        std::cout<<i<<" "<<std::min<std::size_t>(i+std::max(1, n/nthreads), v.size())<<std::endl;
        t.push_back(std::thread(f<double>, std::ref(v), i, std::min<std::size_t>(i+std::max(1, n/nthreads), v.size())));
    }

    // Finish threads
    for (unsigned int i = 0; i < t.size(); ++i) {
        t[i].join();
    }
    // question 2: 
    // how to be sure that all threads are finished here ?
    // how to "wait" for the end of all threads ?

    // Finalization
    for (unsigned int i = 0; i < n; ++i) {
        x += v[i];
    }
    std::cout<<std::setprecision(15)<<x<<std::endl;
    return 0;
}

The code already has two questions embedded in it.

A third one would be: is this code perfectly fine, or can it be written in a more elegant way using std::thread? I do not know the "good practices" for std::thread...

score 0 · Accepted Answer

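On the first question, how to compute the range each thread works on: I extracted the constants and gave them names to make the code easier to read. As a good practice I also used a lambda, which makes the code easier to modify. The code in the lambda is used only here, while the function f can be called from other code anywhere in the program. Take advantage of this by putting shared parts of the code in functions and the specialized parts that are used only once in lambdas.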

// Number of elements handled by each thread (at least 1)
const size_t itemsPerThread = std::max(1, n/nthreads);
for (size_t nextIndex = 0; nextIndex < v.size(); nextIndex += itemsPerThread)
{
    const size_t beginIndex = nextIndex;
    // Clamp the last range to the end of the vector
    const size_t endIndex = std::min(nextIndex + itemsPerThread, v.size());
    std::cout << beginIndex << " " << endIndex << std::endl;
    t.push_back(std::thread([&v, beginIndex, endIndex]{ f(v, beginIndex, endIndex); }));
}
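One detail of the fixed-step loop is that it can start more threads than requested: with 10 elements and 3 threads the step is 3, which gives four ranges (0-3, 3-6, 6-9, 9-10). If exactly the requested number of ranges is wanted, one possible approach is to spread the remainder over the first ranges. The following is only a sketch of that idea; split_range is a hypothetical helper name that does not appear in the question or the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (an assumption for illustration): split [0, n) into
// exactly `parts` contiguous half-open ranges [begin, end) whose sizes
// differ by at most one element.
std::vector<std::pair<std::size_t, std::size_t>>
split_range(std::size_t n, std::size_t parts) {
    assert(parts > 0);  // precondition: at least one range is requested
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    ranges.reserve(parts);
    const std::size_t base = n / parts;   // minimum size of every range
    const std::size_t extra = n % parts;  // the first `extra` ranges get one more element
    std::size_t begin = 0;
    for (std::size_t k = 0; k < parts; ++k) {
        const std::size_t end = begin + base + (k < extra ? 1 : 0);
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

Each returned pair could then be passed to f in place of beginIndex and endIndex. When n is smaller than parts, the trailing ranges are empty, so a caller may prefer to skip them rather than start a thread that has nothing to do.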

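A more advanced usage would employ a thread pool, but what that looks like depends on the design of your application and is not covered by the standard library. For a good example of a threading model, have a look at the Qt Framework. If you are just getting started with threads, save this for later.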
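To give a rough idea of what such a pool involves, here is a deliberately minimal sketch built only from standard library pieces (std::thread, std::mutex, std::condition_variable). The class name SimplePool and its submit member are assumptions made for this illustration; it is not how Qt or any particular library implements its pool, and it leaves out result values and exception handling.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool, for illustration only.
class SimplePool {
public:
    explicit SimplePool(std::size_t nworkers) {
        assert(nworkers > 0);  // with zero workers, submitted tasks would never run
        for (std::size_t i = 0; i < nworkers; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    // The destructor lets the workers drain the queue, then joins them.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (std::thread& w : workers_) {
            w.join();
        }
    }

    SimplePool(const SimplePool&) = delete;
    SimplePool& operator=(const SimplePool&) = delete;

    // Queue a task; one idle worker is woken up to run it.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return;  // only reachable once stopping_ has been set
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

With something like this, the loop from the question would submit one task per range instead of constructing one std::thread per range, and the threads would be reused across many small jobs. A task that throws would still terminate the program here, which is one of several reasons to treat this as a sketch rather than a finished design.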

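The second question is already answered in the comments. The function std::thread::join waits (blocks) until the thread has finished. By calling join on each thread and then reaching the code after the join calls, you can be sure that all threads have finished at that point and can now be destroyed.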
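The join pattern described above can be sketched as a small self-contained function. The name double_all and the choice of doubling each element are assumptions made for this example; only the spawn-then-join structure mirrors the code in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: doubles every element of `v` using worker threads and
// returns only after every worker has been joined.
void double_all(std::vector<int>& v, std::size_t nthreads) {
    assert(nthreads > 0);
    const std::size_t step = std::max<std::size_t>(1, v.size() / nthreads);
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < v.size(); begin += step) {
        const std::size_t end = std::min(begin + step, v.size());
        // Each worker touches a disjoint range, so no locking is needed.
        workers.emplace_back([&v, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                v[i] *= 2;
            }
        });
    }
    // join() blocks until the corresponding thread has finished.
    for (std::thread& w : workers) {
        w.join();
    }
    // Past this point no worker is running, so `v` can be read safely.
}
```

Destroying a std::thread that is still joinable calls std::terminate, so every thread has to be joined (or detached) before the vector of threads goes out of scope. The sketch does not guard against an exception thrown while the threads are being created.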
