c - OpenMP custom reduction variable

Question

I have been assigned to implement the idea of a reduction variable without using the reduction clause. I set up this basic code to test it.

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
for (int i = 0; i < n; ++i)
{
    val += 1;
}
sum += val;

At the end, sum == n.

Each thread should have val as a private variable, and the addition to sum should be a critical section where the threads converge.

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
for (int i = 0; i < n; ++i)
{
    val += 1;
}
#pragma omp critical
{
    sum += val;
}

I can't figure out how to keep the private instance of val alive for the critical section. I tried wrapping the whole thing in a larger pragma.

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) shared(sum)
{
#pragma omp parallel for private(i) shared(n) num_threads(nthreads)
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
#pragma omp critical
    {
        sum += val;
    }
}

But I don't get the correct answer. How should I set up the pragmas and clauses to do this?

score 6 · Accepted Answer

Your programs have quite a few flaws. Let's look at each one (the flaws are written as comments).

Program 1

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
for (int i = 0; i < n; ++i)
{
    val += 1;
}
// The parallel region ends here, together with the for loop.
// "pragma omp parallel for" creates the threads, and their scope (along with
// their private copies of val) lasts only until the end of that loop.
// So only one thread (the main thread) enters the critical section below,
// and it adds the original val, which is still 0.0.
#pragma omp critical
{
    sum += val;
}

Program 2

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) shared(sum)
 // pragma omp parallel creates the threads
{
#pragma omp parallel for private(i) shared(n) num_threads(nthreads)
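  // There is no need to create a second team of threads here.
  // "pragma omp parallel" always starts a new parallel region, so this
  // "parallel for" is nested inside the outer one. Every outer thread then
  // runs the entire loop instead of a share of it, which is wrong.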
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
#pragma omp critical
    {
        sum += val;
    }
}

A better solution is:

int n = 100000000;
double sum = 0.0;
int nThreads = 5;
// Create the OpenMP threads once, and declare the shared and private variables here.
// num_threads(nThreads) is a request, not a guarantee: the runtime may
// create fewer threads than nThreads, so treat it as an upper limit.
#pragma omp parallel shared(sum, n) num_threads(nThreads)
{
    double val = 0.0;  // declared inside the region, so each thread has its own copy
    // "omp for" only divides the iterations among the existing threads,
    // which is why there is no second "omp parallel" here.
    // nowait means a thread does not wait for the others after the loop;
    // it can go straight on to the critical section.
#pragma omp for nowait
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
#pragma omp critical
    {
        sum += val;
    }
}
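
As a minimal sketch, the fragment above can be wrapped in a function so that it compiles and can be checked on its own. The name sum_with_critical and its parameters are hypothetical and are not part of the original answer. The code deliberately avoids omp.h, so it also builds (and runs serially) when OpenMP is not enabled.

```c
/* Hypothetical helper: adds 1.0 to a total n times using a hand-written
 * reduction (per-thread partial sum + critical section) instead of the
 * reduction clause. nthreads is only a request to the runtime. */
double sum_with_critical(int n, int nthreads)
{
    double sum = 0.0;                 /* shared: declared outside the region */

    #pragma omp parallel num_threads(nthreads)
    {
        double val = 0.0;             /* private: declared inside the region */

        /* Split the iterations among the threads of the enclosing region.
         * nowait drops the barrier at the end of the loop. */
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
        {
            val += 1.0;
        }

        /* One thread at a time adds its partial result to the total. */
        #pragma omp critical
        {
            sum += val;
        }
    }
    return sum;
}
```

Calling sum_with_critical(100000000, 5) corresponds to the values used in the question. With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag.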

score 2 · Accepted Answer

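You do not need to specify shared variables explicitly in OpenMP, because variables from the enclosing scope are shared by default (unless a default(none) clause is specified). Since private variables have undefined initial values, you must zero the private copy before the accumulation loop. The loop counter is recognized automatically and made private, so there is no need to declare it as such. Also, since you are only updating a value, you should use the atomic construct, which is more lightweight than a full critical section.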

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) num_threads(nthreads)
{
    val = 0.0;  // private copies start with an undefined value, so zero them first
    #pragma omp for
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
    #pragma omp atomic update
    sum += val;
}

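The update clause was added to the atomic construct in OpenMP 3.1, so if your compiler conforms to an earlier OpenMP version (for example, if you use MSVC++, which supports only OpenMP 2.0 even in VS2012), you have to remove the update clause. Since val is not used outside the parallel loop, it could also be declared in an inner scope, as in veda's answer, in which case it automatically becomes a private variable.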
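
For compilers limited to older OpenMP versions, the same idea might look like the following sketch, which uses a bare atomic directive without the update clause and declares val in the inner scope. The function name sum_with_atomic is hypothetical. The loop counter is declared outside the region on purpose, to show that the for construct makes it private automatically.

```c
/* Hypothetical helper: manual reduction using a bare "omp atomic",
 * the form available before OpenMP 3.1 added the update clause. */
double sum_with_atomic(int n)
{
    double sum = 0.0;   /* shared by default: declared outside the region */
    int i;              /* loop counter: made private by "omp for" */

    #pragma omp parallel
    {
        double val = 0.0;   /* inner-scope variable, private to each thread */

        #pragma omp for nowait
        for (i = 0; i < n; ++i)
        {
            val += 1.0;
        }

        /* A single scalar update needs only atomic, not a critical section. */
        #pragma omp atomic
        sum += val;
    }
    return sum;
}
```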

Note that parallel for is a shortcut for nesting two OpenMP constructs, parallel and for:

#pragma omp parallel for sharing_clauses scheduling_clauses
for (...) {
}

is equivalent to:

#pragma omp parallel sharing_clauses
#pragma omp for scheduling_clauses
for (...) {
}
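
A concrete instance of this equivalence, as a small sketch: the two hypothetical functions below (fill_combined and fill_split are illustrative names) fill an array with squares, once with the combined construct and once with the two constructs written separately. Both are expected to produce the same array.

```c
/* Combined form: one directive creates the threads and splits the loop. */
void fill_combined(int *out, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
    {
        out[i] = i * i;
    }
}

/* Split form: "parallel" creates the threads, "for" splits the loop.
 * The scheduling clause goes on the "for" directive. */
void fill_split(int *out, int n)
{
    #pragma omp parallel
    {
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
        {
            out[i] = i * i;
        }
    }
}
```

The split form is the one that leaves room for per-thread code before or after the loop, which is what the manual reduction needs.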

The same holds for the other two combined constructs: parallel sections and parallel workshare (Fortran only).
