Your programs have quite a few flaws. Let's go through each one (the flaws are noted in the comments).
Program 1
int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
for (int i = 0; i < n; ++i)
{
    val += 1;
}
// The parallel region ends here, together with the for loop.
// "#pragma omp parallel for" creates a team of threads whose scope is only that loop,
// so the team is gone once the loop finishes and only the main thread continues.
// Only that one thread ever enters the critical section below.
// Note also that private(val) gives each thread its own uninitialized copy of val,
// so the val read below is not the value accumulated inside the loop.
#pragma omp critical
{
    sum += val;
}
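As a hedged sketch of one way to repair Program 1 while keeping the single combined `parallel for` construct, the loop below uses a `reduction(+:sum)` clause. This is a standard OpenMP clause, but it is not what the question used, so treat it as an alternative rather than the asker's method. The function name `count_with_reduction` and its parameters are illustrative assumptions.

```c
/* Hypothetical helper: adds 1.0 once per iteration, in parallel. */
double count_with_reduction(int n, int n_threads)
{
    double sum = 0.0;

    /* reduction(+:sum) gives each thread a private copy of sum initialised
       to 0 and adds the copies into the shared sum when the loop ends, so
       no separate critical section is needed. */
    #pragma omp parallel for reduction(+:sum) num_threads(n_threads)
    for (int i = 0; i < n; ++i)
    {
        sum += 1;
    }

    return sum;
}
```

Calling `count_with_reduction(n, 4)` should return `n` as a double. If the file is compiled without OpenMP support the pragma is ignored and the loop simply runs serially with the same result.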
Program 2
int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) shared(sum)
// "#pragma omp parallel" already creates the team of threads.
{
    #pragma omp parallel for private(i) shared(n) num_threads(nthreads)
    // There is no need for a second "parallel" here.
    // Every "#pragma omp parallel" starts a new parallel region, so this one is nested inside the outer region.
    // Each outer thread then runs the whole loop itself instead of sharing the iterations with the others, which is wrong.
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
    #pragma omp critical
    {
        sum += val;
    }
}
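To make the effect of the nested construct in Program 2 concrete, here is a minimal sketch that counts how many loop iterations actually run. It is an illustration under assumptions, not the asker's code: the function name `nested_iterations`, its parameters, and the atomic counters are all introduced here for the demonstration.

```c
/* Hypothetical helper: counts the total number of loop iterations executed
   when a "parallel for" is nested inside an enclosing "parallel" region. */
long nested_iterations(int n, int n_threads, int *outer_threads)
{
    long total = 0;
    int outer = 0;

    #pragma omp parallel shared(total, outer) num_threads(n_threads)
    {
        /* Count the threads of the outer team. */
        #pragma omp atomic
        outer += 1;

        /* Every outer thread encounters this construct and starts its own
           complete loop over 0..n-1, so the work is repeated, not split. */
        #pragma omp parallel for shared(total)
        for (int i = 0; i < n; ++i)
        {
            #pragma omp atomic
            total += 1;
        }
    }

    *outer_threads = outer;
    return total;
}
```

The returned total should equal `n` multiplied by the size of the outer team, rather than `n`. That is why replacing the inner `parallel for` with a plain `for` worksharing construct matters: it divides one loop among the existing threads instead of starting a new loop per thread.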
The best solution is:
int n = 100000000;
double sum = 0.0;
int nThreads = 5;
#pragma omp parallel shared(sum, n) num_threads(nThreads) // Create the OpenMP threads here, and declare the shared and private variables here too.
// Also request the number of threads here.
// Note that num_threads(nThreads) does not guarantee that exactly nThreads threads are created; it only sets the maximum number of threads that may be created.
{
    double val = 0.0; // val is declared inside the region, so each thread has its own local copy
    #pragma omp for nowait // only "omp for" here: the threads already exist, so no "omp parallel" is needed
    // nowait removes the barrier at the end of the loop, so a thread that finishes its iterations can go straight to the critical section without waiting for the others
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
    #pragma omp critical
    {
        sum += val;
    }
}
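As a rough way to check this pattern, the sketch below wraps it in a function and also reports how many threads the runtime actually provided, which illustrates the point that `num_threads` is only an upper bound. The function name `count_with_critical` and its parameters are assumptions made for this example. `omp_get_num_threads()` is the standard OpenMP runtime call, guarded by `_OPENMP` so the code still builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>   /* omp_get_num_threads() */
#endif

/* Hypothetical helper: sums 1.0 over n iterations and reports the team size. */
double count_with_critical(int n, int requested, int *team_size)
{
    double sum = 0.0;
    int team = 1;  /* stays 1 if the code is compiled without OpenMP */

    #pragma omp parallel shared(sum, n, team) num_threads(requested)
    {
        double val = 0.0;  /* per-thread partial sum */

#ifdef _OPENMP
        /* One thread records the size of the team that was actually created. */
        #pragma omp single
        team = omp_get_num_threads();
#endif

        /* Split the iterations among the existing threads; no barrier after. */
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
        {
            val += 1;
        }

        /* One thread at a time adds its partial sum to the shared total. */
        #pragma omp critical
        {
            sum += val;
        }
    }

    *team_size = team;
    return sum;
}
```

With GCC, for example, this would be built with `-fopenmp`. The returned sum should equal `n`, and the reported team size may be smaller than the requested value depending on the runtime and the machine.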