parallel-processing - openmp による N-queen の最適化

Question

私はOPENMPを学んでおり、nqueensの問題を解決するために次のコードを書きました。

//Full Code: https://github.com/Shafaet/Codes/blob/master/OPENMP/Parallel%20N-  Queen%20problem.cpp
int n;

int call(int col,int rowmask,int dia1,int dia2)
{
    if(col==n) 
    {
        return 1;

    }
    int row,ans=0;
    for(row=0;row<n;row++)
    {
        if(!(rowmask & (1<<row)) & !(dia1 & (1<<(row+col))) & !(dia2 & (1<<((row+n-1)-col))))
        {           
            ans+=call(col+1,rowmask|1<<row,dia1|(1<<(row+col)), dia2|(1<<((row+n-1)-col)));
        }
    }
    return ans;

}

double parallel()
{
    double st=omp_get_wtime();
    int ans=0;
    int i;
    int rowmask=0,dia1=0,dia2=0;
     #pragma omp parallel for reduction(+:ans) shared(i,rowmask)
    for(i=0;i<n;i++)
    {
        rowmask=0;
        dia1=0,dia2=0;
        int col=0,row=i;
        ans+=call(1,rowmask|1<<row,dia1|(1<<(row+col)), dia2|(1<<((row+n-1)-col)));
    }
    printf("Found %d configuration for n=%d\n",ans,n);
    double en=omp_get_wtime();
    printf("Time taken using openmp %lf\n",en-st);
    return en-st;

}
double serial()
{

    double st=omp_get_wtime();
    int ans=0;
    int i;
    int rowmask=0,dia1=0,dia2=0;
    for(i=0;i<n;i++)
    {
        rowmask=0;
        dia1=0,dia2=0;
        int col=0,row=i;
        ans+=call(1,rowmask|1<<row,dia1|(1<<(row+col)), dia2|(1<<((row+n-1)-col)));
    }
    printf("Found %d configuration for n=%d\n",ans,n);
    double en=omp_get_wtime();
    printf("Time taken without openmp %lf\n",en-st);
    return en-st;

}
int main()
{
    double average=0;
    int count=0;
    for(int i=2;i<=13;i++)
    {
        count++;
        n=i;

        double stime=serial();
        double ptime=parallel();
        printf("OpenMP is %lf times faster for n=%d\n",stime/ptime,n);
        average+=stime/ptime;
        puts("===============");
    }
    printf("On average OpenMP is %lf times faster\n",average/count);
    return 0;

}

並列コードは通常のコードよりも高速ですが、openmp プラグマを使用してさらに最適化するにはどうすればよいでしょうか。パフォーマンスを向上させるために何をすべきか、何をすべきでないかを知りたい。

前もって感謝します。

(並列プログラミングに関係のない最適化を提案しないでください)

score 0 · Accepted Answer

あなたのコードは、古典的なバックトラッキング N-Queens 再帰アルゴリズムを使用しているようです。これは、N-Queens の解決に可能な限り最速ではありませんが、(単純さのために)並列処理の基本を実践するという点で最も鮮やかなものです。つまり、これは非常に単純であるため、基本的な "parallel for" とリダクションを除いて、多くの高度な OpenMP 手段を自然に示すとは思わないでしょう。

しかし、並列処理を学習し、おそらくより明確でより良い学習曲線を探している限り、同じアルゴリズムを使用しますが、教育からより読みやすく鮮明になる傾向がある実装が (多くの可能なものから) 利用可能です。視点：

void setQueen(int queens[], int row, int col) {
//check all previously placed rows for attacks
for(int i=0; i<row; i++) {
   // vertical attacks
   if (queens[i]==col) {
       return;
   }

   // diagonal attacks
   if (abs(queens[i]-col) == (row-i) ) {
      return;
   }
}

// column is ok, set the queen
queens[row]=col;
if(row==size-1) {
#pragma omp atomic
    nrOfSolutions++;  //Placed final queen, found a solution
}
else {
     // try to fill next row
     for(int i=0; i<size; i++) {
         setQueen(queens, row+1, i);
     }
}
}

//Function to find all solutions for nQueens problem on size x size chessboard.
void solve() {
#pragma omp parallel for
    for(int i=0; i<size; i++) {
         // try all positions in first row
         int * queens = new int[size];  //array representing queens placed on a chess board.  Index is row position, value is column.
         setQueen(queens, 0, i);
         delete[](queens);
     }
}

このコードは、インテル® Advisor XEサンプル (C++ と Fortran の両方) の 1 つです。特定のサンプルの並列化の側面は、特定のParallel Programming Bookの第 10 章で非常に詳細に説明されています(実際、特定の章では、N-Queens を使用して、シリアルコードを一般的に並列化するためのツールの使用方法を示しています)。

与えられた Advisor n-queens サンプルは、基本的にあなたのものと同じアルゴリズムを使用しますが、明示的な削減を単純な並列 for + アトミックの組み合わせに置き換えます。このコードは、「隠された」データ競合を示しているため、効率は低くなりますが、「手続き型」であり、「教育的」であると予想されます。特定のサンプルコードをアップロードすると、TBB、Cilk Plus、および OpenMP (OMP は C++ および Fortran 用) を使用した 4 つの同等の N-Queens 並列実装が実際に見つかります。

parallel-processing - openmp による N-queen の最適化

2 に答える 2

Related

Reference