c - How to speed up this problem with MPI

Question

(1) How can I use MPI to speed up the time-consuming computation in the loop of the following code?

 int main(int argc, char ** argv)   
 {   
 // some operations           
 f(size);           
 // some operations         
 return 0;   
 }   

 void f(int size)   
 {   
 // some operations          
 int i;           
 double * array =  new double [size];           
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time-consuming computation   
 }           
 // some operations using all elements in array           
 delete [] array;  
 }

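As the code shows, I want to do some operations before and after the part to be parallelized with MPI, but I don't know how to specify where the parallel part begins and ends.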

(2) My current code uses OpenMP to speed up the computation.

 void f(int size)   
 {   
 // some operations           
 int i;           
 double * array =  new double [size];   
 omp_set_num_threads(_nb_threads);  
 #pragma omp parallel shared(array) private(i)  
 {
 #pragma omp for schedule(dynamic) nowait          
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time-consuming computation   
 }          
 } 
 // some operations using all elements in array           
 }

I would like to change it to use MPI. Is it possible to write the code with both OpenMP and MPI? If so, how should I write the code, and how do I compile and run it?

(3) Our cluster has three versions of MPI: mvapich-1.0.1, mvapich2-1.0.3, and openmpi-1.2.6. Are they used in the same way, especially in my case? Which one is the best choice for me?

Thanks and regards!

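UPDATE:

I would like to explain a bit more about my question on how to specify the start and end of the parallel part. In the following toy code, I want to limit the parallel part to within function f():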

#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

void f();

int main(int argc, char **argv)  
{  
printf("%s\n", "Start running!");  
f();  
printf("%s\n", "End running!");  
return 0;  
}  


void f()  
{  
char idstr[32]; char buff[128];  
int numprocs; int myid; int i;  
MPI_Status stat;  

printf("Entering function f().\n");

MPI_Init(NULL, NULL);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  

if(myid == 0)  
{  
  printf("WE have %d processors\n", numprocs);  
  for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD);  
  }  
  for(i=1;i<numprocs;i++)  
  {  
    MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);  
    printf("%s\n", buff);  
  }  
}  
else  
{  
  MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);  
  sprintf(idstr, " Processor %d ", myid);  
  strcat(buff, idstr);  
  strcat(buff, "reporting for duty\n");  
  MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);  
}  
MPI_Finalize();  

printf("Leaving function f().\n");  
}

However, the output is not what I expected. The printf calls before and after the parallel part were executed by every process, not just the main process:

$ mpirun -np 3 ex2  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
WE have 3 processors  
Hello 1 Processor 1 reporting for duty  

Hello 2 Processor 2 reporting for duty  

Leaving function f().  
End running!  
Leaving function f().  
End running!  
Leaving function f().  
End running!

So it seems to me that the parallel part is not limited to the region between MPI_Init() and MPI_Finalize().

Besides this one, I am still hoping that someone can answer my other questions. Thanks!

score 8 · Accepted Answer

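Quick edit (because I either can't figure out how to leave comments or am not allowed to post them yet): 3lectrologos is incorrect about the parallel part of MPI programs. You cannot do serial work before MPI_Init and after MPI_Finalize and expect it to actually be serial. It will still be executed by every MPI process.

I think part of the issue is that the "parallel part" of an MPI program is the entire program. MPI starts executing the same program (your main function) on each node you specify at approximately the same time. The MPI_Init call just sets certain things up for the program so that it can use the MPI calls correctly.

The correct "template" (in pseudocode) for what I think you want to do would be: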

int main(int argc, char *argv[]) {
    int numprocs, myid;

    MPI_Init(&argc, &argv);  
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);

    if (myid == 0) { // Do the serial part on a single MPI process (rank 0)
        printf("Performing serial computation on rank %d\n", myid);
        PreParallelWork();
    }

    ParallelWork();  // Every MPI process will run the parallel work

    if (myid == 0) { // Do the final serial part on a single MPI process (rank 0)
        printf("Performing the final serial computation on rank %d\n", myid);
        PostParallelWork();
    }

    MPI_Finalize();  
    return 0;  
}

score 2 · Accepted Answer

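MPI_Init (with the &argc and &argv arguments, which the MPI implementation requires) should really be the first statement executed in main, and MPI_Finalize should be the last one.

main() is started on every node in the MPI environment. Parameters such as the number of nodes, node_id, and the master node address may be passed through argc and argv.

This is the framework: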

#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

void f();

int numprocs; int myid; 

int main(int argc, char **argv)  
{  

MPI_Init(&argc, &argv);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  

if(myid == 0)  
{  /* main process. user interaction is ONLY HERE */

    printf("%s\n", "Start running!");  

    /* MPI_Send(...) the job requests to the workers */
    /* may call f() in the main process too */
    /* MPI_Recv(...) the results from the workers */
    printf("%s\n", "End running!");  
}
else
{

  /* Workers. Sit here and wait for a job from the main process */
  /* MPI_Recv(...) the input */
  /* dispatch the input by parsing it 
    (if there can be different types of work)
    or just do the work */    
  f();
  /* MPI_Send(...) the results back to the main process */
}

MPI_Finalize();  

return 0;  
}

score 1 · Accepted Answer

The easiest migration of OpenMP-style code to a cluster is probably Intel's "Cluster OpenMP".

For MPI, you need to completely rewrite the work dispatching.

score 1 · Accepted Answer

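If all the values in the array are independent, this should be trivially parallelizable. Split the array into chunks of roughly equal size, give each chunk to a node, and then compile the results back together.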
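
The "roughly equal chunks" step can be sketched with two small helpers. The names chunk_count and chunk_begin are hypothetical, and the sketch assumes contiguous chunks whose lengths differ by at most one element, with the leftover elements going to the lowest-numbered chunks. It uses no MPI calls, so it only computes the index ranges; each process would loop over its own range, and the per-chunk counts and starting offsets are the kind of values a gather such as MPI_Gatherv takes.

```c
/* Number of elements in chunk `rank` when `size` elements are split
   into `nchunks` contiguous chunks. The first (size % nchunks) chunks
   get one extra element, so lengths differ by at most one. */
int chunk_count(int size, int nchunks, int rank)
{
    return size / nchunks + (rank < size % nchunks ? 1 : 0);
}

/* Index of the first element of chunk `rank` under the same split:
   `rank` full-size chunks precede it, plus one extra element for each
   earlier chunk that received a leftover. */
int chunk_begin(int size, int nchunks, int rank)
{
    int base = size / nchunks;
    int extra = size % nchunks;
    return rank * base + (rank < extra ? rank : extra);
}
```

For example, 10 elements over 3 processes gives chunks starting at indices 0, 4, and 7 with lengths 4, 3, and 3. When there are fewer elements than processes, the trailing chunks simply have length zero.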
