I'm relatively new to using Boost MPI. I have the libraries installed and my code compiles, but I'm getting a very odd error: some of the integer data received by the slave nodes is not what the master sent. What is going on?
I'm using Boost version 1.42.0 and compiling the code with mpic++ (which wraps g++ on one cluster and icpc on the other). A reduced example follows, including the output.
Code:
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <vector>
#include <boost/mpi.hpp>
using namespace std;
namespace mpi = boost::mpi;

class Solution
{
public:
    Solution() :
        solution_num(num_solutions++)
    {
        // Master node's constructor
    }
    Solution(int solutionNum) :
        solution_num(solutionNum)
    {
        // Slave nodes' constructor.
    }
    int solutionNum() const
    {
        return solution_num;
    }
private:
    static int num_solutions;
    int solution_num;
};

int Solution::num_solutions = 0;

int main(int argc, char* argv[])
{
    // Initialization of MPI
    mpi::environment env(argc, argv);
    mpi::communicator world;
    if (world.rank() == 0)
    {
        // Create solutions
        int numSolutions = world.size() - 1; // One solution per slave
        vector<Solution*> solutions(numSolutions);
        for (int sol = 0; sol < numSolutions; ++sol)
        {
            solutions[sol] = new Solution;
        }
        // Send solutions
        for (int sol = 0; sol < numSolutions; ++sol)
        {
            world.isend(sol + 1, 0, false); // Tells the slave to expect work
            cout << "Sending solution no. " << solutions[sol]->solutionNum() << " to node " << sol + 1 << endl;
            world.isend(sol + 1, 1, solutions[sol]->solutionNum());
        }
        // Retrieve values (solution numbers squared)
        vector<double> values(numSolutions, 0);
        for (int i = 0; i < numSolutions; ++i)
        {
            // Get values for each solution
            double value = 0;
            mpi::status status = world.recv(mpi::any_source, 2, value);
            int source = status.source();
            int sol = source - 1;
            values[sol] = value;
        }
        for (int i = 1; i <= numSolutions; ++i)
        {
            world.isend(i, 0, true); // Tells the slave to finish
        }
        // Output the solutions numbers and their squares
        for (int i = 0; i < numSolutions; ++i)
        {
            cout << solutions[i]->solutionNum() << ", " << values[i] << endl;
            delete solutions[i];
        }
    }
    else
    {
        // Slave nodes merely square the solution number
        bool finished;
        mpi::status status = world.recv(0, 0, finished);
        while (!finished)
        {
            int solNum;
            world.recv(0, 1, solNum);
            cout << "Node " << world.rank() << " receiving solution no. " << solNum << endl;
            Solution solution(solNum);
            double value = static_cast<double>(solNum * solNum);
            world.send(0, 2, value);
            status = world.recv(0, 0, finished);
        }
        cout << "Node " << world.rank() << " finished." << endl;
    }
    return EXIT_SUCCESS;
}
Running this on 21 nodes (1 master, 20 slaves) produces:
Sending solution no. 0 to node 1
Sending solution no. 1 to node 2
Sending solution no. 2 to node 3
Sending solution no. 3 to node 4
Sending solution no. 4 to node 5
Sending solution no. 5 to node 6
Sending solution no. 6 to node 7
Sending solution no. 7 to node 8
Sending solution no. 8 to node 9
Sending solution no. 9 to node 10
Sending solution no. 10 to node 11
Sending solution no. 11 to node 12
Sending solution no. 12 to node 13
Sending solution no. 13 to node 14
Sending solution no. 14 to node 15
Sending solution no. 15 to node 16
Sending solution no. 16 to node 17
Sending solution no. 17 to node 18
Sending solution no. 18 to node 19
Sending solution no. 19 to node 20
Node 1 receiving solution no. 0
Node 2 receiving solution no. 1
Node 12 receiving solution no. 19
Node 3 receiving solution no. 19
Node 15 receiving solution no. 19
Node 13 receiving solution no. 19
Node 4 receiving solution no. 19
Node 9 receiving solution no. 19
Node 10 receiving solution no. 19
Node 14 receiving solution no. 19
Node 6 receiving solution no. 19
Node 5 receiving solution no. 19
Node 11 receiving solution no. 19
Node 8 receiving solution no. 19
Node 16 receiving solution no. 19
Node 19 receiving solution no. 19
Node 20 receiving solution no. 19
Node 1 finished.
Node 2 finished.
Node 7 receiving solution no. 19
0, 0
1, 1
2, 361
3, 361
4, 361
5, 361
6, 361
7, 361
8, 361
9, 361
10, 361
11, 361
12, 361
13, 361
14, 361
15, 361
16, 361
17, 361
18, 361
19, 361
Node 6 finished.
Node 3 finished.
Node 17 receiving solution no. 19
Node 17 finished.
Node 10 finished.
Node 12 finished.
Node 8 finished.
Node 4 finished.
Node 15 finished.
Node 18 receiving solution no. 19
Node 18 finished.
Node 11 finished.
Node 13 finished.
Node 20 finished.
Node 16 finished.
Node 9 finished.
Node 19 finished.
Node 7 finished.
Node 5 finished.
Node 14 finished.
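So, while the master sends 0 to node 1, 1 to node 2, 2 to node 3, and so on, most of the slave nodes (for some reason) receive the number 19. Instead of the squares of the numbers 0 through 19, we get 0 squared, 1 squared, and 19 squared 18 times.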
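For comparison, here is the table I expected the master to print. This is only a serial sketch with no MPI in it, and the helper name `expectedValues` is made up for illustration; it just repeats the arithmetic each slave is supposed to perform on the number it was sent.

```cpp
#include <vector>

// Hypothetical helper (not part of the program above): the values the
// master should collect if every slave received its own solution number.
// The slave at rank sol + 1 is sent solution number sol and returns sol * sol.
std::vector<double> expectedValues(int numSolutions)
{
    std::vector<double> values(numSolutions, 0.0);
    for (int sol = 0; sol < numSolutions; ++sol)
    {
        values[sol] = static_cast<double>(sol * sol);
    }
    return values;
}
```

With 20 slaves, `expectedValues(20)` holds 0, 1, 4, 9 and so on up to 361, whereas the run above reports 361 for every entry from index 2 onward.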
Thanks in advance to anyone who can explain this.
Alan