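I run into a problem when doing a non-blocking MPI send: the program crashes with a segmentation fault. All the processes receive their data correctly, but the process with ID 0 crashes during the MPI_Waitall() call. Can anyone identify the cause of the problem? Thanks!
Here is the program's source code, followed by the error report printed when it runs.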
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define BLOCK_LOW(id,p,n) ((id)*(n)/(p))
#define BLOCK_HIGH(id,p,n) (BLOCK_LOW((id)+1,p,n)-1)
#define BLOCK_SIZE(id,p,n) (BLOCK_HIGH(id,p,n)-BLOCK_LOW(id,p,n)+1)
#define BLOCK_OWNER(id,p,n) (((p)*((id)+1)-1)/(n))
#define LENGTH 100

int main(int argc, char *argv[]) {
    int id, p, i;
    MPI_Request* sendRequests;
    MPI_Status* sendStatuses;
    MPI_Request receiveRequest;
    MPI_Status receiveStatus;
    int array[LENGTH];
    int array2[LENGTH];

    MPI_Init(&argc, &argv);
    MPI_Barrier(MPI_COMM_WORLD);

    for (i = 0; i < LENGTH; i++) {
        array[i] = i * 5;
        array2[i] = 0;
    }

    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (id == 0) {
        sendRequests = malloc((p-1) * sizeof(MPI_Request));
        for (i = 1; i < p; i++) {
            MPI_Isend(array + BLOCK_LOW(i-1, p-1, LENGTH),
                      BLOCK_SIZE(i-1, p-1, LENGTH), MPI_INT, i, 0,
                      MPI_COMM_WORLD, &sendRequests[i-1]);
        }
        MPI_Waitall(p-1, sendRequests, sendStatuses);
    } else {
        MPI_Recv(array2, BLOCK_SIZE(id-1, p-1, LENGTH), MPI_INT, 0, 0,
                 MPI_COMM_WORLD, &receiveStatus);
        for (i = 0; i < BLOCK_SIZE(id-1, p-1, LENGTH); i++) {
            printf("Element %d (%d): %d\n", i, i + BLOCK_LOW(id-1, p-1, LENGTH), array2[i]);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
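For reference, the BLOCK_* macros are meant to split LENGTH elements as evenly as possible among the p-1 receiving ranks. The following is a minimal stand-alone sketch (no MPI needed) of what they compute; the function names `block_low`, `block_high`, `block_size` and `print_blocks` are illustrative stand-ins for the macros, not part of the program above.

```c
#include <assert.h>
#include <stdio.h>

/* Function versions of the BLOCK_* macros (illustrative names). */
int block_low(int id, int p, int n)  { return id * n / p; }
int block_high(int id, int p, int n) { return block_low(id + 1, p, n) - 1; }
int block_size(int id, int p, int n) {
    return block_high(id, p, n) - block_low(id, p, n) + 1;
}

/* Print the index range that each of `workers` blocks covers out of n elements. */
void print_blocks(int workers, int n) {
    assert(workers > 0);
    for (int b = 0; b < workers; b++) {
        printf("block %d: [%d, %d], %d elements\n",
               b, block_low(b, workers, n), block_high(b, workers, n),
               block_size(b, workers, n));
    }
}
```

Calling `print_blocks(3, 100)`, which corresponds to running the program with 4 processes, reports the ranges [0, 32], [33, 65] and [66, 99], with sizes 33, 33 and 34.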
This is the error I get when I run the code.
[lin12p5:13467] *** Process received signal ***
[lin12p5:13467] Signal: Segmentation fault (11)
[lin12p5:13467] Signal code: Invalid permissions (2)
[lin12p5:13467] Failing at address: 0x400f30
[lin12p5:13467] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fa96ab4eff0]
[lin12p5:13467] [ 1] /usr/lib/libmpi.so.0(+0x37f01) [0x7fa96bad5f01]
[lin12p5:13467] [ 2] /usr/lib/libmpi.so.0(PMPI_Waitall+0xb3) [0x7fa96bb06b73]
[lin12p5:13467] [ 3] mpi-test(main+0x232) [0x400da6]
[lin12p5:13467] [ 4] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fa96a7fcc8d]
[lin12p5:13467] [ 5] mpi-test() [0x400ab9]
[lin12p5:13467] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 13467 on node lab12p5 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[lin13p5][[33088,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)