Description
In OpenMPI, when using lnx with shm and FI_SHM_USE_XPMEM=1, a rank fails to send messages above a certain size to itself.
To Reproduce
Compile the attached program. For messages of 512 or fewer double values the program runs fine:
mpirun -np 2 -x FI_SHM_USE_XPMEM=1 -x FI_LNX_PROV_LINKS="shm" -mca pml cm -mca mtl ofi --mca opal_common_ofi_provider_include "lnx" -mca mtl_ofi_av table -prtemca ras_base_launch_orted_on_hn 1 ./mpi-test 512
OK
OK
But for larger buffers the communication fails:
mpirun -np 2 -x FI_SHM_USE_XPMEM=1 -x FI_LNX_PROV_LINKS="shm" -mca pml cm -mca mtl ofi --mca opal_common_ofi_provider_include "lnx" -mca mtl_ofi_av table -prtemca ras_base_launch_orted_on_hn 1 ./mpi-test 513
[x1001c1s1b0n3:00000] *** An error occurred in MPI_Waitall
[x1001c1s1b0n3:00000] *** reported by process [3355574273,1]
[x1001c1s1b0n3:00000] *** on communicator MPI_COMM_WORLD
[x1001c1s1b0n3:00000] *** MPI_ERR_INTERN: internal error
[x1001c1s1b0n3:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[x1001c1s1b0n3:00000] *** and MPI will try to terminate your MPI job as well)
The program runs when I turn off xpmem:
mpirun -np 2 -x FI_SHM_USE_XPMEM=0 -x FI_LNX_PROV_LINKS="shm" -mca pml cm -mca mtl ofi --mca opal_common_ofi_provider_include "lnx" -mca mtl_ofi_av table -prtemca ras_base_launch_orted_on_hn 1 ./mpi-test 513
OK
OK
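For reference, 512 doubles is exactly 4096 bytes and 513 doubles is 4104 bytes, so the failure starts just above 4 KiB. My assumption (not verified in the provider source) is that this is the point where shm stops using its small-message path and hands the transfer to the XPMEM copy path.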
Environment:
Running on Cray/HPE SLES, libfabric built from main at commit b4d66113d3b49b927c529067beb7ab7cf6465564, OpenMPI 5.0.7.
Additional context
This is a simple reproducer:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    MPI_Request *req_buffer;
    int rank, size, buff_size = atoi(argv[1]);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    req_buffer = malloc(sizeof(MPI_Request) * size);
    double *send_buffer = (double *)calloc(buff_size, sizeof(double));
    double *recv_buffer = (double *)calloc(buff_size, sizeof(double));

    /* Each rank sends buff_size doubles to itself and waits on both requests. */
    MPI_Isend(send_buffer, buff_size, MPI_DOUBLE,
              rank, 10, MPI_COMM_WORLD,
              req_buffer + 1);
    MPI_Irecv(recv_buffer, buff_size, MPI_DOUBLE,
              rank, 10, MPI_COMM_WORLD,
              req_buffer + 0);
    MPI_Waitall(2, req_buffer + 0, MPI_STATUSES_IGNORE);

    printf("OK\n");
    MPI_Finalize();
    return 0;
}
Compile with mpicc mpi-test.c -o mpi-test
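In case it helps triage, below is a small variant of the reproducer (my own sketch, not part of the original report) that sweeps the message size and reports the first count that fails. It sets MPI_ERRORS_RETURN on MPI_COMM_WORLD so MPI_Waitall returns an error code instead of aborting the job; the 1..1024 range and the abort on first failure are arbitrary choices. Launch it with the same mpirun line as above.

/*
 * Sketch (not from the original report): sweep the message size around the
 * 512/513-double boundary and report the first count for which the self
 * send/recv fails. MPI_ERRORS_RETURN makes MPI_Waitall return an error code
 * instead of aborting the job. The 1..1024 range, the tag, and the abort on
 * first failure are arbitrary choices for this sketch.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    for (int n = 1; n <= 1024; n++) {
        double *send = calloc(n, sizeof(double));
        double *recv = calloc(n, sizeof(double));
        MPI_Request req[2];

        MPI_Isend(send, n, MPI_DOUBLE, rank, 10, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(recv, n, MPI_DOUBLE, rank, 10, MPI_COMM_WORLD, &req[1]);

        if (MPI_Waitall(2, req, MPI_STATUSES_IGNORE) != MPI_SUCCESS) {
            printf("rank %d: first failure at %d doubles (%zu bytes)\n",
                   rank, n, n * sizeof(double));
            /* The failed requests may never complete, so abort rather than
             * risk hanging in MPI_Finalize. */
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        free(send);
        free(recv);
    }

    printf("rank %d: all sizes up to 1024 doubles succeeded\n", rank);
    MPI_Finalize();
    return 0;
}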