
prov/lnx: error when sending to self with xpmem #11285

@angainor

Description


In OpenMPI, when using lnx linked with shm and FI_SHM_USE_XPMEM=1, a rank fails to send messages above a certain size to itself.

To Reproduce
Compile the reproducer below (see Additional context). For messages of 512 double values or fewer, the program runs fine:

mpirun -np 2 -x FI_SHM_USE_XPMEM=1 -x FI_LNX_PROV_LINKS="shm" -mca pml cm -mca mtl ofi --mca opal_common_ofi_provider_include "lnx" -mca mtl_ofi_av table -prtemca ras_base_launch_orted_on_hn 1  ./mpi-test 512
OK
OK

But for larger buffers, the communication fails:

mpirun -np 2 -x FI_SHM_USE_XPMEM=1 -x FI_LNX_PROV_LINKS="shm" -mca pml cm -mca mtl ofi --mca opal_common_ofi_provider_include "lnx" -mca mtl_ofi_av table -prtemca ras_base_launch_orted_on_hn 1  ./mpi-test 513
[x1001c1s1b0n3:00000] *** An error occurred in MPI_Waitall
[x1001c1s1b0n3:00000] *** reported by process [3355574273,1]
[x1001c1s1b0n3:00000] *** on communicator MPI_COMM_WORLD
[x1001c1s1b0n3:00000] *** MPI_ERR_INTERN: internal error
[x1001c1s1b0n3:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[x1001c1s1b0n3:00000] ***    and MPI will try to terminate your MPI job as well)

The program runs fine when I turn off xpmem:

mpirun -np 2 -x FI_SHM_USE_XPMEM=0 -x FI_LNX_PROV_LINKS="shm" -mca pml cm -mca mtl ofi --mca opal_common_ofi_provider_include "lnx" -mca mtl_ofi_av table -prtemca ras_base_launch_orted_on_hn 1  ./mpi-test 513
OK
OK
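
For what it's worth, 512 doubles is 512 × 8 = 4096 bytes, so the failure starts exactly when the message grows beyond 4 KiB. My assumption (not verified against the shm code) is that transfers up to 4 KiB still go through the provider's inject/inline path, and only larger messages take the xpmem copy path, which is where things break.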

Environment:
Running on Cray/HPE SLES; libfabric built from main (commit b4d66113d3b49b927c529067beb7ab7cf6465564); OpenMPI 5.0.7.

Additional context
Here is the simple reproducer:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  MPI_Request *req_buffer;
  int rank, size, buff_size = atoi(argv[1]);

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  req_buffer = malloc(sizeof(MPI_Request) * size);

  double *send_buffer = (double*)calloc(buff_size, sizeof(double));
  double *recv_buffer = (double*)calloc(buff_size, sizeof(double));

  /* Each rank sends buff_size doubles to itself. */
  MPI_Isend(send_buffer, buff_size, MPI_DOUBLE,
            rank, 10, MPI_COMM_WORLD,
            req_buffer + 1);
  MPI_Irecv(recv_buffer, buff_size, MPI_DOUBLE,
            rank, 10, MPI_COMM_WORLD,
            req_buffer + 0);
  MPI_Waitall(2, req_buffer, MPI_STATUSES_IGNORE);
  printf("OK\n");

  free(send_buffer);
  free(recv_buffer);
  free(req_buffer);
  MPI_Finalize();
  return 0;
}

Compile with mpicc mpi-test.c -o mpi-test
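
If it helps with triage, the failing run can be repeated with libfabric's standard logging variables (FI_LOG_LEVEL, FI_LOG_PROV) to see what the shm provider does at the point of failure; the exact output, and how the provider filter interacts with lnx, will depend on the build:

mpirun -np 2 -x FI_LOG_LEVEL=debug -x FI_LOG_PROV=shm -x FI_SHM_USE_XPMEM=1 -x FI_LNX_PROV_LINKS="shm" -mca pml cm -mca mtl ofi --mca opal_common_ofi_provider_include "lnx" -mca mtl_ofi_av table -prtemca ras_base_launch_orted_on_hn 1  ./mpi-test 513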
