prov/efa/shm: CMA error 14 when MPI ranks > 1000 #11329

@yuanjianz

Describe the bug
I am running HPC applications on AWS ParallelCluster using hpc7a instances (192 cores per node). The applications crash in MPI_Bcast when the total number of MPI ranks exceeds 1000. With FI_LOG_LEVEL=warning set, the following messages are shown:

libfabric:54609:1754681986::core:core:cma_copy():58<warn> CMA error 14
libfabric:54609:1754681986::shm:ep_ctrl:smr_start_common():790<warn> error processing op
[hpc7a-2b-dy-hpc7a-192-1:54137] *** An error occurred in MPI_Bcast
[hpc7a-2b-dy-hpc7a-192-1:54137] *** reported by process [2390687745,0]
[hpc7a-2b-dy-hpc7a-192-1:54137] *** on communicator MPI COMMUNICATOR 25 GROUP FROM 21
[hpc7a-2b-dy-hpc7a-192-1:54137] *** MPI_ERR_INTERN: internal error
[hpc7a-2b-dy-hpc7a-192-1:54137] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[hpc7a-2b-dy-hpc7a-192-1:54137] ***    and potentially your MPI job)
[hpc7a-2b-dy-hpc7a-192-1:54104] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[hpc7a-2b-dy-hpc7a-192-1:54104] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
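For triage, the number in the `CMA error` line can be decoded. This is only a sketch: it assumes the value libfabric prints is the raw `errno` from `process_vm_readv`/`process_vm_writev`, which I have not confirmed in the libfabric source, and the function names `cma_errno_name` and `summarize_cma_errors` are made up for illustration. The table covers only the errors listed in the `process_vm_readv(2)` man page, using Linux numbering.

```shell
#!/usr/bin/env bash

# Map the errno values documented for process_vm_readv(2) to names.
# Assumption: "CMA error N" reports the raw Linux errno.
cma_errno_name() {
    case "$1" in
        1)  echo "EPERM (no permission to access the peer process)" ;;
        3)  echo "ESRCH (no such process)" ;;
        12) echo "ENOMEM (could not allocate memory for iovec copies)" ;;
        14) echo "EFAULT (local or remote address not accessible)" ;;
        22) echo "EINVAL (invalid iovec length, count, or flags)" ;;
        *)  echo "unknown errno $1" ;;
    esac
}

# Read libfabric log text on stdin and tally the decoded CMA errors.
summarize_cma_errors() {
    sed -n 's/.*CMA error \([0-9][0-9]*\).*/\1/p' | sort -n | uniq -c |
    while read -r count code; do
        echo "$count x errno $code: $(cma_errno_name "$code")"
    done
}
```

With the job's stderr saved to a file (say `job.err`, a placeholder name), `summarize_cma_errors < job.err` prints one line per distinct error code. Under the assumption above, error 14 is EFAULT rather than EPERM, which would suggest an inaccessible address range rather than a ptrace permission problem.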

After disabling shm in mtl/ofi with

export FI_EFA_ENABLE_SHM_TRANSFER=0
export OMPI_MCA_mtl_ofi_provider_exclude=shm

the application runs successfully, but with the following warnings:

libfabric:57215:1754682263::efa:ep_ctrl:efa_rdm_ep_post_handshake():610<warn> PKE entries exhausted.
libfabric:57215:1754682263::efa:ep_ctrl:efa_rdm_ep_post_handshake():610<warn> PKE entries exhausted.
...

To Reproduce
I suspect that MPI_Bcast at large scale (more than 1000 ranks) with efa/shm can trigger this error, but I have not confirmed whether it is application-specific.

Environment:
Amazon Linux 2023.6.20250317, hpc7a EC2 instance, Open MPI 4.1.7, libfabric-aws/1.22.0amzn5.0

Additional context
I also made sure that ptrace_scope is set correctly by including

if [[ $( cat /proc/sys/kernel/yama/ptrace_scope ) == "0" ]]; then
        echo "PTrace Correct for EFA"
else
        echo "PTrace Override"
        sysctl -w kernel.yama.ptrace_scope=0
fi

as part of my execution script, which I run with mpirun ./execute.sh.
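A read-only variant of this check can be run once per node before the job starts. This is a sketch, not something taken from the EFA documentation: `check_ptrace_scope` is a hypothetical helper, and its optional path argument exists only so it can be exercised against a scratch file. It only reports the value rather than calling `sysctl -w`, since writing kernel parameters needs root, which ranks started by mpirun usually do not have.

```shell
#!/usr/bin/env bash

# Report whether Yama ptrace_scope is 0 on this node.
# Returns 0 if the value is 0, 1 if it is anything else,
# and 2 if the file cannot be read (for example, Yama not enabled).
check_ptrace_scope() {
    local path="${1:-/proc/sys/kernel/yama/ptrace_scope}"
    local host="${HOSTNAME:-unknown}"
    local value

    if [[ ! -r "$path" ]]; then
        echo "$host: $path not readable" >&2
        return 2
    fi

    value="$(<"$path")"   # command substitution strips the trailing newline
    if [[ "$value" == "0" ]]; then
        echo "$host: ptrace_scope=0"
        return 0
    fi

    echo "$host: ptrace_scope=$value (expected 0)" >&2
    return 1
}
```

Assuming the function is saved in a file such as `check_ptrace.sh` (a placeholder name), something like `mpirun --map-by ppr:1:node bash -c 'source ./check_ptrace.sh; check_ptrace_scope'` would print one line per node, which makes a node with a different setting easy to spot.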
