osu_ialltoall: "Other MPI error" starting at 32 nodes, 96 ppn #7608

@longfei-austin

This happens on the current image at 512 nodes.

On next-eval it happens at 32, 128, and 512 nodes. The 32-node output is at the following path:

/lus/flare/projects/Aurora_testing/mpi/osu_rfm/run_collective/32/ialltoall-ialltoallv-ialltoallw/stage/2025-09-14_11-09-45/aurora/compute/PrgEnv-intel/RunMPIcollective

cat rfm_job.err | grep "error"
Abort(15) on node 1632 (rank 1632 in comm 0): Fatal error in internal_Wait: Other MPI error
Rank 1632 aborted with code 15: Fatal error in internal_Wait: Other MPI error

The error occurs in the mpiexec call that runs ialltoall.
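
As a rough aid for reading the log lines above, the following Python sketch pulls the aborting rank out of the abort message and maps it to a node index and a local rank. It is only an illustration: the names `PPN`, `ABORT_RE`, and `locate_aborts` are hypothetical, the value 96 is taken from the issue title, and the mapping assumes block rank placement (ranks 0 to 95 on the first node, and so on), which may not match the actual job launch settings.

```python
import re

PPN = 96  # processes per node, taken from the issue title

# Matches the abort line format quoted in this issue.
ABORT_RE = re.compile(
    r"Abort\((\d+)\) on node (\d+) \(rank (\d+) in comm (\d+)\): (.*)"
)


def locate_aborts(log_text, ppn=PPN):
    """Return one dict per abort line, with an assumed node index and local rank."""
    found = []
    for line in log_text.splitlines():
        m = ABORT_RE.search(line)
        if m is None:
            continue
        rank = int(m.group(3))
        # Assumption: block placement, so node index = rank // ppn.
        node_index, local_rank = divmod(rank, ppn)
        found.append({
            "code": int(m.group(1)),
            "rank": rank,
            "node_index": node_index,
            "local_rank": local_rank,
            "message": m.group(5).strip(),
        })
    return found


# The two lines reported in this issue; only the first matches ABORT_RE.
sample = (
    "Abort(15) on node 1632 (rank 1632 in comm 0): "
    "Fatal error in internal_Wait: Other MPI error\n"
    "Rank 1632 aborted with code 15: "
    "Fatal error in internal_Wait: Other MPI error\n"
)

for hit in locate_aborts(sample):
    print(hit["rank"], hit["node_index"], hit["local_rank"])  # prints "1632 17 0"
```

Under that placement assumption, rank 1632 would be the first rank on the node with zero-based index 17, since 17 × 96 = 1632. Running the same function over the full rfm_job.err would show whether other ranks or nodes report the same abort.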
