osu_igather Other MPI error starting at 8 nodes 96 ppn #7609

@longfei-austin

Errors of the following signature have been encountered when running osu_igatherv and osu_igather:

Fatal error in internal_Wait: Other MPI error
Fatal error in internal_Barrier: Other MPI error
x4003c4s1b0n0.hsn.cm.aurora.alcf.anl.gov: rank 45283 died from signal 6
x4101c5s6b0n0.hsn.cm.aurora.alcf.anl.gov: rank 256 died from signal 11
x4101c5s6b0n0.hsn.cm.aurora.alcf.anl.gov: rank 257 died from signal 15

These errors are more commonly encountered with next-eval and start to appear at 8 nodes (96 ppn).
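For triage, it can help to tally which MPI calls reported the fatal error and which signal each rank died from. The following is a minimal sketch, not part of the original report: the helper name `summarize` and the regular expressions are hypothetical, and the signal names assume the standard Linux numbering (6 = SIGABRT, 11 = SIGSEGV, 15 = SIGTERM). The log text is copied from the lines above.

```python
import re
from collections import Counter

# Log lines copied verbatim from the report above.
LOG = """\
Fatal error in internal_Wait: Other MPI error
Fatal error in internal_Barrier: Other MPI error
x4003c4s1b0n0.hsn.cm.aurora.alcf.anl.gov: rank 45283 died from signal 6
x4101c5s6b0n0.hsn.cm.aurora.alcf.anl.gov: rank 256 died from signal 11
x4101c5s6b0n0.hsn.cm.aurora.alcf.anl.gov: rank 257 died from signal 15
"""

# Assumption: standard Linux signal numbering.
SIGNAL_NAMES = {6: "SIGABRT", 11: "SIGSEGV", 15: "SIGTERM"}

# Hypothetical patterns matching the two line shapes seen in the report.
DIED = re.compile(r"^(?P<host>\S+): rank (?P<rank>\d+) died from signal (?P<sig>\d+)$")
FATAL = re.compile(r"^Fatal error in (?P<func>\w+): (?P<msg>.+)$")


def summarize(log):
    """Return (deaths, fatal): a list of (host, rank, signal name) tuples
    and a Counter of the MPI-internal functions that reported a fatal error."""
    deaths = []
    fatal = Counter()
    for raw in log.splitlines():
        line = raw.strip()
        died = DIED.match(line)
        if died:
            num = int(died.group("sig"))
            name = SIGNAL_NAMES.get(num, f"signal {num}")
            deaths.append((died.group("host"), int(died.group("rank")), name))
            continue
        err = FATAL.match(line)
        if err:
            fatal[err.group("func")] += 1
    return deaths, fatal


if __name__ == "__main__":
    deaths, fatal = summarize(LOG)
    for host, rank, name in deaths:
        print(f"{host} rank {rank}: {name}")
    for func, count in fatal.items():
        print(f"{func}: {count} fatal error(s)")
```

Grouping a full job log this way shows whether the failing ranks cluster on particular nodes, which may help narrow down where the failure originates.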
