@jeffhammond jeffhammond commented Oct 3, 2024

Per-handle completion of nonblocking operations was a long-standing omission in the implementation. ARMCI nonblocking handles are similar to MPI RMA requests, but the mapping is not 1:1 because aggregate handles are 1:N: one handle can cover many requests.

This implements request handles using RMA requests, which replaces the prior implementation that just did flush(_all) instead of individual handle completion. The old implementation is preserved via the preprocessor.
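To picture the 1:N relationship, here is a minimal sketch in plain C of a handle that owns a growable list of requests and completes all of them on wait. It is not the code in this PR: `mock_request_t` is a hypothetical stand-in for `MPI_Request` so the sketch compiles without MPI, and `sketch_hdl_t`, `sketch_hdl_init`, `sketch_hdl_add`, and `sketch_hdl_wait` are illustrative names, not ARMCI-MPI identifiers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for MPI_Request (assumption, not the real type). */
typedef struct {
    int done; /* 0 while the operation is in flight, 1 once completed */
} mock_request_t;

/* Hypothetical handle layout: one handle tracks N outstanding requests. */
typedef struct {
    int             count;    /* number of requests currently tracked */
    int             capacity; /* allocated slots in reqs */
    mock_request_t *reqs;     /* growable array of requests */
} sketch_hdl_t;

/* Put the handle into a known-empty state. */
void sketch_hdl_init(sketch_hdl_t *h)
{
    assert(h != NULL);
    h->count    = 0;
    h->capacity = 0;
    h->reqs     = NULL;
}

/* Attach one more request to the handle.
 * Returns 0 on success, -1 if memory could not be allocated. */
int sketch_hdl_add(sketch_hdl_t *h, mock_request_t r)
{
    assert(h != NULL);
    if (h->count == h->capacity) {
        int newcap = (h->capacity > 0) ? 2 * h->capacity : 4;
        mock_request_t *tmp = realloc(h->reqs, (size_t)newcap * sizeof *tmp);
        if (tmp == NULL) return -1;
        h->reqs     = tmp;
        h->capacity = newcap;
    }
    h->reqs[h->count++] = r;
    return 0;
}

/* Complete every request attached to the handle and reset it.
 * With real MPI requests this loop would be a single MPI_Waitall.
 * Returns the number of requests that were completed. */
int sketch_hdl_wait(sketch_hdl_t *h)
{
    assert(h != NULL);
    int completed = 0;
    for (int i = 0; i < h->count; i++) {
        h->reqs[i].done = 1; /* mock completion of one request */
        completed++;
    }
    free(h->reqs);
    sketch_hdl_init(h);
    return completed;
}
```

Waiting on such a handle completes only the requests attached to it, whereas a flush(_all) completes every outstanding operation regardless of which handle issued it.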

This also adds a feature to switch to Rget_accumulate for atomics (all of which are blocking), which avoids a flush in this code path that might be slowed down by the need to complete more expensive, potentially non-hardware, operations.

This has not been tested thoroughly. It will be merged after sufficient testing.

Tested with:

  • MPICH 4.2 Ch4 OFI in shared memory
  • MPICH 4.2 Ch3 in shared memory
  • Open MPI 4.x in shared memory
  • Cray MPI on LUMI
  • HPC-X (Open MPI 4) on Mellanox IB
  • Open MPI 5 on Mellanox IB
  • MVAPICH on Mellanox IB
  • MPICH UCX on Mellanox IB
  • MPICH OFI on Mellanox IB

@jeffhammond jeffhammond self-assigned this Oct 3, 2024
@jeffhammond jeffhammond marked this pull request as ready for review February 28, 2025 08:41
Fetch_and_op or Compare_and_swap plus Flush(_local) might be more expensive
so we add an option to use Rget_accumulate (yes, way more arguments)
and wait on the resulting request, which might be better in some cases.
Signed-off-by: Jeff Hammond <[email protected]>
no implementation of request-based RMA yet...

Signed-off-by: Jeff Hammond <[email protected]>
this is not working for nonblocking vector ops, which fail in armci-test.
all other tests pass, at least in shared-memory.

Signed-off-by: Jeff Hammond <[email protected]>
running NWChem generates a huge number of assertions/warnings about bogus handles.
it would seem that GA does a bad job of initializing these.

Signed-off-by: Jeff Hammond <[email protected]>
ARMCII_Warning was called before ARMCI_GROUP_WORLD was initialized, so warnings in init were printed by every rank.

Signed-off-by: Jeff Hammond <[email protected]>