Description
Currently, the MPI standard does not provide any ordering guarantees for puts and gets unless remote completion is performed between the two operations. For atomic operations, only accesses to the same or overlapping memory regions are guaranteed to be ordered (provided both operations are issued from the same origin to the same target).
As far as I can see, many (most?) high-performance networks provide the notion of a fence to inject an ordering request into a stream of operations, i.e., to ensure that one operation is completed before the next is performed without having to wait for remote completion at the origin.
Example:
a = 0;
PUT(1->win, 1);
FENCE();
GET(1->win, &a);
FLUSH();
assert(a == 1);
This may be more efficient than having two flushes because we only have one round-trip instead of two, which is especially useful for latency-bound use cases (single-value atomics/reads/writes) and for ensuring memory consistency in frameworks built on top of MPI RMA.
Libraries such as OpenSHMEM and UCX offer this functionality.
Hence my question: has this been discussed (and dismissed?) in the past? Is there a reason not to add something like MPI_Win_order (the name MPI_Win_fence is already taken) that triggers a hardware fence if available? On networks that don't support fences in hardware, implementations can always fall back to a full flush, which is what the user has to do at the moment anyway.