Expose hardware fence capabilities

Currently, the MPI standard provides no ordering guarantees for puts and gets unless remote completion is enforced between the two operations. For atomic operations, only accesses to the same or overlapping memory regions are guaranteed to be ordered, and only if origin and target are the same.

As far as I can see, many (most?) high-performance networks provide the notion of a `fence` that injects an ordering request into a stream of operations, i.e., ensures that one operation completes before the next is performed, without the origin having to wait for remote completion.

Example:
```
a = 0;
PUT(1->win, 1);
FENCE();
GET(1->win, &a);
FLUSH();
assert(a == 1);
```

This may be more efficient than issuing two flushes because it requires only one round trip instead of two. That is especially useful for latency-bound use cases (single-value atomics, reads, and writes) and for ensuring memory consistency in frameworks built on top of MPI RMA.

OpenSHMEM and UCX both offer this functionality.

Hence my question: has this been discussed (and dismissed?) in the past? Is there a reason not to add something like `MPI_Win_order` (since `MPI_Win_fence` is already taken) that triggers a hardware fence if available? On networks that don't support fences in hardware, implementations could always fall back to a full flush, which is what the user has to do at the moment anyway.
