I propose the following idea for a future MPI standard: persistent point-to-point collectives. The goal is to provide a flexible interface for pre-defining nearest-neighbor-like communication that allows an MPI implementation to pay most setup costs at request-creation time and to perform the communication pattern more efficiently.
Here is a straw-man API.
MPI_SEND_ADD(buf, count, datatype, dest, tag, request)
Adds a non-blocking send operation to a persistent request. Multiple operations can be added to the same request. This call would be local.
MPI_RECV_ADD(buf, count, datatype, source, tag, request)
Adds a non-blocking recv operation to a persistent request. Multiple operations can be added to the same request. This call would be local.
MPI_REQUEST_INIT(comm, request)
Makes a persistent point-to-point collective request available for use with MPI_START and MPI_WAIT. The resulting request would function like a persistent collective request. This call should come after all the ADD calls. It would be collective across the communicator.
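To make the bindings concrete, here is one possible C rendering of the three calls. The argument lists are an assumption on my part: they simply mirror MPI_Send_init and MPI_Recv_init, minus the communicator, which would instead be supplied at INIT time.

```c
#include <mpi.h>

/* Hypothetical C bindings for the straw-man calls above; these are not part
 * of any MPI standard. The arguments follow MPI_Send_init / MPI_Recv_init,
 * except that the communicator is deferred to MPI_Request_init. */
int MPI_Send_add(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Request *request);
int MPI_Recv_add(void *buf, int count, MPI_Datatype datatype,
                 int source, int tag, MPI_Request *request);
int MPI_Request_init(MPI_Comm comm, MPI_Request *request);
```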
A single persistent point-to-point collective request with MPI_START and MPI_WAIT would behave like the analogous array of persistent point-to-point requests with MPI_STARTALL and MPI_WAITALL, but with the following restrictions.
- The destinations, sources, and tags of the sends and receives would all be required to match globally at INIT time.
- The MPI_STATUS returned by MPI_WAIT would only provide the fields that other persistent collectives support.
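For concreteness, the pattern the proposal targets looks like this today with an array of persistent point-to-point requests. The example is a minimal 1-D periodic halo exchange; the proposed single request would stand in for the array of four requests below.

```c
#include <mpi.h>

/* Today's equivalent: a 1-D halo exchange built from an array of persistent
 * point-to-point requests, started and completed as a group. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;   /* periodic neighbors */
    int right = (rank + 1) % size;

    double send_left, send_right, recv_left, recv_right;
    MPI_Request reqs[4];

    /* Setup cost is per-request and local; no global matching can happen here. */
    MPI_Send_init(&send_left,  1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Send_init(&send_right, 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Recv_init(&recv_left,  1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Recv_init(&recv_right, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

    for (int step = 0; step < 10; ++step) {
        send_left = send_right = (double)(rank * 100 + step);
        MPI_Startall(4, reqs);
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    }

    for (int i = 0; i < 4; ++i)
        MPI_Request_free(&reqs[i]);

    MPI_Finalize();
    return 0;
}
```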
Then the following optimizations could all happen at INIT time.
- Global matching of sends and receives.
- Registration of buffers for RDMA.
- Allocation of resources for efficient synchronization and data transfers.
This API could have the following advantages over existing non-blocking and persistent point-to-point communication.
- Better communication performance.
- The potential to check for deadlock or mismatched messages at INIT time.
This API could have the following advantages over persistent neighborhood collectives, while offering similar opportunities for performance.
- Simpler-to-understand construction of requests, particularly when refactoring existing point-to-point code: a request would be built out of familiar sends and receives instead of topology constructors (see the sketch after this list).
- The flexibility to use multiple buffers, instead of requiring single send and receive buffers.
- No need to create a new communicator, and therefore no consumption of the limited resources a separate communicator might require.
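For comparison, here is the same halo exchange rewritten with the proposed calls, reusing the neighbor ranks and buffers from the sketch above. This is only a sketch of the straw-man API; in particular, how the request handle is first obtained is left open by the proposal, and I assume here that adding to a handle initialized to MPI_REQUEST_NULL allocates it.

```c
/* Sketch only: MPI_Send_add / MPI_Recv_add / MPI_Request_init are the
 * hypothetical calls proposed above, not part of any MPI standard.
 * Assumption: adding an operation to MPI_REQUEST_NULL creates the request. */
MPI_Request req = MPI_REQUEST_NULL;

/* The same four transfers, each with its own buffer; no packing into a
 * single send/recv buffer and no new communicator or topology constructor. */
MPI_Send_add(&send_left,  1, MPI_DOUBLE, left,  0, &req);
MPI_Send_add(&send_right, 1, MPI_DOUBLE, right, 1, &req);
MPI_Recv_add(&recv_left,  1, MPI_DOUBLE, left,  1, &req);
MPI_Recv_add(&recv_right, 1, MPI_DOUBLE, right, 0, &req);

/* Collective over the communicator: global matching, buffer registration,
 * and resource allocation could all happen here. */
MPI_Request_init(MPI_COMM_WORLD, &req);

for (int step = 0; step < 10; ++step) {
    MPI_Start(&req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
MPI_Request_free(&req);
```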
Please forgive me if the MPI Forum has already investigated similar ideas.