I propose the following idea for a future MPI standard: persistent point-to-point collectives. The goal is to provide a flexible interface for pre-defining nearest-neighbor-like communication that allows an MPI implementation to pay most setup costs at request-creation time and to perform the communication pattern more efficiently.
Here is a straw-man API.
MPI_SEND_ADD(buf, count, datatype, dest, tag, request)
Adds a non-blocking send operation to a persistent request. Multiple operations can be added to the same request. This call would be local.
MPI_RECV_ADD(buf, count, datatype, source, tag, request)
Adds a non-blocking recv operation to a persistent request. Multiple operations can be added to the same request. This call would be local.
MPI_REQUEST_INIT(comm, request)
Makes a persistent point-to-point collective request available for use with MPI_START and MPI_WAIT. The resulting request would function like a persistent collective request. This call should come after all the ADD calls. It would be collective across the communicator.
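To make the bindings concrete, here is one possible C rendering of the three calls. The argument lists are an assumption on my part: they simply mirror MPI_Send_init and MPI_Recv_init, minus the communicator, which would instead be supplied at INIT time.

```c
#include <mpi.h>

/* Hypothetical C bindings for the straw-man calls above; these are not part
 * of any MPI standard. The arguments follow MPI_Send_init / MPI_Recv_init,
 * except that the communicator is deferred to MPI_Request_init. */
int MPI_Send_add(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Request *request);
int MPI_Recv_add(void *buf, int count, MPI_Datatype datatype,
                 int source, int tag, MPI_Request *request);
int MPI_Request_init(MPI_Comm comm, MPI_Request *request);
```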
A single persistent point-to-point collective request with MPI_START and MPI_WAIT would behave like the analogous array of persistent point-to-point requests with MPI_STARTALL and MPI_WAITALL, but with the following restrictions.
- The destinations, sources, and tags of the sends and receives would all be required to match globally at INIT time.
- The MPI_STATUS returned by MPI_WAIT would only provide the fields that other persistent collectives support.
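For concreteness, the pattern the proposal targets looks like this today with an array of persistent point-to-point requests. The example is a minimal 1-D periodic halo exchange; the proposed single request would stand in for the array of four requests below.

```c
#include <mpi.h>

/* Today's equivalent: a 1-D halo exchange built from an array of persistent
 * point-to-point requests, started and completed as a group. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;   /* periodic neighbors */
    int right = (rank + 1) % size;

    double send_left, send_right, recv_left, recv_right;
    MPI_Request reqs[4];

    /* Setup cost is per-request and local; no global matching can happen here. */
    MPI_Send_init(&send_left,  1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Send_init(&send_right, 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Recv_init(&recv_left,  1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Recv_init(&recv_right, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

    for (int step = 0; step < 10; ++step) {
        send_left = send_right = (double)(rank * 100 + step);
        MPI_Startall(4, reqs);
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    }

    for (int i = 0; i < 4; ++i)
        MPI_Request_free(&reqs[i]);

    MPI_Finalize();
    return 0;
}
```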
Then the following optimizations could all happen at INIT time.
- Global matching of sends and receives.
- Registration of buffers for RDMA.
- Allocation of resources for efficient synchronization and data transfers.
This API could have the following advantages over existing non-blocking and persistent point-to-point communication.
- Better communication performance.
- The potential to check for deadlock or mismatched messages at INIT time.
This API could have the following advantages over persistent neighborhood collectives, while offering similar opportunities for performance.
- Simpler-to-understand construction of requests, particularly when refactoring existing point-to-point code: a request would be built out of familiar sends and receives instead of topology constructors (see the sketch after this list).
- The flexibility to use multiple buffers, instead of requiring single send and receive buffers.
- No need to create a new communicator, and therefore no consumption of the limited resources a separate communicator might require.
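For comparison, here is the same halo exchange rewritten with the proposed calls, reusing the neighbor ranks and buffers from the sketch above. This is only a sketch of the straw-man API; in particular, how the request handle is first obtained is left open by the proposal, and I assume here that adding to a handle initialized to MPI_REQUEST_NULL allocates it.

```c
/* Sketch only: MPI_Send_add / MPI_Recv_add / MPI_Request_init are the
 * hypothetical calls proposed above, not part of any MPI standard.
 * Assumption: adding an operation to MPI_REQUEST_NULL creates the request. */
MPI_Request req = MPI_REQUEST_NULL;

/* The same four transfers, each with its own buffer; no packing into a
 * single send/recv buffer and no new communicator or topology constructor. */
MPI_Send_add(&send_left,  1, MPI_DOUBLE, left,  0, &req);
MPI_Send_add(&send_right, 1, MPI_DOUBLE, right, 1, &req);
MPI_Recv_add(&recv_left,  1, MPI_DOUBLE, left,  1, &req);
MPI_Recv_add(&recv_right, 1, MPI_DOUBLE, right, 0, &req);

/* Collective over the communicator: global matching, buffer registration,
 * and resource allocation could all happen here. */
MPI_Request_init(MPI_COMM_WORLD, &req);

for (int step = 0; step < 10; ++step) {
    MPI_Start(&req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
MPI_Request_free(&req);
```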
Please forgive me if the MPI Forum has already investigated similar ideas.