This is an extension of a discussion in #13323 - moving it to a separate issue as it is unrelated to the original question.
The issue that arose centered on the ability for multiple "mpirun" instances to execute MPI connect-accept operations between them. This was noted as something the MPI Standard supports, and something we had some limited ability to support in earlier OMPI releases. However, that capability may have eroded in more recent years (perhaps related to the switch from ORTE to PRRTE, though that isn't entirely clear).
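To make the operation concrete, the MPI-side pairing under discussion is the Standard's dynamic-process sequence: one job opens and publishes a port and then accepts, while a job launched by a different mpirun looks the port up and connects. Below is a minimal sketch of the accepting side; the service name "mpirun-demo" is purely illustrative, not something defined by OMPI or PRRTE.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Init(&argc, &argv);

    /* Open a port and publish it under an agreed-upon service name
     * so that processes from the other mpirun can find it. */
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("mpirun-demo", MPI_INFO_NULL, port);

    /* Block until a job started by another mpirun connects;
     * 'client' becomes an intercommunicator to that job. */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);

    /* ... exchange data over 'client' ... */

    MPI_Comm_disconnect(&client);
    MPI_Unpublish_name("mpirun-demo", MPI_INFO_NULL, port);
    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}
```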
Our current support for connect-accept between multiple mpirun instances has a base requirement that all those instances be operating within a single allocation. The operation goes something like this:
```sh
$ salloc -N 200
$ export PRTEPROXY_USE_DVM=1
$ prte --report-pid + &            # note the pid it reports; call it $mypid
$ mpirun --dvm pid:$mypid ... &
$ mpirun --dvm pid:$mypid ... &
$ mpirun --dvm pid:$mypid ... &
<wait for completion>
$ pterm --pid $mypid               # or simply terminate the allocation
```

Originally, one had to use prun in place of mpirun, but @naughtont3 provided the extension that allows users to utilize the more familiar mpirun command (though it added the requirement of setting the PRTEPROXY_USE_DVM envar).
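For completeness, here is a matching sketch of the connecting side that a job launched by one of the other mpirun invocations would run, again using the illustrative "mpirun-demo" service name. The resulting intercommunicator is what the two independently launched jobs then use to exchange data.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm server;

    MPI_Init(&argc, &argv);

    /* Look up the port that the accepting job published, then connect.
     * The name must match what the other side used with MPI_Publish_name. */
    MPI_Lookup_name("mpirun-demo", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);

    /* ... exchange data over the 'server' intercommunicator ... */

    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}
```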
I haven't seen any complaints about multi-mpirun connect-accept operations, but that doesn't mean people in the community aren't hearing them. So the question becomes:
- Is there a true need for cross-allocation support for this operation? Note that it can, of course, be done - but we have never previously supported it. Have user needs evolved to a point where this is now required?
- Is the problem more a lack of education - i.e., users don't know the above procedure for enabling this operation? If so, then how does OMPI propose to educate them? Do you need a doc somewhere? Or is there some other preferred method?
Looking for guidance on requirements here. Note that there is a separate PRRTE request for cross-allocation DVM operations, but it has nothing to do with MPI. We have identified two methods for supporting that request - one would be relevant to an extension of this capability, the other probably would not. The "non-relevant" option is probably better for meeting that requirement, but if cross-allocation connect-accept is something truly needed by the MPI community, that might influence the decision.