pmi: session re-init and multiple PMI initialization #6567

hzhou · 2023-06-22T15:04:35Z

hzhou
Jun 22, 2023
Maintainer

Session re-init introduces multiple MPI_Init/MPI_Finalize windows within a single process. We conventionally call PMI_Init in MPI_Init and PMI_Finalize in MPI_Finalize. However, PMI_Finalize closes the PMI connection to the server. For many implementations, hydra included, this connection is established at launch time and, once closed, is difficult to re-establish. PR #6534 moved PMI_Finalize to an atexit handler, so we now close the connection only at process exit. However, we still call PMI_Init multiple times, and some implementations cannot handle mismatched PMI_Init/PMI_Finalize calls. PR #6564 fixes this for PMIx, but the issue may be common to PMIv1 and PMIv2 as well. There is a report that Slurm's PMI2 implementation has the same issue. The fix may be simple for a particular implementation, but the general case requires discussion.

PMI_Init/PMI_Finalize mix two components:
1. PMI connection and launch-related setup that can only be finalized once
2. Soft initialization that in principle can be re-initialized and finalized cleanly
The first component should be initialized and finalized only once. We can use a global counter to make sure we init once, and move the finalize to an atexit handler.
The second component should be re-initialized and finalized each time. This leaves room for malleability support or extensions, such as allowing global resources to shrink and expand.
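The split described above can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions, not MPICH or PMI code: `example_init`, `example_finalize`, `close_connection`, and the counters are all hypothetical names, and the "connection" is just a flag standing in for the launch-time PMI connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bookkeeping; none of these names come from PMI or MPICH. */
int init_count = 0;            /* global counter of init windows entered */
int hard_init_calls = 0;       /* times the "connection" was opened */
int soft_init_calls = 0;       /* times the soft state was (re-)built */
bool connection_open = false;  /* stand-in for the launch-time connection */
bool soft_state_ready = false; /* stand-in for re-initializable state */

/* Component 1 teardown: runs once, at process exit. */
static void close_connection(void)
{
    connection_open = false;
}

/* Called at the start of every init/finalize window. */
int example_init(void)
{
    if (init_count++ == 0) {
        /* Component 1: only the first window opens the connection. */
        connection_open = true;
        hard_init_calls++;
        if (atexit(close_connection) != 0)
            return -1;
    }
    /* Component 2: soft state is rebuilt in every window. */
    soft_state_ready = true;
    soft_init_calls++;
    return 0;
}

/* Called at the end of every window; never touches the connection. */
int example_finalize(void)
{
    if (!soft_state_ready)
        return -1;             /* mismatched finalize */
    assert(init_count > 0);
    soft_state_ready = false;
    return 0;
}
```

Calling `example_init()` and `example_finalize()` twice in a row opens the stand-in connection once and rebuilds the soft state twice; the connection stays open until the atexit handler runs.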

Thus, I think the right solution is to extend the PMI API to differentiate the two kinds of init/finalize functionality.

Alternatively, the implementation can hide this and handle both components internally. This requires applications to always call PMI_Init/PMI_Finalize each time, and the server must make sure internally that the launch-time setup does not get destroyed. Most of this setup gets reaped at process exit anyway, so the server just needs to make sure it does not raise false alarms. Ref 6a103ee

Unfortunately, it will take time for all implementations to adopt a fix. In the meantime, we have to deal with it on a case-by-case basis.

Reference:

hzhou · 2023-06-22T15:05:48Z

hzhou
Jun 22, 2023
Maintainer Author

tag @sonjahapp

0 replies

sonjahapp · 2023-07-03T13:40:51Z

sonjahapp
Jul 3, 2023

Thanks @hzhou for this summary.

I think we need to understand whether it is advantageous for process managers and the MPI library in general (independent of the PM interface) if multiple inits and finalizes are done for different MPI Sessions. I think the questions are:

a) Is it advantageous for a PM if init/finalize is done per MPI Session?
b) Can the PM provide an advantage or feature to the MPI library if init/finalize is done per MPI Session?

In my opinion, the answer to both questions is currently: No, not really; because there is no representation of something like an MPI Session in a PM to my knowledge (well... PMIx at least counts the inits but that's it).

For the moment, I would vote for initializing and finalizing the connection to the PM only once, at the beginning and the end, independent of the PM. From there, we could move toward multiple init/finalize in the future, as PMs, their interfaces, and their capability to represent and manage individual MPI Sessions evolve.

4 replies

hzhou Jul 3, 2023
Maintainer Author

Without a PMI_Init for each session, we are relying on a long-running thread (or at least callbacks and their corresponding data structures) outside MPI to maintain and monitor the PMI environment. I think that is a viable option. The alternative is for session_init to (re-)discover the environment. The benefit of the latter is a clean release of MPI resources after MPI_Finalize. The (re-)discovery is essentially a PMI_Init.

sonjahapp Jul 3, 2023

In your alternative option, we would need to make sure that the re-discovery is limited to session re-init cases. Otherwise, re-discovery might change the global rank, size, nodemap etc of the process also for other parallel existing sessions. This could lead to errors and/or unexpected behavior.

What resources would be released in a clean(er) way on MPI_(Session)_Finalize with the alternative option?

hzhou Jul 3, 2023
Maintainer Author

> Otherwise, re-discovery might change the global rank, size, nodemap etc of the process also for other parallel existing sessions.

A proper session implementation is supposed to have isolations between sessions, even for concurrent sessions, right?
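One way to picture this kind of isolation is for each session to keep its own snapshot of the environment taken at session init, so that a later re-discovery cannot change what an existing session sees. The sketch below is illustrative only: `struct pm_env`, `pm_current`, `session_init`, and `session_finalize` are hypothetical names, and `pm_current` stands in for whatever the process manager would report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of what a process manager reports to a process. */
struct pm_env {
    int rank;
    int size;
};

/* Stand-in for the PM's current view; a real library would query the PM. */
struct pm_env pm_current = { 0, 4 };

/* Each session owns a private copy of the environment it discovered. */
struct session {
    struct pm_env env;
    int active;
};

/* (Re-)discovery at session init: copy the PM's view into this session only. */
void session_init(struct session *s)
{
    assert(s != NULL);
    s->env = pm_current;
    s->active = 1;
}

/* Finalizing one session leaves every other session's snapshot untouched. */
void session_finalize(struct session *s)
{
    assert(s != NULL);
    s->active = 0;
}
```

If the stand-in PM view changes between two `session_init` calls, the first session still sees the size it discovered, and only the second session picks up the new value.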

> What resources would be released in a clean(er) way on MPI_(Session)_Finalize with the alternative option?

Something like the global pset array, the PMIx background thread, the callbacks.

sonjahapp Jul 3, 2023

> A proper session implementation is supposed to have isolations between sessions, even for concurrent sessions, right?

Right. But I wonder how far such isolation should go, especially where the process's relation to the PM is concerned.

> Something like the global pset array, the PMIx background thread, the callbacks.

The global pset array and event callbacks are released/de-registered by a finalize callback (see here) when the last session is finalized, and are re-initialized when a new session is started (re-init).
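The release-on-last-finalize behavior described here can be sketched with a session counter. This is a simplified, single-threaded illustration, not the actual MPICH code: `session_start`, `session_end`, `pset_names`, and the fixed array size are hypothetical, and the allocation stands in for the global pset array and callback registration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical state; the names do not come from MPICH or PMIx. */
int active_sessions = 0;    /* sessions currently open */
int soft_init_rounds = 0;   /* times the shared soft state was built */
char **pset_names = NULL;   /* stand-in for the global pset array */

/* First session in a round builds the shared soft state. */
int session_start(void)
{
    if (active_sessions == 0) {
        assert(pset_names == NULL);
        pset_names = calloc(4, sizeof *pset_names);
        if (pset_names == NULL)
            return -1;
        soft_init_rounds++;
    }
    active_sessions++;
    return 0;
}

/* Last session to finalize releases the shared soft state. */
int session_end(void)
{
    if (active_sessions == 0)
        return -1;          /* finalize without a matching start */
    if (--active_sessions == 0) {
        free(pset_names);
        pset_names = NULL;
    }
    return 0;
}
```

With two overlapping sessions, the array survives the first `session_end()` and is freed by the second; a later `session_start()` rebuilds it, which corresponds to the re-init case.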
