Meeting 2024

Tommy Janjusic edited this page Apr 30, 2024 · 83 revisions

Open MPI Developer's 2024 Meeting

Meeting logistics:

Remote attendance information

The meeting rooms are integrated with MS Teams; there will be a separate link for each day for remote participants to attend. This is a link to a non-public repo with the info (posting links publicly just invites spam; sorry, folks).

If you do not have access to the non-public repo, please email Jeff Squyres.

Attendance

Please put your name down here if you plan to attend.

  1. Edgar Gabriel (AMD)
  2. Howard Pritchard (LANL)
  3. Thomas Naughton (ORNL)
  4. George Bosilca (NVIDIA)
  5. Joseph Schuchart (UTK)
  6. Kawthar Shafie Khorassani (AMD)
  7. Manu Shantharam (AMD)
  8. Luke Robison (AWS)
  9. Jun Tang (AWS)
  10. Wenduo Wang (AWS)
  11. Tommy Janjusic (NVIDIA)

Agenda items

The meeting is tentatively scheduled to start on April 24 around 1pm, and is expected to finish on April 26 around lunchtime.

Please add Agenda items we need to discuss here.

  • Support for MPI 4.0 compliance (https://github.com/open-mpi/ompi/projects/2)

  • Support for MPI 4.1 compliance (https://github.com/open-mpi/ompi/projects/4)

    • Memory kind info objects

    Edgar presents some slides summarizing this MPI 4.1 feature, including a discussion of mpi_assert_memory_alloc_kinds and how we would actually use it with Open MPI. The slides also list work items; some work is required in PRRTE. Discussed lazy initialization of CUDA, etc. We may not be able to do that much optimization based on memory kinds anyway, outside of pointer checks. Some discussion of how restrictors complicate actually using this kind info internally in Open MPI. Also, Open MPI can be configured with support for multiple device types: do we need to support different device types concurrently? The accelerator framework is currently not set up to deal with this; it allows only one component to be active. Discussed multiple devices of a single type: right now cuda and rocm are not making use of the APIs that take device IDs, but they could, at least for cuda. Maybe these are items for a 6.0 release?

  • Support for MPI 4.2(?) ABI (https://github.com/mpi-forum/mpi-issues/issues/751)

    • Operative question: when is MPI v4.2 expected to be ratified?

    Consensus was that the Forum was probably being optimistic to think v4.2 could be turned around in a year; we don't think a release by SC24 is realistic.

    • related PR (https://github.com/open-mpi/ompi/pull/12033)

      Jake reviewed this PR. George pointed out that the C code now looks very similar to the C code for the f77 entry points. Howard explained the current approach for supporting both an ABI version and a 'native' Open MPI version. Should we just switch to ABI-only, as George suggested? Maybe a native version is more important to the MPICH community? Or maybe we have the same ISV support issues too?

  • Collective Operations

    • xhc/shared memory collectives
    • GPU collectives
    • Collective configuration file
    • Memory allocation caching

    Could we combine/migrate some of the adapt algorithms into libnbc? No, not really; they use different approaches to non-blocking collectives. The coll framework has many components; can we possibly remove some of them (e.g., sm)?

    AWS is focusing on optimizing collectives for the EFA libfabric provider, using the MTL. Focus on HAN optimization: alltoall/alltoallv. Focus on tuned/base: allreduce, allgather, reduce. Also working on a selection algorithm, and considering a decision-file-based approach. PRs are open for many of these.

    Quite a few PRs open right now for various collective algorithms: XHC, smdirect, acoll, coll/am. Should we start merging some of these in? Do we need all of these? For example, smdirect and acoll seem very similar in terms of functionality.

    Lots of discussion about selection and priority of components.

    Discussed whether to merge in https://github.com/open-mpi/ompi/pull/11418. George will ping the PR author to gauge the level of commitment; if the author or their org will support it, go ahead with the merge.

    Agree to merge in the acoll PR https://github.com/open-mpi/ompi/pull/12484 once it passes CI.

    We could use an easy way to report which component/algorithm is being used for a collective operation, perhaps targeted at the debugging case.

    Do we still need smdirect PR - https://github.com/open-mpi/ompi/pull/10470? (decided to leave open for now so it can be salvaged).

    Agree to remove coll/sm component.

  • Accelerator support

    • shared memory plans for 5.1 and beyond
    • one-sided operations

    IPC support in accelerators for 5.1: in main, no components outside the accelerator framework make CUDA calls, but we do need IPC support in the accelerator/cuda component.

    GMCA parameter support? PMIx may have something similar. The idea would be to change priorities for accelerator-related components without having to set multiple MCA parameters.
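    For context, adjusting accelerator-related behavior today means setting several individual MCA parameters. A command-line sketch of the contrast (not runnable as-is; the aggregate parameter name is hypothetical, and the individual parameter values are only illustrative):

```shell
# Today: one MCA parameter per component
mpirun --mca accelerator rocm \
       --mca btl_smcuda_priority 0 \
       --mca coll_cuda_priority 80 \
       ./app

# The idea discussed: a single aggregate switch (name hypothetical)
# that adjusts all accelerator-related priorities at once
mpirun --mca accelerator_profile rocm ./app
```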

    Joseph working on PR 12356 - https://github.com/open-mpi/ompi/pull/12356 and 12318 - https://github.com/open-mpi/ompi/pull/12318 related to accelerator support.

    Discussed the implications of the dmabuf method of memory registration. Currently this is being used within some libfabric providers and UCX. At this point it does not appear that we need to handle dmabuf registration within Open MPI itself; this might change if network providers require using dmabuf methods for memory registration.

    Did not discuss one-sided operations.

  • PRRTe future topic

    Jeff reviews where we are with this. The idea is to switch over to a slightly forked/modified PRRTE for a 6.0.x branch/release. Ralph points out that changing the launcher (even if just switching to a PRRTE fork) would be ill advised for the 5.0.x or even a 5.1.x branch, as packagers will be very unhappy; they only want to make packaging changes at major version number changes. So we would only do this if absolutely necessary. Discussion of debugger support: Ralph thinks this should be okay, as these tools use (or should be using) the PMIx tools interface.

  • Review previously-created wiki pages for 5.1.x and 6.0.x in the context of planning for Open MPI vNEXT

    • These were made a long time ago; it would probably be good to re-evaluate, see which items are realistic, which will actually happen, etc. Timing / version numbers may change / consolidate, too, if we re-integrate PRRTE for v6.0.x (e.g., is doing a v5.1.x worth it at all?).
    • Proposed v6.0.x feature list

    Following the PRRTE discussion it was decided that releasing a 5.1.x is unlikely, so we discussed features targeting a 6.0.x release.

  • What to do about SLURM?

    The problem starts with SLURM release 23.11: SchedMD made changes to the environment variables describing the job id, etc., which impact the PRRTE RAS system's discovery mechanism. Ralph has been engaging SchedMD about ways to fix this recurring problem. The current plan is for SchedMD to supply a supplementary library that the PRRTE ras and plm components can use for allocation discovery and daemon launch.

  • For OFI group

    • Adopt libfabric 2.0 API?

    Way too soon to worry about this.

    Discussed. At this point this is being handled internally in the UCX and OFI providers of interest, certainly for CUDA, ROCm, and ZE devices now. We may need to reconsider if the long-term path is to require OFI/UCX consumers to manage dmabuf registrations.

    • mtl/ofi vs. btl/ofi performance differences

    Edgar is looking at the CXI provider and noticing BTL/OFI pt2pt inter-node performance differences.

  • Misc

    • MPI_Info_set handling https://github.com/open-mpi/ompi/pull/11823
    • What is the bar for merging something into main? Just a successful CI pass? What if there are complaints from the rest of the community? What if the solution is known to be partial and incomplete?
    • Should we enable better downstream build pipeline security for those downloading from open-mpi.org?
      • For v5.0.x, we have md5, sha1, and sha256 checksums in the HTML on the download page.
      • Should we have these values in (more easily) machine-readable formats somewhere?
      • Should we be cryptographically signing releases somehow? (tarballs do not support signatures)
      • What do others do (e.g., GNU projects)?
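    For the machine-readable-checksums question, the values on the download page can already be consumed by standard tools once they are in `sha256sum` format. A sketch (the tarball name, contents, and checksum file here are stand-ins, not a real release):

```shell
# Stand-in for a downloaded release tarball.
printf 'example payload\n' > openmpi-X.Y.Z.tar.bz2

# Publish-side: compute the checksum in sha256sum's native format.
sha256sum openmpi-X.Y.Z.tar.bz2 > checksums.sha256

# Download-side: verify the way a packager's pipeline would.
sha256sum -c checksums.sha256    # prints "openmpi-X.Y.Z.tar.bz2: OK"
```

    Publishing a file in this format alongside the HTML table would let downstream build pipelines verify tarballs without scraping the page.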
  • Action items

    • Joseph will ping the Score-P folks about interest in MPI_T events. (DONE) Marc-Andre says it is on the list for Score-P; TAU is using it, as is some unreleased version of MPI Advisor.
    • George will ping author of PR 11418 about level of commitment to determine whether to merge this PR into main.
    • Howard will work with Edgar on OFI BTL performance issues.