WeeklyTelcon_20210125

Open MPI Weekly Telecon

Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

Aurelien Bouteiller (UTK)
Brendan Cunningham (Cornelis Networks)
Brian Barrett (AWS)
Christoph Niethammer (HLRS)
David Bernholdt (ORNL)
Edgar Gabriel (UH)
Geoffrey Paulsen (IBM)
George Bosilca (UTK)
Harumi Kuno (HPE)
Hessam Mirsadeghi (UCX/nVidia)
Howard Pritchard (LANL)
Jeff Squyres (Cisco)
Joseph Schuchart
Josh Hursey (IBM)
Joshua Ladd (nVidia/Mellanox)
Matthew Dosanjh (Sandia)
Michael Heinz (Cornelis Networks)
Raghu Raja (AWS)
Ralph Castain (Intel)
Todd Kordenbrock (Sandia)
William Zhang (AWS)

not there today (I keep this for easy cut-n-paste for future notes)

Akshay Venkatesh (NVIDIA)
Austen Lauria (IBM)
Naughton III, Thomas (ORNL)
Artem Polyakov (nVidia/Mellanox)
Brandon Yates (Intel)
Charles Shereda (LLNL)
Erik Zeiske
Geoffroy Vallee (ARM)
Mark Allen (IBM)
Matias Cabral (Intel)
Nathan Hjelm (Google)
Noah Evans (Sandia)
Scott Breyer (Sandia?)
Shintaro Iwasaki
Tomislav Janjusic
Xin Zhao (nVidia/Mellanox)

4.0.x

v4.0 release: would like to take this one-off ROMIO fix instead of a whole new ROMIO.
- https://github.com/open-mpi/ompi/pull/8370 - Fixes HDF5 on Lustre
- Proposing to take this one-off for v4.0.6, since a whole new ROMIO is a big change.
- Waiting on v4.0.6rc2 until we get an answer.
- Everyone seems okay with taking this into the release branch, and waiting for the ROMIO update on master.
- Just needs a review.

v4.1

Issue 8334 - a performance regression with AVX512 on Skylake. Still digging into it.
- Blocker for v4.1 release. Performance regression.
  - Unclear if the scope is isolated to LAMMPS or affects all Allreduce.
- VASP, which does lots of allreduce, didn't see much perf difference with AVX on/off.
- Weird that it wouldn't help, but AVX only helps large vector reductions.
- Horovod saw about a 20% improvement in the original UTK paper at EuroMPI.
  - https://www.icl.utk.edu/publications/using-advanced-vector-extensions-avx-512-mpi-reduction
Issue 8410 - Build Failure on Apple Silicon.
- Do we just need an updated string, or is that only one of the issues?
- Code changes we need in v4.1.
- Will have exact same problem in PMIx and PRRTE
- Performance with Atomic FIFO is another issue, might not need to backport to v4.1
Issue 8367 - will take to the UCX community
- Not yet brought up with the UCX community. Josh will take it up.
Issue 8379 - UCT appears to be default and not UCX
- Jeff re-pinged on the request.

Open-MPI v5.0

What's the state of ULFM (PR 7740) for v5.0?

Does the community want this ULFM PR 7740 for OMPI v5.0? If so, we need a PRRTE v3.0
- Howard gave it a spin, and it worked, with a few issues.
  - -no-orte CI tests will fail
    - This PR detects if you're using an external PRRTE, and if RT is not FT, then it errors out in the configure.
    - -no-orte is just an alias to -no-prrte, so if this is causing issues, may
    - We're pushing to external PRRTE.
    - This build should only abort if it's requested, but not found.
    - Aurelien will fix this, which will fix CI.
    - The compiler fix is just for duplicate typedefs.
- Aurelien will make a PR today to add some tests, but is unsure how to add them to MTT.
  - Put them into the ibm suite; most will pick it up by default.
Are we Feature Complete?
- PRRTE should be ready end of Q1.
- Based on v5.0 tracker, there is a bunch of stuff not in.
- GPU Direct support for OFI MTL
  - AWS working on now. Need to rebase, and upstream.
- OFI BTL changes need to get upstreamed.
- Weeks for MTL
Edgar: atomicity issue for OMPIO. Not sure if it's a full feature, but we need to have it on the radar.
- ETA: a few days after Edgar finds time. 2-3 weeks.
Any other big features?
Branch date: will discuss next week.

New Topics

Issue 7486

How to implement so that ./configure --help presents all configure options to users?
Didn't get to this on 1/25.

Issue 8321

Process returns wrong result unless pml is ^ucx.
Looks like the user is trying to use UCX with TCP inside of a container.
- Not sure how well tested UCX+TCP is.
- If users are using TCP sockets, why is selection picking UCX instead of OB1/TCP_BTL?
Should be straightforward to chase down, and
- Possibly an issue with collective and UCX in this runmode.

Setup Github Teams

Jeff can set these up so we have a single point of contact in GitHub that many members of an organization can watch.
Don't go crazy to start; just set up a few.

Longer Term discussions

ROMIO Long Term (12/8)

What do we want to do about ROMIO in general?
- OMPIO is the default everywhere.
- Gilles is saying the changes we made are integration changes.
  - There have been some OMPI-specific changes put into ROMIO, meaning upstream maintainers refuse to help us with it.
  - We may be able to work with upstream to make a clear API between the two.
- As a 3rd-party package, should we move it up to the 3rd-party packaging area, to make clear that we shouldn't make changes to this area?
Need to look at this treematch thing. Upstream package that is now inside of Open-MPI.
Might want a CI bot to watch a set of files, and flag PRs that violate principles like this.
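
A minimal sketch of what such a CI check could look like, assuming the bot is handed the list of files a PR touches. The path patterns and the function name below are hypothetical placeholders, not the real tree layout; the actual list of protected areas would need to be agreed on.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for vendored (3rd-party) areas that PRs should not
# modify directly. These are placeholders, not the real Open MPI layout.
PROTECTED_PATTERNS = [
    "3rd-party/*",
    "path/to/vendored/romio/*",
    "path/to/vendored/treematch/*",
]


def flag_protected_changes(changed_files, patterns=PROTECTED_PATTERNS):
    """Return the changed files that fall under any protected pattern.

    fnmatchcase's '*' also matches '/' characters, so a pattern such as
    '3rd-party/*' covers every file below that directory.
    """
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    # Example PR touching one ordinary file and one vendored file.
    changed = ["ompi/mpi/c/allreduce.c", "3rd-party/somepkg/src/foo.c"]
    for path in flag_protected_changes(changed):
        print(f"PR modifies vendored file: {path}")
```

The bot would only flag such PRs for a human to look at, not reject them, since integration changes in these areas may still be legitimate.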

Doc update

PR 8329 - convert README, HACKING, and possibly man pages to reStructuredText.
- Uses https://www.sphinx-doc.org/en/master/ (Python tool, can pip install)
- Has a build from this PR, so we can see what it looks like.
- Have a look. It's a different approach to have one document that's the whole thing.
  - FAQ, README, HACKING.
Do people even use manpages anymore? Do we need/want them in our tarballs?

How's the state of https://github.com/open-mpi/ompi-tests-public/

Putting new tests there
Very little there so far, but working on adding some more.
Should have some new Sessions tests

What's going to be the state of the SM Cuda BTL and CUDA support in v5.0?

What's the general state? Any known issues?
AWS would like to get.
Josh Ladd - Will take internally to see what they have to say.
From nVidia/Mellanox, Cuda Support is through UCX, SM Cuda isn't tested that much.
Hessam Mirsadeghi - All CUDA awareness is through UCX
May ask George Bosilca about this.
Don't want to remove a BTL if someone is interested in it.
UCX also supports TCP via CUDA
PRRTE CLI on v5.0 will have some GPU functionality that Ralph is working on
Update 11/17/2020
- UTK is interested in this BTL, and maybe others.
- Still gap in the MTL use-case.
- nVidia is not maintaining SMCuda anymore. All CUDA support will be through UCX
- What's the state of the shared memory in the BTL?
  - This is the really old generation Shared Memory. Older than Vader.
- Was told after a certain point, no more development in SM Cuda.
- One option might be to
- Another option might be to bring the SM in SMCuda to Vader (now SM)
Discussion on:
- Didn't get to this week. :(
- Draft Request Make default static https://github.com/open-mpi/ompi/pull/8132
- One con is that many providers hard link against libraries, which would then make libmpi dependent on this.
- Non-homogeneous clusters (GPUs on some nodes, no GPUs on others)

Video Presentation

ECP Community days ( March 30-April 1st )
- David Bernholdt and/or George Bosilca
- 90-minute time slots each day.
- Get proposal in by this Friday.
