WeeklyTelcon_20221017
Geoffrey Paulsen edited this page Oct 19, 2022
·
1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoffrey Paulsen (IBM)
- Jeff Squyres (Cisco)
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Christoph Niethammer (HLRS)
- Edgar Gabriel (UoH)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Jan (Sandia)
- Joseph Schuchart
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- Tommy Janjusic (nVidia)
- William Zhang (AWS)
- Akshay Venkatesh (NVIDIA)
- Brian Barrett (AWS)
- David Bernhold (ORNL)
- Josh Fisher (Cornelis Networks)
- Artem Polyakov (nVidia)
- Aurelien Bouteiller (UTK)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- Erik Zeiske
- George Bosilca (UTK)
- Hessam Mirsadeghi (UCX/nVidia)
- Jingyin Tang
- Josh Hursey (IBM)
- Marisa Roman (Cornelis Networks)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Cornelis Networks)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Raghu Raja (AWS)
- Ralph Castain (Intel)
- Sam Gutierrez (LLNL)10513
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Xin Zhao (nVidia)
- libevent CVE was a red herring.
- v4.1.5
- Schedule: targeting ~6 months (targeting November; looking at an RC in the next week or two)
-
New v5.0.0 blocker: an LSF issue came in (10943).
-
NEW - Discuss CUDA + OFI on v5.0.x
-
main runs fine, but the problem in OMPI v5rc8 was in OFI MTL code that was removed as part of the accelerator framework.
- Was this a bug that was fixed as part of the new framework?
- The bug: if Open MPI was configured with CUDA support in v5.0.x (before the new framework) but ran on a system without GPUs, the CUDA part of the OFI MTL thought it had to deregister buffers not related to the CUDA device.
- Saw this in IMB's Alltoall.
- Yes, the new framework did fix some issues in the OFI MTL code, including proper device-to-host copy.
-
Jenkins - make tarball issue.
- RPM builds don't work in Jenkins on v5.0.x.
- Doesn't block RC, but DOES block
- Updated PMIx / PRTE submodule pointers on v5 yesterday.
-
HAN/Adapt
- Joseph was able to dump the trees that they use.
- He will open a ticket on the performance differences; he needs some help understanding them.
- He'd like to enable HAN when a non-linear rank layout is detected:
- If --rank by (or something similar) means the ranks are not a linear sequence over the nodes,
- then bump the priority of HAN.
- If verification of the ranks in a communicator turns out to be needed, might want to do it at communicator creation time and set a flag.
- Yes, this is how Joseph is trying to do it.
- Any benefit in
-
Adapt is supposed to be for noisy clusters/applications.
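The HAN priority idea above can be sketched roughly as follows. This is only an illustration, not Open MPI code. It assumes "linear sequence over the nodes" means that each node's ranks form one contiguous block. The function names, the `node_of_rank` list, and the priority values are all hypothetical.

```python
from itertools import groupby


def ranks_are_linear_over_nodes(node_of_rank):
    """Return True if each node's ranks form one contiguous block.

    node_of_rank[i] is the (hypothetical) node id hosting rank i.
    """
    # Collapse consecutive ranks on the same node into one entry per run.
    visited = [node for node, _ in groupby(node_of_rank)]
    # Linear layout: no node shows up again after we moved on to another one.
    return len(visited) == len(set(visited))


def han_priority(node_of_rank, base=35, bumped=50):
    """Pick a priority for HAN; both values are placeholders, not real defaults."""
    # Assumption from the notes: bump HAN only when the layout is NOT linear.
    return base if ranks_are_linear_over_nodes(node_of_rank) else bumped


# Computed once per communicator (e.g. at creation time) and cached as a flag.
by_slot = [0, 0, 1, 1]   # ranks 0-1 on node 0, ranks 2-3 on node 1
by_node = [0, 1, 0, 1]   # ranks alternate between the two nodes
print(han_priority(by_slot), han_priority(by_node))
```

Doing the check once at communicator creation and storing the result matches the "set a flag" suggestion, so the collective selection path would only read a boolean.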
-
Symbol Pollution - PRs posted against main, and will PR once merged.
-
Docs - Remaining blocking issue (besides above) for v5.0.0
-
mpirun --help is OUT OF DATE.
- A number of doc issues open.
- See https://github.com/open-mpi/ompi/projects/3 for more info.
-
- Merged to main, and to v5.0.x
- Will put out a new v5.0.0rc9 to include this.
- Now that this is merged, what do we expect to see in the configury summary info?
- Howard saw: CUDA support NO, ROCM support NO.
- Howard will file an issue. Probably the configure variables that the summary is based on have changed.
- CUDA and ROCM components now directly link against those libraries.
- Perhaps the configure macros need to be updated.
- Delayed a few weeks due to busyness.
- We're probably not getting together in person anytime soon.
- So we'll send around a Doodle to find a time to talk about our rules.
- The rules reflect the way we worked several years ago, but not really how we work right now.
- We're to review the admin steering committee in July (per our rules).
- We're to review the technical steering committee in July (per our rules).
- We should also review all the OMPI GitHub, Slack, and Coverity members during the month of July.
- Jeff will kick that off sometime this week or next week.
- In the call we mentioned this, but no real discussion.
- Wiki for face to face: https://github.com/open-mpi/ompi/wiki/Meeting-2022
- Might be better to do a half-day/day-long virtual working session.
- Due to companies' travel policies, and for convenience.
- Could do administrative tasks here too.
- Open MPI missed submitting request for BoF this year.
- MPI Forum will be presenting.