WeeklyTelcon_20210601
Geoffrey Paulsen edited this page Jul 5, 2021
·
2 revisions
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Brian Barrett (AWS)
- David Bernholdt (ORNL)
- Edgar Gabriel (UH)
- Geoffrey Paulsen (IBM)
- Hessam Mirsadeghi (NVIDIA)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joseph Schuchart (HLRS)
- Matthew Dosanjh (Sandia)
- Sam Gutierrez (LANL)
- Todd Kordenbrock (Sandia)
- Tomislav Janjusic (NVIDIA)
- William Zhang (AWS)
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (NVIDIA)
- Aurelien Bouteiller (UTK)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- Christoph Niethammer (HLRS)
- Erik Zeiske (HPE)
- Geoffroy Vallee (ARM)
- George Bosilca (UTK)
- Harumi Kuno (HPE)
- Josh Hursey (IBM)
- Joshua Ladd (NVIDIA)
- Marisa Roman (Cornelis Networks)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Michael Heinz (Cornelis Networks)
- Nathan Hjelm (Google)
- Naughton III, Thomas (ORNL)
- Noah Evans (Sandia)
- Raghu Raja (secret startup)
- Ralph Castain (Intel)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Xin Zhao (NVIDIA)
- Will roll v4.0.6 rc today
- We'll do one more RC, and then get a final v4.0.6 out.
- Where are we on pack/unpack with long and long double
- only external32
- This worked before, but not sure
- 8918 - pack/unpack with external32
- 8818 - checking if
- Brian thinks Issue 8990 would also apply to v4.0.x
- With --with-libevent=/usr (as Debian packaging does), we add a -L/usr to the wrapper output, and put all of the -L flags used to find dependencies before the -L for libmpi.so, and if there is an ompi in /usr/lib as well,
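The note above is cut off, but the concern appears to be about search order: a linker walks its -L directories in the order given and takes the first match. The sketch below is only a toy model of that rule, not the wrapper compiler's code. The directory names, the `contents` table, and the `resolve_library` helper are all hypothetical, and `/usr/lib` stands in for the system library directory.

```python
def resolve_library(lib_dirs, contents, libname):
    """Return the first directory in lib_dirs that holds libname, or None.

    Models the first-match rule a linker applies to its -L search path.
    """
    for directory in lib_dirs:
        if libname in contents.get(directory, ()):
            return directory
    return None


# Hypothetical layout: a system-wide Open MPI sits next to libevent,
# while the Open MPI that owns the wrapper lives under /opt/ompi.
contents = {
    "/usr/lib": {"libevent.so", "libmpi.so"},
    "/opt/ompi/lib": {"libmpi.so"},
}

# Dependency directories listed before the Open MPI directory.
deps_first = ["/usr/lib", "/opt/ompi/lib"]
# Open MPI directory listed first.
ompi_first = ["/opt/ompi/lib", "/usr/lib"]

print(resolve_library(deps_first, contents, "libmpi.so"))
print(resolve_library(ompi_first, contents, "libmpi.so"))
```

In this model, listing the dependency directory first makes libmpi.so resolve to the system copy, while listing the Open MPI directory first resolves it to the intended install; libevent.so resolves to the same place either way.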
- Shooting for end of August
- No driver to rush, so now just in bugfix phase.
- Unscheduled RC
- PR 9014 - new blocker.
- fix should just be a couple of lines of code... hard to decide what we want.
- Ralph, Jeff and Brian started talking.
- Need some configury changes in before we RC.
- Issue 8850, 8990 and more
- Brian will file 3-ish issues
- One is configure pmix
- Dynamic Windows fix in for UCX.
- Any update on debugger support?
- Need some documentation that Open MPI v5.0 supports PMIx based debuggers, and that if
- MPIR Shim - pushed up fixes, and enabled CI.
- Could add it to some more CI, to ensure that PMIx doesn't break
- IBM is working on some CI testing with MPIR (typically very brittle)
- Need some guidance on pmix version.
- Right now, probably not a big deal, but perhaps in 2 years, when we have 3 release branches each with a different pmix version, it might make sense to do open-mpi CI testing.
- Shouldn't be too much work to do.
- UCC coll component is being updated to be the default when UCX is selected (PR 8969).
- Intent is that this will eventually replace hcoll.
- PR 8998 - MPIPy -
- In the shift to PRRTE, --oversubscribe is NOT being handled. If you have more procs than slots on a node, the internal oversubscribe var is not yet being set.
- Jeff will look at.
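The condition described above (more procs than slots on a node) can be sketched as a simple per-node check. This is an illustration only, not PRRTE's implementation; the function name and the dictionary inputs are hypothetical.

```python
def oversubscribed_nodes(procs_per_node, slots_per_node):
    """Return the sorted names of nodes that have more procs than slots.

    A node missing from slots_per_node is treated as having zero slots.
    """
    return sorted(
        node
        for node, nprocs in procs_per_node.items()
        if nprocs > slots_per_node.get(node, 0)
    )


# Hypothetical mapping: 4 procs on n0 (2 slots), 2 procs on n1 (2 slots).
procs = {"n0": 4, "n1": 2}
slots = {"n0": 2, "n1": 2}
print(oversubscribed_nodes(procs, slots))
```

A launcher would presumably set its internal oversubscribe flag for each node this check returns.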
- Mellanox hasn't been reporting for a while. Tommi will follow up.
- Jeff did some work on Cisco MTT.
- There are a bunch of one-sided issues across nodes.
- Austen and Jeff looking into.
- Narrowed it down to strange results from MPI_Comm_split
- Local Peers value appears to be set wrong under PRRTE
- Joseph sees hwloc being installed in the installation path, which leads to warnings when using another hwloc.
- We changed how all of this worked a few weeks ago.
- We shouldn't be installing one unless we can't find an external one.
- Problem is if you link the application to a different hwloc, it now complains.
- This has always been true, we just warn now. Don't do this.
- Austen filed a couple of issues from MTT.
- No discussion
- No update
- No discussion.