WeeklyTelcon_20230606
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres (Cisco)
- Brendan Cunningham
- Christoph Niethammer HLRS
- David E. Bernholdt
- Edgar Gabriel (AMD)
- Joseph Schuchart
- Luke Robison (Amazon)
- Quincey (AWS)
- Thomas Huber
- Thomas Naughton
- Todd Kordenbrock
-
Main needs submodule pointer updates
- https://github.com/open-mpi/ompi/issues/11724
- Austen will PR soon
-
#11532 - No progress yet
-
#11730 he has a large set of changes to mpirun and a few changes to schizo
- Split between prterun and mpirun: how command-line args propagate is hard to describe, and documenting them in both places means cloning the text. No way to make this common.
- Clone the text?
- Probably best/easiest solution
- Include a comment from where it came
- Another idea might be to add this common text into prrte and #include via rst
- Could just pull this from the submodule.
-
Concerned about all of these open issues
- If we could get a few tickets closed, we could be done soon
- If you know it, might be easy to close.
- Broke them down into smaller tasks.
- Pick one, assign yourself
-
Doc Issues: https://github.com/open-mpi/ompi/projects/3
- For some of the smaller issues, we might be able to mark the item as ToBeDone with a note: "Sorry, we haven't documented this yet."
-
#11726 -N bind ppr:X:node, map by package (socket), or core
- What we've confirmed is that there is a change to the way that binding works if you just specify -N
- Seems like we could try changing the schizo component so that behavior is maintained from v4 to v5.
- With this, we can decide what to do.
- Luke
- #11722 - Cannot build+install with out of source builds (VPATH)
- Possible blocker, need to update submodule pointers.
- only on main
- main needs submodule update - Austen
- No updates
Current issues:
-
PMIX v4.2 async modex issue: https://github.com/openpmix/openpmix/issues/3077
- Work around: -x PMIX_MCA_gds=hash or enable opal_pmix_collect_all_data
- Need to increase the timeout: fix in OMPI before PMIx_Get, scaling the timeout with job size and allowing a user override.
- Likely that the original issue is missing an additional variable for async modex. to ompi_pml_base_check_pml
- A new parameter exists for v5.0.x and MUST be documented.
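A rough sketch of the "timeout as a function of scale with user override" idea, for discussion only. Nothing here is the actual OMPI or PMIx fix: the function name, the environment variable `EXAMPLE_MODEX_TIMEOUT`, and the base/step constants are all made up for illustration.

```python
import os


def modex_timeout_seconds(num_procs, base=30, per_1k_procs=10, env=None):
    """Return a timeout (seconds) that grows with job size.

    All names and constants are hypothetical; they only illustrate
    "scale the timeout with the number of processes, let the user override".
    """
    env = os.environ if env is None else env

    # A user-supplied value always wins over the computed default.
    override = env.get("EXAMPLE_MODEX_TIMEOUT")  # hypothetical variable name
    if override is not None:
        return int(override)

    # Add a fixed increment for every started block of 1000 processes.
    blocks = (num_procs + 999) // 1000
    return base + per_1k_procs * blocks
```

With these placeholder constants a 4096-process job would get 30 + 10 * 5 = 80 seconds unless the user sets the override. The real scaling function and default values would need to be chosen from measurements at scale.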
-
MCA param issues are the biggest issues now - no new updates.
- https://github.com/open-mpi/ompi/issues/11532
- https://github.com/openpmix/prrte/issues/1731
- Plan is to have 2 of the 3 fixes for v5.0.0, 3rd issue can wait for 5.0.x
- Quincey assigned, working on docs first.
-
Need to cherry-pick NIC selection (distances PR fixes) to v5.0.x
- Several PRs will go into main, including coverity fixes.
- Amir to open up a v5.0.x PR to track all main commits and cherry-pick to v5.0.x when finished.
- Pending review
- Will create initial v5.0.x PR as a pre-PR for the NIC selection: needs review
-
UCX and --enable-mca-dso do not mix: https://github.com/open-mpi/ompi/issues/11632
-
Issue #11532: mca_base_param_files option is no longer read
- PMIx command-line parsing issue: the first stage of the fix is complete; the next stage is expected over the next few days.
-
PR 11681 Propagate the error from callback
- Legit bug fixed by George, but the fix introduced a behavior change; needs community review.