WeeklyTelcon_20170919
Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen (IBM)
- Ralph Castain (Intel)
- Howard
- Brian Barrett
- Todd Kordenbrock
- Jeff Squyres (Cisco)
- Joshua Hursey
- David Bernholdt (ORNL)
- Geoffroy Vallee (ORNL)
- Artem (Mellanox)
- Mohan
- Nathan Hjelm
- Thomas Naughton
Review All Open Blockers
Review v2.0.x Milestones v2.0.4
- Going to switch v2.0.x to critical fixes only!
- The only critical fix we know of now is the MAdvise fix.
- Ask people to move to v2.1.x or v3.0.0
Review v2.x Milestones v2.1.2
- v2.1.2
- Allinea is still seeing intermittent issues in the v2.1.2 release candidate (Issue 3660)
- Issue 2614
- Targeting v2.1.x - Not sure if it affects v3.0.x or master
- Nathan may have a fix.
- Mark Allen can help
Review v3.0.x Milestones v3.0
- v3.0.1 - Opened the branch for bugfixes Sep 18th.
- Looking at mid-October.
- SuSE reported a problem caused by a particular combination of flags. Gilles is looking at it. Will take a while.
- Nightlies have all switched.
- orte-dvm is broken on v3.0.0
- Could do a specific fix for v3.0.0. It is already working in the latest PMIx / master, but taking that fix would require upgrading to the new PMIx and making changes throughout orted.
- Do we want to fix it in v3.0.1? It is already in master, and will be released in v3.1.
Review v3.1.x Milestones [v3.1](https://github.com/open-mpi/ompi/milestone/27)
- Plan to branch from master September 19.
- gives us 6 weeks to stabilize and release before supercomputing.
- Schedule for NEXT v3.1 release (Branch and Ship)
- Would like to have all features into master before we branch for v3.1
- Ralph is working on tool connection.
- Told the debugger community to move away from MPIR to PMIx as the standard attaching mechanism.
- This is a first cut of that mechanism, so debuggers can start their development work.
- Not that critical to be in an OMPI release, because they could get the functionality via other channels.
- RML OFI component is now complete. On more than 32 nodes it launches much faster. Sockets or PSM2? Should work with uGNI, PSM2, and sockets (but don't get any benefit).
- Amazon has something in review for v3.1
- No whitelist for v3.1. v3.0.0 was transition, and no more whitelists.
- New Features in master:
- Mellanox added some stuff
- Howard added some code for tools.
- Want to remove the Reachable framework in v3.0.0, since it is very broken and not used, and the v3.1.x work can't be backported.
- Amazon wants to put the Reachable framework back in, with PR 4225 merged into master before we branch v3.1.x.
- Some bugs in the TCP BTL code; hoping to have it using Reachable before we branch for v3.1.x.
- Not sure if we can fix TCP without Reachable framework.
- Will turn around and create an RC as soon as we can.
Review Master Master Pull Requests
- Single digit number of fails.
- Artem will look (sometime) at the out-of-resources failure in dstore
- The dynamic disconnect test needs to run with --oversubscribe (otherwise it will fail).
- The argv null tests for Fortran spawn are all failing with "executable can't be found".
- Howard is having issues reaching out and getting an ID from MTT. Josh isn't sure.
- Brian tried some new ways of building the tarball, but it failed... so delayed until Thanksgiving.
- Root filesystem on the webserver failed, because Jenkins failed.
- The Jenkins reachout plugin is terrible, so having the clients reach out to the server is more stable.
- Brian is working with Nathan's Mac. Not sure if this approach would work for Cray machines.
- Howard would like to get this setup. Brian can send instructions.
Review Master MTT testing
- Ralph proposed to have a bot that could scan issues, and close issues if no action in some time.
- A bit of concern about auto-closing (losing visibility of legitimate issues)
- Ticket shaming seems to work.
- C++ removal - https://github.com/open-mpi/ompi/pull/1389
- If we pull it in, it would cause a major version bump, but we don't want this to drive a major version bump.
- Need to see if Attributes are MT - IBM will see if we have any tests to audit.
- Jan / Feb
- Possible locations: San Jose, Portland, Albuquerque, Dallas
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon
- Cisco, ORNL, UTK, NVIDIA