WeeklyTelcon_20170606
Geoffrey Paulsen edited this page Jan 9, 2018
- Dialup Info: (Do not post to public mailing list or public wiki)
Attendees
- Geoff Paulsen
- Edgar Gabriel
- Artem Polyakov
- Jeff Squyres (Cisco)
- Howard Pritchard
- Josh Hursey
- Joshua Ladd
- Mohan
- Todd Kordenbrock
- David Bernholdt
- Nathan Hjelm
- Ralph
- Brian Barrett (Amazon)
- Geoffroy Vallee
- Mark Allen (IBM)
- Sylvain Jeaugey
- Thomas Naughton
Review All Open Blockers
- Discuss Progression Issue 3616
- openib progression issue.
- Nathan will try to look at this this week.
- It is not yet clear where we might block in the callbacks.
- George is reworking the progression model, but since a new model is coming, we just need a quick (ugly) workaround for now.
- Hit this with a non-contiguous one-sided put down in openib via osc_rdma: the accumulate wants to trigger another callback, followed by a barrier to get the timing right.
- Nathan thinks he can make the unlock in the accumulate lock non-blocking.
- Released June 1st.
- No driver for a v2.0.4 at this time.
- v2.1.1 went out in May.
- No driver for a v2.1.2 at this time.
Review Milestones v3.0
- Planning to do the v3.0 RC today, but there are lots of failures in the nightly MTT runs.
- Cisco killed a bunch of runs and will kick off a new batch.
- Datatype and Info key type errors from the IBM tests.
- Amazon false positives: they direct launch, but dynamic processes are not supported under direct launch.
- Howard sent out a request for NEWS updates.
- One additional PMIx issue.
- Threading issue in orte, opal and PMIx, reported by IBM.
- Some confusion about whether we have the assembly backwards in PMIx 2.0 (Nathan or George).
- Nathan can take a look when he gets into office today.
- Only seen evidence of it in PMIx.
- Issue is in PMIx: https://github.com/pmix/pmix/issues/347
- Ralph to sync up with Brian and Howard at end of day to hear the status of this issue for the v3.0 RC.
Review Master Pull Requests
Review Master MTT testing
- Some 'make check' errors have been fixed, but we are still seeing others.
- IBM is still seeing a hang in 'make check'; it must be ppc64le-specific. There is no timeout.
- The 32-bit compiler issues were fixed by a PMIx fix.
- Geoffroy Vallee: still seeing some problems disabling 'make check'.
- MPI_Sendrecv_replace: got fixed.
- Timeouts are all CUDA-related (NVIDIA).
- Still there.
- Issue: Red Hat stock autoconf (rather than building our own).
- Need a maintainer for the rankfile mapper.
- IBM will take over maintaining the rankfile mapper from Ralph.
- Intel is making lots of progress. Nice features, but not sure how to make the transition.
- .ini files would need to be transitioned over, because Python doesn't support funclets.
- Does everyone have to transition on the same day, or can sites transition one by one?
- Yes, everyone can transition in their own time.
- Face2Face Meeting-2017-07
- Date: July 11-13 (9am Tuesday to noon Thursday).
- Cisco has booked space in Chicago.
- Cisco has reserved some space right next to O'Hare (can get a shuttle to the hotel).
- We have met there before.
- Jeff will come in Monday evening.
- Amazon is bringing much more testing and CI processes online.
- v3.0.0 Release work
- Improved Jenkins infrastructure. Changes made yesterday (in the Jenkins setup at Amazon) will hopefully make it run a little faster.
- Travis is now officially deactivated. No longer using Travis.
- Amazon
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu