WeeklyTelcon_20220621

Open MPI Weekly Telecon

Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

Akshay Venkatesh (NVIDIA)
Austen Lauria (IBM)
Brendan Cunningham (Cornelis Networks)
Christoph Niethammer (HLRS)
David Bernhold (ORNL)
Edgar Gabriel (UoH)
Geoffrey Paulsen (IBM)
George Bosilca (UTK)
Harumi Kuno (HPE)
Hessam Mirsadeghi (UCX/nVidia)
Howard Pritchard (LANL)
Joseph Schuchart
Josh Fisher (Cornelis Networks)
Thomas Naughton (ORNL)
Todd Kordenbrock (Sandia)
Tommy Janjusic (nVidia)
William Zhang (AWS)

not there today (I keep this for easy cut-n-paste for future notes)

Artem Polyakov (nVidia)
Aurelien Bouteiller (UTK)
Brandon Yates (Intel)
Brian Barrett (AWS)
Charles Shereda (LLNL)
Erik Zeiske
Geoffroy Vallee (ARM)
Jeff Squyres (Cisco)
Josh Hursey (IBM)
Joshua Ladd (nVidia)
Marisa Roman (Cornelis Networks)
Mark Allen (IBM)
Matias Cabral (Intel)
Matthew Dosanjh (Sandia)
Michael Heinz (Cornelis Networks)
Nathan Hjelm (Google)
Noah Evans (Sandia)
Raghu Raja (AWS)
Ralph Castain (Intel)
Sam Gutierrez (LLNL)
Scott Breyer (Sandia?)
Shintaro Iwasaki
Xin Zhao (nVidia)

v4.1.x

v4.1.5
- Schedule: targeting ~6 months (Nov 1)
- No driver on schedule yet.
Will add a WIP label to https://github.com/open-mpi/ompi/pull/10448
- Plan to merge along with another commit that's not yet on main.

v5.0.x

Schedule:
- PRRTE is targeting late summer.
- Newish issue regarding the partitioned communication features.
  - Since it's a new feature, try to get these fixes in as well.
Only a small number of changes on v5.0.x branch.
- Some docs
main has also been quiet this week.
- New issue opened: 10480. Needs to be done prior to release.
- New issue opened: 10481. Needs to be done prior to release.
- A few other issues.
- mpirun -v on v5.0.x returns the PRRTE version.
Does anyone still care about the min-dist mapper? Considering dropping this in PRRTE.
- Mellanox developed it and will reply.
Open an issue to track it?

Main branch

Discussed Accelerator framework (see below)
Discussed atomics PRs (see below)

Accelerator framework

Tommy heard back from discussions that they do have customers who use the sm_cuda component.
William will try to update sm_cuda component and convert it into the framework.
- Akshay had some comments.
- Mellanox commits to testing these changes.
Want to see whether to enable HAN and Adapt by default, and at what priority.
- Depends on scale and message sizes.
- Not just the message size; the rank placement also affects performance.
- With Tuned, communication goes between ranks based on the tree, ignoring how ranks are placed on nodes.
  - HAN rearranges the ranks to allow for an optimal approach at each level.
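
To make the rank-placement point concrete, here is a minimal sketch in C that counts how many tree edges cross node boundaries in a broadcast. It is a toy model, not the Tuned or HAN implementation: the cyclic placement, the binomial-tree parent rule, and all `demo_*` names are assumptions introduced for illustration.

```c
/* Hypothetical cyclic (round-robin) placement: rank r lives on node r % num_nodes. */
int demo_node_of(int rank, int num_nodes)
{
    return rank % num_nodes;
}

/*
 * Flat binomial broadcast rooted at rank 0, built on raw rank numbers only.
 * The parent of rank r is r with its lowest set bit cleared.
 * Returns the number of parent->child edges that cross a node boundary.
 */
int demo_flat_internode_edges(int num_ranks, int num_nodes)
{
    int crossings = 0;
    for (int r = 1; r < num_ranks; ++r) {
        int parent = r & (r - 1);
        if (demo_node_of(parent, num_nodes) != demo_node_of(r, num_nodes)) {
            ++crossings;
        }
    }
    return crossings;
}

/*
 * Two-level (hierarchical) broadcast: one leader per occupied node receives
 * the data over the network, then forwards it to the other ranks on its node.
 * Only the leader-to-leader edges cross node boundaries.
 */
int demo_hier_internode_edges(int num_ranks, int num_nodes)
{
    int used_nodes = num_ranks < num_nodes ? num_ranks : num_nodes;
    return used_nodes - 1;
}
```

In this model, with 16 ranks placed cyclically on 4 nodes, the flat tree crosses nodes on 12 of its 15 edges, while the two-level scheme crosses on only 3.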
- Han should be faster and more stable because
Adapt deals with asynchronous order of arrival at the collective.
- Tommy saw some segv with Adapt, so he just
- Logic is very similar to Tuned with a tree, but much more asynchronous.
  - It really adapts based on which ranks arrive.

Joseph posted two atomics PRs

10492 and link to 10487
C11 atomics make every atomic access sequentially consistent.
But we have many code paths where we don't want this.
- If you don't use threads, or if you do use threads but are doing initialization, we don't want this.
First thought on this is to relax load and stores.
- But going through code and figuring out where to
- So the second PR just removes _Atomic for C11.
Measured difference was 20-25% for local messages.
10492 moves us back to where we were before C11 atomics.
Because they're atomic, GCC uses an exchange instruction (x86)
- and exchange is very expensive even if there's already a lock around it.
- saw this in GCC 9, but not 10, but then again in 11.
Compiler doesn't know
- OPAL_THREAD macros. no way to tell it to avoid it.
Variable is marked with the atomic flag, and doing + in thread
objdump an ob1 function.
With _Atomic we have no control over memory ordering other than explicit atomic load/store operations...
- This is what first PR does...
Is there a risk with the 2nd PR that we might need to add some locks?
- Code we have today has been tested with old flavor, so it should be pretty safe.
- When we write new code, we'll need to
Given the way OPAL_ATOMIC is structured, we hope no one relied on a plain increment being atomic.
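
The trade-off discussed above can be sketched in standard C11. When an object is declared `_Atomic`, ordinary operators on it (assignment, read, `+=`, `++`) behave as `memory_order_seq_cst` operations. A weaker ordering is only available through the explicit `<stdatomic.h>` calls. The `demo_*` names below are hypothetical and are not taken from Open MPI or from either PR.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter type, for illustration only. */
typedef struct {
    _Atomic int32_t value;   /* plain operators on this field are seq_cst */
} demo_counter_t;

/* Implicit form: a plain assignment to an _Atomic object is a
 * sequentially consistent store; the compiler cannot relax it. */
void demo_store_implicit(demo_counter_t *c, int32_t v)
{
    c->value = v;
}

/* Explicit form: the only way to request a weaker ordering. */
void demo_store_relaxed(demo_counter_t *c, int32_t v)
{
    atomic_store_explicit(&c->value, v, memory_order_relaxed);
}

int32_t demo_load_relaxed(demo_counter_t *c)
{
    return atomic_load_explicit(&c->value, memory_order_relaxed);
}

/* Atomic read-modify-write with relaxed ordering; returns the new value. */
int32_t demo_add_relaxed(demo_counter_t *c, int32_t delta)
{
    return atomic_fetch_add_explicit(&c->value, delta, memory_order_relaxed) + delta;
}

/* Without _Atomic, an increment is an ordinary read-modify-write:
 * cheap, but it needs a lock or an explicit atomic operation if
 * several threads can reach it. */
int32_t demo_plain_increment(int32_t *plain)
{
    return ++(*plain);
}
```

Relaxing loads and stores one by one corresponds to rewriting call sites in the explicit form. Dropping `_Atomic` from the declaration corresponds to the last function, where atomicity has to come from an explicit atomic call or a surrounding lock.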

MTT

Face-to-face

Wiki for face to face: https://github.com/open-mpi/ompi/wiki/Meeting-2022
- Should think about schedule, location, and topics.
- Some new topics added this week. Please consider adding more topics.
- Might be better to do a half-day/day-long virtual working session.
  - Due to companies' travel policies, and for convenience.
