
WeeklyTelcon_20220426

Geoffrey Paulsen edited this page May 3, 2022 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Akshay Venkatesh (NVIDIA)
  • David Bernholdt (ORNL)
  • Geoffrey Paulsen (IBM)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joseph Schuchart
  • Josh Fisher (Cornelis Networks)
  • Josh Hursey (IBM)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • Tommy Janjusic (nVidia)

Not there today (I keep this for easy cut-n-paste for future notes)

  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Austen Lauria (IBM)
  • Brandon Yates (Intel)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • Edgar Gabriel (UoH)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Joshua Ladd (nVidia)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • William Zhang (AWS)
  • Xin Zhao (nVidia)

v4.1.x

  • 4.1.4 - Looking to do a quick turnaround to get the coll_ucc component released.
    • Brian should tag an rc today.
  • One open bug: William added a check that compares fabric names.
    • He thinks the check itself is correct, but the fabric names might not be.
    • He hopes to have it fixed today or tomorrow.

v5.0.x

  • Brought in about 4 PRs since the last RC. We'll see what gets in by the end of the week for an rc7.
  • Remaining major issues
    • orte -> prrte docs
    • External dependencies being slurped into mpirun.
    • Show Help aggregation from v4 -> v5.
    • Ralph has a laundry list of items for prrte v2.1 that sound like they need to be fixed for OMPI v5.0.0
      • They also sound like a lot of work
      • Two that stand out:
        • mpirun (no DVM) - issue in how resources are being allocated for anything more complex than -np 2
          • Markalle opened a prrte issue on Power9
          • Need to understand this disconnect
        • Data bouncing back and forth between PMIx and PRRTE repeatedly.
      • opal_show_help not being aggregated correctly. Possibly related to the above?
      • Java binding issue isn't a blocker, but represents a bigger disconnect between us and PRRTE.
  • Issue #10252 - Brian and Jeff are working on an mpirun requirements doc.
    • Brian is close to posting an 85% solution.
    • Will need to beef up mpirun before it execs prterun.
    • No hook in prrte for setting up the path in back-end daemons.
    • Brian opened Issue #10252.
    • Brian wrote up a first cut at it, and Jeff hasn't yet read it.
    • He will paste it into the Issue later today or tomorrow.
      • v5.0.0 blocker items.
  • Jeff gave the packagers a heads-up that we're now including the HTML docs and they might want to package them.

v4.0.x

  • No plan for update
  • alltoallv patches - Patch went into v4.1.x that

Main branch

  • Nightly tarballs are no longer built from master, so if your MTT setup isn't updated, you're not running recent code.
  • Dropping reminders about removing the master branch from forks.

MTT

  • IBM asked nVidia if they would take over PGI compiler build/testing.
    • nVidia is still looking into it.

Face-to-face
