WeeklyTelcon_20180410

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Brian
  • Edgar Gabriel
  • Geoffroy Vallee
  • Howard
  • Josh Hursey
  • Nathan Hjelm
  • Thomas Naughton
  • Todd Kordenbrock
  • Xin Zhao

Agenda/New Business

Minutes

Review v2.x Milestones v2.1.3

  • v2.1.4 - Targeting Oct 15th.
    • Merged in a bunch of stuff.
    • One-sided multithreaded bugs came up.
      • Doesn't feel worth fixing in v2.1.x, so instead pulled configury changes from v2.0 to v2.1.x.
  • No new news on v2.1.x

Review v3.0.x Milestones v3.0.2

  • v3.0.1 went out the door.
    • Oops, did not get the PMIx compatibility pieces into the embedded PMIx.
  • v3.0.2 open for bugfixes. Quick turnaround on this.
    • Shooting for May 1st.
    • Will pre-emptively fix the PMIx compatibility pieces to pick up PMIx v1.2.5 clients.
    • This will bring in PMIx compatibility with the OMPI client (mpirun/orted/libmpi) from OMPI v2.1.3.
  • memkind disable needs to get into v3.0.2; it is either taken care of or waiting to be taken care of.
  • PR (fix ppc64-big-Endian) can't merge until 4563 is merged.
    • Thought Nathan was going to fix the hang, and then merge.
    • Given this is the same issue as ARM, where we don't have a block, thought we'd just remove
    • We now understand the problem: it is not silent data corruption, just a hang.

Review v3.1.x Milestones v3.1.0

  • Schedule - ASAP - but blockers keep getting filed.
    • No one seems particularly eager to get it out.
  • Two blockers
    1. One is a high level of failures in Cisco MTT. Pretty sure it's not unique to v3.1.x, and is also happening on v3.0.x.
    2. Issue 4857: in some situations, v3.1.x produces mpicc wrappers that can't link correctly.
      • Decided to close as can't replicate.

Review Master Master Pull Requests

  • Nothing new.

Other topics

  • Implications for Open MPI
    • PMIx client v1.2.3 with server v1.2.3 works (all testing of a version against itself works).
    • This graph is coming from a PMIx client / server standpoint, and describes
    • Wasn't there some blanket cross-version support statements?
      • v1.2.5, v2.0.3, v2.1.1, v3.0.0
    • How is PMIx dstore represented in this graph? ORTE MCA parameter needed for client/server mismatch.
  • There is a 3rd chart to describe what testing should be done.
  • This chart does not describe configuring with external PMIx, and compatibility.
    • Containers and externals are different, to be discussed later.
  • Need to figure out how to discuss this with users.
    • Perhaps by discussing compatibilities between users' tools (ORTE / Slurm / mpirun / debuggers / etc.)
  • One of the good things about PMI v1 or v2 is that their interfaces stayed the same for years.
    • Well, also, with PMIx supporting multiple "levels", the message is no longer "use PMI v1/v2 everywhere"; there are various levels of support / compatibility everywhere.

MTT / Jenkins Testing Dev

  • IBM CI is back up
  • Cisco and IBM MTT didn't trigger last night.

When should we branch v4.0?

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
