
WeeklyTelcon_20170523

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Brian Barrett
  • Edgar Gabriel
  • Artem Polyakov
  • Jeff Squyres (Cisco)
  • Geoffroy Vallee (ORNL)
  • Howard
  • Josh Hursey
  • Joshua Ladd
  • Murali (LLNL)
  • Sylvain Jeaugey
  • Todd Kordenbrock

Agenda

1.10.7

  • Tagged and released last week.
  • Tag Kerfuffle:
    • It was originally tagged in the wrong place (in master).
    • It was deleted, and recreated in the correct place.
    • Force pushes to master are now disallowed. (It is unclear whether this also applies to tags.)
    • Please delete your old repos and use a new one (with this new tag) to make things sane again.
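For a clone that cannot easily be thrown away, a moved tag can in principle be refreshed in place instead. The following is only a minimal sketch: the helper name `refresh_tag` is hypothetical, and the remote name `origin` and tag name `v1.10.7` are assumptions, not taken from these notes. Re-cloning, as requested above, remains the simplest way to be sure.

```shell
# Hypothetical helper: replace a stale local tag with the remote's current copy.
# Arguments: <remote> <tag>, e.g. "origin" and "v1.10.7" (both assumed names).
refresh_tag() {
    remote="$1"
    tag="$2"
    # Drop the stale local tag; ignore the error if it does not exist locally.
    git tag -d "$tag" >/dev/null 2>&1 || true
    # Fetch only that tag from the remote, recreating it locally.
    git fetch "$remote" "refs/tags/$tag:refs/tags/$tag"
}

# Usage, from inside an existing clone:
#   refresh_tag origin v1.10.7
#   git rev-parse v1.10.7   # compare against the commit the release was cut from
```

Deleting the local tag first avoids relying on how a particular git version treats a fetched tag that conflicts with an existing one.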

2.0.3

  • We still have some ARM issues.
    • What do our distro repackagers (for example, Fedora and Debian) do?
    • If distro repackagers see that 32-bit isn't working, do they still release 64-bit?
  • Still planning to ship v2.0.3 despite Pasha and Oreon's Issue 2643.
    • Pinged him a while ago; no progress on this since.
  • Current schedule is to start RCs on May 23rd.
    • Need to update News and .so versions. Maybe May 24th.
  • There are a lot of open issues on v2.0.x. We'd like to move these to either v2.1.x or v3.x. Do we need to fix them in v2.0.x, or only in later series?
    • EVERYONE please review open v2.0.x Issues.
    • Close them if they are already addressed.
    • Move them if needed.
  • Someone commented on closed Issue 1162; it looks like we missed a 1.10 bugfix in 2.1.1.
  • A bunch that still need reviews.
  • Issue 3442 - 32-bit builds are busted; this probably affects v2.1.x also.
    • Could be an exotic-architecture issue, or possibly our CMA glue just isn't right. CMA seems to be masking the issue?
  • Renaming v3.x to v3.0.x: an email will go out this afternoon, with another go at the change this weekend.
    • This will invalidate any PRs to v3.x that are still open, so those will need to be re-PRed.
  • When we did v2.x, we pulled Checkpoint/Restart out of master; it needs to be removed from v3.x/v3.0.x also.
    • Brian will do this after the rename.
  • Schedule for v3.0.0
    • We're behind.
    • Not sure if we're ready for a CR when the runtime isn't ready.
  • PMIx update (via Josh as Ralph is out):
    • PMIx team has agreed to release the current state of PMIx master as v2.0, and will release the cross-version support as v2.1 since it doesn't represent a change to the "standard". There is one more bug fix we want to get into it and then Ralph will roll the release candidate.

  • Amazon is not testing SuSE right now, but they are testing SLES.
  • Amazon is about to add ARM with Pasha (ARM); it isn't stock ARMv8, 64-bit.
  • Amazon is trying to track down a testing failure that they're seeing. It is probably not caused by the specific PR that failed; mpirun hung.
  • Don't have a great way to bot:retest:ARM because of the way that Jenkins works.
    • Don't stress about it too much, just :retest: all of the platforms.
  • Edgar - UH Platform should be back today.

MTT Dev status:


Exceptional topics

  • Spectrum MPI customers asking for help on Open MPI mailing lists.
    • IBM will identify the Spectrum mailing list and send customers there for support issues.
    • Sorry for the confusion.
    • This particular issue boiled down to either an installation issue or an IBM PAMI PML-specific issue.
    • Community members graciously said not to worry about it, but to please have someone from IBM regularly monitor the mailing lists.
  • Face2Face Meeting-2017-07
    • Date: July 11-13 (9am Tuesday to noon on Thursday).
    • Cisco has booked space in Chicago.
      • Cisco has reserved some space right next to O'Hare (can get a shuttle to the hotel).
        • We have met there before.
      • Jeff will come in Monday evening.

Status Updates:

  • Geoffroy Vallee -
    • Been running on some IBM system with master / PGI, but it's having failures submitting back to the MTT database. Josh Hursey (IBM) is helping.
    • Having problems getting the .tar (wget or curl fails).
    • Running into a few issues with PGI and LSF.
  • IBM Spectrum MPI v10.1.1 (based on Open MPI v2.x) is moving to invite-only mode.
    • Working to PR bugfixes that haven't gone upstream yet.

Status Update Rotation

  1. Cisco, ORNL, UTK, NVIDIA
  2. Mellanox, Sandia, Intel
  3. LANL, Houston, IBM, Fujitsu

Back to 2017 WeeklyTelcon-2017
