
WeeklyTelcon_20160927


Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Josh Hursey
  • Joshua Ladd
  • Brian Benton
  • Sylvain Jeaugey
  • Artem Polyakov
  • Brad Benton
  • Todd Kordenbrock
  • Nathan

Agenda

Review 1.10

Combining of repos from ompi-release into the ompi repo.

  • Milestone fields have been used in two different ways.
    • For a PR against master, what should the milestone field be used for? Which release branch to target?
    • Can only set 0 or 1 milestones on a PR.
  • No other real fallout from combining the repos.

Review 2.0.x

  • Wiki

  • Milestones

    • 2.0.2 in preparation
      • All just bugfixes, no timetable yet.
      • Just progressing.
    • 2.1.0
      • Sept 30th deadline for new features. Jeff sent an email last Wednesday or Thursday.
      • Mpool Rcache re-write PR2101. Nathan reverted all of the spot fixes, and then applied all of them.
        • Nathan has 2 more PRs for trivial features he'd like to add.
          • Add a flag enumerator to the MCA base - one of the cherry-picks will be much harder if it goes in first.
          • Been on master for many months, but many, many cherry-picks, so PLEASE review.
        • Affects every BTL; has been through a couple of different iterations.
        • Goals: a clear interface between mpool and rcache; supports memkind. (See the sketch at the end of this Milestones section.)
        • Orthogonal to memhooks, because it only affects internal allocations.
        • It used to be very confusing, but NOW this separates things out: all allocations are explicit, with registrations handled separately.
      • Was hoping to get 2.1.0 PRs in before we merge the git repos.
      • C++ wrappers for OSHMEM - Failed Jenkins but passes by hand. Resolve before merging.
      • One sided
      • -disable
      • PMIx 2.0
        • Ralph is out this week and maybe more.
        • Need to move ahead, and TRY to make progress.
        • Give feedback to Ralph that we should try to keep PRs out there for a couple of days so developers on the other side of the world have time to comment.
        • Hard deadline of Friday on PMIx PR _____.
        • Then will pull into ompi master, test, then PR it to v2.1.0.
      • On master, PMIx 2.0 can be used as an external component. On master, the internal component has already been upgraded to 3.0.
      • IBM and Mellanox, along with Nathan and Howard (LANL) will meet to discuss getting this work done quickly for OMPI 2.1.0.
    • Looked at a prototype of the merged GitHub repo called ompi-all-the-branches
      • Review mechanism is web-only.
      • Blocking on OSHMEM - needs rebasing.
      • Yoda maintenance.
      • Ongoing performance discussion.
      • Most PRs marked as RM approved
      • Discussion on a few other items
    • Blocker 2.0.2 issues
      • Issue 2075
        • Non-issue since SIGSEGV is not forwarded.
      • Issue 2049
        • Ticket updated
      • Issue 2030
        • MTT seems to be the only place to reproduce
        • Might be a debug build related issue in usage of opal_list_remove_item
      • Issue 2028
        • yoda needs to be updated for BTL 3.0
        • 2.1 will not be released until yoda is fixed
        • Propose: Remove yoda from 2.1, and move to ucx
        • Raises the question: Does it make sense to keep OSHMEM in Open MPI if yoda is removed?
      • Issue 1831
    • Blocker 2.1.0 issues
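
A minimal sketch of the mpool / rcache split described in the "Mpool Rcache re-write" bullet above: explicit allocation lives in one interface and registration caching in another. This is illustrative C only; the struct and function names are invented and are not the actual Open MPI mca_mpool / mca_rcache APIs.

    /* Hypothetical sketch, NOT the real Open MPI interfaces. */
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* "mpool": explicit allocation only -- no registration logic mixed in. */
    typedef struct {
        void *(*alloc)(size_t size);
        void  (*free)(void *ptr);
    } example_mpool_t;

    /* "rcache": tracks network registrations of memory, regardless of how
     * that memory was allocated. */
    typedef struct {
        int (*register_mem)(void *base, size_t size);
        int (*deregister_mem)(void *base);
    } example_rcache_t;

    static void *pool_alloc(size_t size) { return malloc(size); }
    static void  pool_free(void *ptr)    { free(ptr); }

    static int cache_register(void *base, size_t size)
    {
        printf("register %zu bytes at %p\n", size, base);
        return 0;
    }

    static int cache_deregister(void *base)
    {
        printf("deregister %p\n", base);
        return 0;
    }

    int main(void)
    {
        example_mpool_t  mpool  = { pool_alloc, pool_free };
        example_rcache_t rcache = { cache_register, cache_deregister };

        /* Allocation and registration are separate, explicit steps. */
        void *buf = mpool.alloc(4096);
        rcache.register_mem(buf, 4096);
        rcache.deregister_mem(buf);
        mpool.free(buf);
        return 0;
    }

The only point of the sketch is that allocation and registration become separate, explicit concerns, which is what makes memkind-style allocators and the registration cache orthogonal, as noted above.
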
  • OSHMEM - Yoda Maintenance

    • Want to progress both MPI and OSHMEM in same process, don't want multiple network stacks.
    • Original argument was to run OSHMEM over the BTLs - to use all network stacks (TCP, SM, OpenIB).
      • That was 4 years ago, but things have changed. Don't really have that anymore; we have PMLs and SPMLs.
    • Last week Mellanox proposed moving to UCX.
    • OSHMEM sits on top of MPI layer, since it uses much of it.
    • Over the last couple of years, it's been decoupled from MPI; now it's sitting off to the side.
    • But now that it's sitting off on the side, no one is interested in maintaining the connection to OPAL and ORTE support. If that's all it's using, there are other projects that share OPAL and ORTE.
    • Only reason to be in repository is because connected at the MPI layer.
    • BUT, when you start OSHMEM, the first thing called is OMPI_MPI_Init.
    • Maybe it would help to identify exactly what in the MPI layer OSHMEM is using.
    • OPAL <- ORTE <- OMPI <- OSHMEM dependency chain (see the sketch at the end of this item).
    • Maybe it would help to show where that is.
    • OSHRUN (really ORTERUN) calls OMPI_MPI_Init. An MCA plugin infrastructure is built on top of that.
    • Can't just slash pieces away.
    • OSHMEM takes advantage of PMIx, the direct modex, the proc structure, and everything that supports this.
    • According to this PR on Master - OSHMEM has the same proc structure as OMPI, but actually has some MORE at the end of it.
    • What about the transports? MPI over MXM boils down to libmxm, and so does OSHMEM.
    • Became an issue with BTL 3.0 API change.
    • A number of things have come up, especially over the last year: MPI focus versus OSHMEM focus, a number of breaks between MPI and OSHMEM, and release schedule conflicts.
      • Does it make sense to separate the repositories, or to design a way to make it easy to pull between the two projects?
    • Right now there is a regression in the code base.
      • Mellanox can't replace Yoda with UCX in October.
      • Mellanox will fix Yoda for this time (for 2.1.0)
      • Could package UCX alongside other transports and let the market decide.
      • Want to continue this discussion about the importance of keeping OSHMEM included with the Open MPI project.
    • We need to have an important discussion about the future of MPI / OSHMEM.
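
A hypothetical call-chain sketch of the OPAL <- ORTE <- OMPI <- OSHMEM layering noted in the item above. The functions below are invented stand-ins, not the real Open MPI code paths; they only illustrate the point that starting OSHMEM initializes the MPI layer first, which in turn pulls in ORTE and OPAL.

    /* Invented stand-ins to illustrate the layering; not real Open MPI code. */
    #include <stdio.h>

    static void opal_layer_init(void)   { printf("OPAL init\n"); }  /* lowest layer */
    static void orte_layer_init(void)   { opal_layer_init();  printf("ORTE init\n"); }
    static void ompi_layer_init(void)   { orte_layer_init();  printf("OMPI (MPI) init\n"); }

    /* Per the minutes, the first thing OSHMEM startup does is initialize
     * the MPI layer. */
    static void oshmem_layer_init(void) { ompi_layer_init();  printf("OSHMEM init\n"); }

    int main(void)
    {
        oshmem_layer_init();   /* OPAL <- ORTE <- OMPI <- OSHMEM dependency chain */
        return 0;
    }
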
  • Discussion on Gilles' onexit PR2121

    • Seems okay, but not sure what the purpose of it is.
    • Need to ask Gilles why he wants this.
  • Discussion on a configure variable for renaming libmpi to something else (intended for IBM or other vendors).

    • Could do something like --with-ident-string (a configure-time build / vendor-type option).
  • SPI - http://www.spi-inc.org/

    • getting people to approve of these.
    • We'll be on Oct 12th Agenda. Once they formally invite us, then we have 60 days to agree / decline.
    • Works solely on a volunteer basis, so very inexpensive.
    • End of September for soliciting feedback on using SPI.
    • Open MPI will hold a formal vote after we receive the formal invite (in mid-to-late-December?)
  • New Contribution agreement / Consent agreement / Bylaws.

    • Will need a formal vote by members.
    • End of October for discussion of new contributor agreement / bylaws.
    • After that we'll set a date for voting.

New Agenda Items:

Review Master MTT testing (https://mtt.open-mpi.org/)

MTT Dev status:

Website migration

Open MPI Developer's Meeting

  • Date of another face-to-face meeting: January or February? Think about it, and discuss next week.

Status Update Rotation

  1. LANL, Houston, IBM
  2. Cisco, ORNL, UTK, NVIDIA
  3. Mellanox, Sandia, Intel

Back to 2016 WeeklyTelcon-2016
