WeeklyTelcon_20180122
Geoffrey Paulsen edited this page Jan 15, 2019
·
1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- akvenkatesh
- Artem
- Brian
- Edgar Gabriel
- Geoffroy Vallee
- Howard
- Josh Ladd
- Josh Hursey
- Matthew Dosanjh
- Mohan
- Todd Kordenbrock
- Nathan
- News: Ralph will not be able to work on Open MPI anymore. He will continue to work on PMIx, but not even the Open MPI PMIx merge.
- Mellanox will step up and help with PMIx and ORTE integration issues.
- IBM can help with bug fixing, but cannot own ORTE.
- Need a v3.1 release engineer to help. Brian will send email to devel-core.
- Ralph offered to have a brain dump day. Email Brian if interested.
- MPI Forum is in Portland, more than a month from now.
- Face2Face -
- Brian will email to see about co-locating Open MPI with PMIx with ORTE.
- if it's not an issue, then resolve next week.
Review All Open Blockers
Review v2.x Milestones v2.1.3
- No chance to look at.
- Pretty quiet, ready to go
Review v3.0.x Milestones v3.0.1
- Schedule: RC2 is actively building now. [50%]
- On 3.x series trying to cut RCs on nightly tarballs.
- Didn't get RC last week
- Will get RC today.
- Blocker on v3.1.x
- PR4516
- May not be a blocker.
- Target v3.0.x in PR4715
- Review required.
- Will Pull in PR4716
- Issue 4563
- Not seeing it on the little ARM boxes here; Jenkins uses --disable-builtin-atomics.
- Issue 4563
- Comm Spawn - Documentation PR ready or pulled
- Issue 4509
- We believe this is closed. Asked Nathan to close.
- Issue - hwloc can't handle cuda from a different location
- On Master specifically disabling hwloc cuda.
- External component does NOT disable build, since
- 4677 - hwloc2 WIP. Can't get to it until the weekend.
Review v3.1.x Milestones v3.1.0
- SCHEDULE:
- RC2 Early next week.
- Would like https://github.com/open-mpi/ompi/issue/4605 in there.
- BLOCKER:
- OSC monitoring fix (doesn't build with Portals 4)
- PR4523
- waiting review.
- PMIx 2.1 PR4605
- PR4746
- Ralph - there is cleanup issue with PMIx 2.1, but we have cleanup issues today
- Mellanox will help work on this.
- UCX one sided violating PR4688
- Issue 4303
- Probably just need to build a patch.
Review Master Pull Requests
- Issue 4686
- Jeff Tried to reproduce and failed.
- Thought HCOLL was an issue, Artem took out, and put back.
- Something going on in there. Possibly atomic related.
- Might need Nathan's attention.
- Someone could try reverting the one change to atomics to see if that caused it.
- Mellanox will try to reproduce after reverting atomic change. Timing issue.
- Dynamic operations, a TON of segfaults. All in opal_progress, during ompi_sync_wait multi-credit.
- Something is wrong with atomics. Intercomm_create or Spawn.
- Cisco is tickling the most, and will look at.
- Delayed.
- PR4697 got resolved and merged to master.
- Opal Progress change looks good for most interconnects.
- TCP performance regression was resolved and merged to master.
- Going to PR this into v3.1.x
- George is unhappy with this
- Don't have any non-OS wrappers for TLS
- Master now checks for Cx11. Can we make it default?
- Mac Sierra may/may not even with _Thread_local
- Would be nice if we could require Cx11 for v4.0
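For context on the TLS discussion above, here is a minimal sketch of what the C11 `_Thread_local` storage class provides, as opposed to the OS-level route (POSIX `pthread_key_create` / `pthread_getspecific`). This is an illustration only, not code from PR4697 or from the Open MPI tree; the names `tls_counter`, `worker`, and `run_tls_demo` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-thread counter. With C11 _Thread_local, every thread
 * gets its own copy, initialized to 0, without any pthread_key_t setup. */
static _Thread_local int tls_counter = 0;

/* Each worker increments its private copy the requested number of times
 * and writes the final value back through its argument. */
static void *worker(void *arg)
{
    int *slot = (int *)arg;
    int increments = *slot;

    /* A new thread always starts from the initializer value. */
    assert(tls_counter == 0);

    for (int i = 0; i < increments; i++) {
        tls_counter++;
    }
    *slot = tls_counter;
    return NULL;
}

/* Runs two workers with different increment counts (3 and 5).
 * Returns 0 on success, -1 if a thread could not be created. */
int run_tls_demo(int *count_a, int *count_b)
{
    pthread_t a, b;

    *count_a = 3;
    *count_b = 5;

    if (pthread_create(&a, NULL, worker, count_a) != 0) {
        return -1;
    }
    if (pthread_create(&b, NULL, worker, count_b) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

Calling `run_tls_demo(&a, &b)` should leave each output equal to that thread's own increment count, since the two threads never share `tls_counter`. Whether a given compiler and platform accepts `_Thread_local` is exactly the portability question raised in the notes, so a configure-time check would still be needed.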
- Reg-ex expression creation.
- PR4710
- Someone created a test and put it in make-check rather than MTT.
- Then made the component static so that they don't have to do make install.
- Don't think we should be adding tests to make-check.
- Question - Is there a Regex library we could use? Reg-ex is hard.
- This is working pretty well, but did add Framework to allow for future components.
- Change behavior of opal_check_package
- Brian will send email to devel
- Make it more explicit when it finds issues
- Issue 4423
- When your PR has been accepted into a release branch, please go to the issue, and remove the target of the release branch that it was just merged into. Attempting to automate this in the future.
- New Topic - We currently can't write unit tests against components.
- Some way to say "this unit test is against this component".
- Intel went through and did this internally for orte. Already hosted in public domain.
- Ralph will send link to Brian to take a look.
- Python Client can't report back to database.
- https://github.com/open-mpi/mtt/issues/614
- Josh Hursey will look at.
Review Master MTT testing
- Probably looking at March or early April
- San Jose or Dallas
- Geoff will send out two Doodles for date and time.
- Discuss abandoning openib btl.
- LNLL - is no longer paying anyone to maintain openib btl.
- Nathan has a UCX BTL
- ETA on GPU in UCX - basic minus CUDA IPC in test now.
- Any warning message if on iWarp
- What's the roadmap for this? 3.x or 4.x?
- Pushed date to late Feb or March.
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon
- Cisco, ORNL, UTK, NVIDIA