Releases: openucx/ucx

v1.5.2-rc1

04 Jun 18:02
b03b274
Pre-release

Bugfixes:

  • Fix segfault when libuct.so is reloaded - issue #3558
  • Fix ucx_info crash when printing configuration alias
  • Fix static checker errors

v1.6.0-RC1

29 May 21:30
cf5f219
Pre-release

Features:

  • Modular architecture for UCT transports
  • ROCm transport re-design: support for managed memory, direct copy, ROCm GDR
  • Random scheduling policy for DC transport
  • Optimized out-of-box settings for multi-rail
  • Added support for OmniPath (using Verbs)
  • Support for PCI atomics with IB transports
  • Reduced UCP address size for homogeneous environments

Bugfixes:

  • Multiple stability and performance improvements in TCP transport
  • Multiple stability fixes in Verbs and MLX5 transports
  • Multiple stability fixes in UCM memory hooks
  • Multiple stability fixes in UGNI transport
  • RPM Spec file cleanup
  • Fixed compilation issues with the most recent clang and gcc compilers

Tested configurations:

  • RDMA: MLNX_OFED 4.5, distribution inbox drivers, rdma-core 22.1
  • CUDA: gdrcopy 1.3.2, cuda 9.2, ROCm 2.2
  • XPMEM: 2.6.2
  • KNEM: 1.1.3

v1.5.1

01 Apr 17:52
7e67a4b

Bugfixes:

  • Fix dc_mlx5 transport support check for inbox libmlx5 drivers - issue #3301
  • Fix compilation warnings with gcc9 and clang
  • ROCm - reduce log level of device-not-found message

v1.5.1-RC1

24 Mar 12:57
d038370
Pre-release

Bugfixes:

  • Fix dc_mlx5 transport support check for inbox libmlx5 drivers - issue #3301
  • Fix compilation warnings with gcc9 and clang
  • ROCm - reduce log level of device-not-found message

v1.5.0

14 Feb 10:53
4185bbd

Features:

  • New emulation mode enabling full UCX functionality (Atomic, Put, Get)
    over TCP and RDMA-CORE interconnects which don't implement full RDMA semantics
  • Non-blocking API for all one-sided operations. All blocking communication APIs are
    marked as deprecated
  • New client/server connection establishment API, which allows connected handover between workers
  • Support for rdma-core direct-verbs (DEVX) and DC with mlx5 transports
  • GPU - Support for stream API and receive side pipelining
  • Malloc hooks using binary instrumentation instead of symbol override
  • Statistics for UCT tag API
  • GPU-to-InfiniBand HCA affinity support based on locality/distance (PCIe)

Bugfixes:

  • Fix overflow in RC/DC flush operations
  • Update description in SPEC file and README
  • Fix RoCE source port for dc_mlx5 flow control
  • Improve ucx_info help message
  • Fix segfault in UCP, due to int truncation in count_one_bits()
  • Multiple other bugfixes (full list on github)

Tested configurations:

  • InfiniBand: MLNX_OFED 4.4-4.5, distribution inbox drivers, rdma-core
  • CUDA: gdrcopy 1.2, cuda 9.1.85
  • XPMEM: 2.6.2
  • KNEM: 1.1.2

v1.5.0 RC2

08 Feb 21:13
fb1f079
Pre-release

Features:

  • New emulation mode enabling full UCX functionality (Atomic, Put, Get)
    over TCP and RDMA-CORE interconnects which don't implement full RDMA semantics
  • Non-blocking API for all one-sided operations. All blocking communication APIs are
    marked as deprecated
  • New client/server connection establishment API, which allows connected handover between workers
  • Support for rdma-core direct-verbs (DEVX) and DC with mlx5 transports
  • GPU - Support for stream API and receive side pipelining
  • Malloc hooks using binary instrumentation instead of symbol override
  • Statistics for UCT tag API
  • GPU-to-InfiniBand HCA affinity support based on locality/distance (PCIe)

Bugfixes:

  • Fix overflow in RC/DC flush operations
  • Update description in SPEC file and README
  • Fix RoCE source port for dc_mlx5 flow control
  • Improve ucx_info help message
  • Fix segfault in UCP, due to int truncation in count_one_bits()
  • Multiple other bugfixes (full list on github)

Tested configurations:

  • InfiniBand: MLNX_OFED 4.4-4.5, distribution inbox drivers, rdma-core
  • CUDA: gdrcopy 1.2, cuda 9.1.85
  • XPMEM: 2.6.2
  • KNEM: 1.1.2

v1.5.0 RC1

22 Dec 23:42
02078b9
Pre-release

Features:

  • Statistics for UCT tag API
  • New emulation mode enabling full UCX functionality (Atomic, Put, Get)
    over TCP and RDMA-CORE interconnects that don't implement full RDMA semantics.
  • Non-blocking API for all one-sided operations. All blocking communication APIs are
    marked as deprecated.
  • New client/server connection establishment API
  • Added CUDA support for stream API

Bugfixes:

  • Multiple bugfixes (full list on github)

v1.4.0

30 Oct 16:24
973dfb1

Features:

  • Improved support for installation with latest ROCm
  • Improved support for latest rdma-core
  • Added support for CUDA IPC for intra-node GPU communication
  • Added support for CUDA memory allocation cache for mem-type detection
  • Added support for latest Mellanox devices
  • Added support for Nvidia GPU managed memory
  • Added support for multiple connections between the same pair of workers
  • Added support for large worker addresses in client/server connection establishment
    and INADDR_ANY
  • Added support for bitwise atomics operations

Bugfixes:

  • Performance fixes for rendezvous protocol
  • Memory hook fixes
  • Clang support fixes
  • Self tl multi-rail fix
  • Thread safety fixes in IB/RDMA transport
  • Compilation fixes with upstream rdma-core
  • Multiple minor bugfixes (full list on github)
  • Segfault fix for code generated by the armclang compiler
  • UCP memory-domain index fix for zero-copy active messages

Tested configurations:

  • InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
  • CUDA: gdrcopy 1.2, cuda 9.1.85
  • XPMEM: 2.6.2
  • KNEM: 1.1.2

Known issues:

  • #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
    intra-node RMA transport. As a workaround, users can disable CMA support at
    compile time with --disable-cma. Alternatively, users can remove CMA from the
    UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.
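The runtime workaround for #2919 can be scripted when jobs are launched from Python. The following is a minimal sketch, not part of UCX: the helper names `ucx_tls_without` and `env_with_ucx_tls` are hypothetical, and the lowercase transport name `cma` is assumed to be how CMA would appear in a UCX_TLS list.

```python
import os


def ucx_tls_without(transports, excluded="cma"):
    """Return a comma-separated transport list with `excluded` removed.

    `excluded` defaults to "cma", which is an assumption about how the
    CMA transport is spelled in a UCX_TLS value.
    """
    names = (name.strip() for name in transports.split(","))
    kept = [name for name in names if name and name != excluded]
    return ",".join(kept)


def env_with_ucx_tls(transports, base_env=None):
    """Copy an environment mapping and set UCX_TLS to the filtered list."""
    env = dict(os.environ if base_env is None else base_env)
    env["UCX_TLS"] = ucx_tls_without(transports)
    return env


# Start from a list that still contains CMA and filter it out.
env = env_with_ucx_tls("mm,cma,rc,cuda_copy,cuda_ipc,gdr_copy", base_env={})
print(env["UCX_TLS"])  # prints "mm,rc,cuda_copy,cuda_ipc,gdr_copy"
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` when starting the application, so the change stays local to that process instead of altering the shell environment.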

v1.4.0 RC2

25 Oct 20:07
bba50b8
Pre-release

Features:

  • Improved support for installation with latest ROCm
  • Improved support for latest rdma-core
  • Added support for CUDA IPC for intra-node GPU communication
  • Added support for CUDA memory allocation cache for mem-type detection
  • Added support for latest Mellanox devices
  • Added support for Nvidia GPU managed memory
  • Added support for multiple connections between the same pair of workers
  • Added support for large worker addresses in client/server connection establishment
    and INADDR_ANY
  • Added support for bitwise atomics operations

Bugfixes:

  • Performance fixes for rendezvous protocol
  • Memory hook fixes
  • Clang support fixes
  • Self tl multi-rail fix
  • Thread safety fixes in IB/RDMA transport
  • Compilation fixes with upstream rdma-core
  • Multiple minor bugfixes (full list on github)
  • Segfault fix for code generated by the armclang compiler
  • UCP memory-domain index fix for zero-copy active messages

Tested configurations:

  • InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
  • CUDA: gdrcopy 1.2, cuda 9.1.85
  • XPMEM: 2.6.2
  • KNEM: 1.1.2

Known issues:

  • #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
    intra-node RMA transport. As a workaround, users can disable CMA support at
    compile time with --disable-cma. Alternatively, users can remove CMA from the
    UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.

v1.4.0 RC1

15 Oct 18:42
cfb691d
Pre-release

Features:

  • Improved support for installation with latest ROCm
  • Improved support for latest rdma-core
  • Added support for CUDA IPC for intra-node GPU communication
  • Added support for CUDA memory allocation cache for mem-type detection
  • Added support for latest Mellanox devices
  • Added support for Nvidia GPU managed memory
  • Added support for multiple connections between the same pair of workers
  • Added support for large worker addresses in client/server connection establishment
    and INADDR_ANY
  • Added support for bitwise atomics operations

Bugfixes:

  • Performance fixes for rendezvous protocol
  • Memory hook fixes
  • Clang support fixes
  • Self tl multi-rail fix
  • Thread safety fixes in IB/RDMA transport
  • Compilation fixes with upstream rdma-core
  • Multiple minor bugfixes (full list on github)

Tested configurations:

  • InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
  • CUDA: gdrcopy 1.2, cuda 9.1.85
  • XPMEM: 2.6.2
  • KNEM: 1.1.2

Known issues:

  • #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
    intra-node RMA transport. As a workaround, users can disable CMA support at
    compile time with --disable-cma. Alternatively, users can remove CMA from the
    UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.