Releases: openucx/ucx
v1.5.2-rc1
Bugfixes:
- Fix segfault when libuct.so is reloaded - issue #3558
- Fix ucx_info crash when printing configuration alias
- Fix static checker errors
v1.6.0-RC1
Features:
- Modular architecture for UCT transports
- ROCm transport re-design: support for managed memory, direct copy, ROCm GDR
- Random scheduling policy for DC transport
- Optimized out-of-box settings for multi-rail
- Added support for OmniPath (using Verbs)
- Support for PCI atomics with IB transports
- Reduced UCP address size for homogeneous environments
Bugfixes:
- Multiple stability and performance improvements in TCP transport
- Multiple stability fixes in Verbs and MLX5 transports
- Multiple stability fixes in UCM memory hooks
- Multiple stability fixes in UGNI transport
- RPM Spec file cleanup
- Fix compilation issues with most recent clang and gcc compilers
Tested configurations:
- RDMA: MLNX_OFED 4.5, distribution inbox drivers, rdma-core 22.1
- CUDA: gdrcopy 1.3.2, cuda 9.2, ROCm 2.2
- XPMEM: 2.6.2
- KNEM: 1.1.3
v1.5.1
v1.5.1-RC1
Bugfixes:
- Fix dc_mlx5 transport support check for inbox libmlx5 drivers - issue #3301
- Fix compilation warnings with gcc9 and clang
- ROCm - reduce log level of device-not-found message
v1.5.0
Features:
- New emulation mode enabling full UCX functionality (Atomic, Put, Get) over TCP
and RDMA-CORE interconnects which don't implement full RDMA semantics
- Non-blocking API for all one-sided operations. All blocking communication APIs marked
as deprecated
- New client/server connection establishment API, which allows connected handover
between workers (see the sketch after this list)
- Support for rdma-core direct-verbs (DEVX) and DC with mlx5 transports
- GPU - Support for stream API and receive side pipelining
- Malloc hooks using binary instrumentation instead of symbol override
- Statistics for UCT tag API
- GPU-to-Infiniband HCA affinity support based on locality/distance (PCIe)
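A minimal sketch of the new sockaddr-based client/server connection establishment, assuming a UCP context and worker have already been created with ucp_init() and ucp_worker_create(); the port, address family, and accept callback body are illustrative, and error handling is omitted:

```c
#include <ucp/api/ucp.h>
#include <netinet/in.h>
#include <string.h>

/* Server side: invoked by UCP when a client connects; the endpoint is
 * ready for communication once this callback fires. */
static void accept_cb(ucp_ep_h ep, void *arg)
{
    (void)ep;
    (void)arg;
}

/* Server side: listen for incoming client connections on the given port. */
static ucp_listener_h start_listener(ucp_worker_h worker, uint16_t port)
{
    struct sockaddr_in    addr;
    ucp_listener_params_t params;
    ucp_listener_h        listener;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;     /* listen on all interfaces */
    addr.sin_port        = htons(port);

    params.field_mask         = UCP_LISTENER_PARAM_FIELD_SOCK_ADDR |
                                UCP_LISTENER_PARAM_FIELD_ACCEPT_HANDLER;
    params.sockaddr.addr      = (const struct sockaddr*)&addr;
    params.sockaddr.addrlen   = sizeof(addr);
    params.accept_handler.cb  = accept_cb;
    params.accept_handler.arg = NULL;

    ucp_listener_create(worker, &params, &listener);
    return listener;
}

/* Client side: connect by socket address instead of exchanging worker
 * addresses out-of-band. */
static ucp_ep_h connect_to_server(ucp_worker_h worker,
                                  const struct sockaddr_in *server_addr)
{
    ucp_ep_params_t params;
    ucp_ep_h        ep;

    params.field_mask       = UCP_EP_PARAM_FIELD_FLAGS |
                              UCP_EP_PARAM_FIELD_SOCK_ADDR;
    params.flags            = UCP_EP_PARAMS_FLAGS_CLIENT_SERVER;
    params.sockaddr.addr    = (const struct sockaddr*)server_addr;
    params.sockaddr.addrlen = sizeof(*server_addr);

    ucp_ep_create(worker, &params, &ep);
    return ep;
}
```

In real code the statuses returned by ucp_listener_create() and ucp_ep_create() must be checked, and the worker must be progressed with ucp_worker_progress() for the accept callback to run.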
Bugfixes:
- Fix overflow in RC/DC flush operations
- Update description in SPEC file and README
- Fix RoCE source port for dc_mlx5 flow control
- Improve ucx_info help message
- Fix segfault in UCP due to int truncation in count_one_bits()
- Multiple other bugfixes (full list on github)
Tested configurations:
- InfiniBand: MLNX_OFED 4.4-4.5, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2
v1.5.0 RC2
Features:
- New emulation mode enabling full UCX functionality (Atomic, Put, Get) over TCP
and RDMA-CORE interconnects which don't implement full RDMA semantics
- Non-blocking API for all one-sided operations. All blocking communication APIs marked
as deprecated
- New client/server connection establishment API, which allows connected handover
between workers
- Support for rdma-core direct-verbs (DEVX) and DC with mlx5 transports
- GPU - Support for stream API and receive side pipelining
- Malloc hooks using binary instrumentation instead of symbol override
- Statistics for UCT tag API
- GPU-to-Infiniband HCA affinity support based on locality/distance (PCIe)
Bugfixes:
- Fix overflow in RC/DC flush operations
- Update description in SPEC file and README
- Fix RoCE source port for dc_mlx5 flow control
- Improve ucx_info help message
- Fix segfault in UCP due to int truncation in count_one_bits()
- Multiple other bugfixes (full list on github)
Tested configurations:
- InfiniBand: MLNX_OFED 4.4-4.5, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2
v1.5.0 RC1
Features:
- Statistics for UCT tag API
- New emulation mode enabling full UCX functionality (Atomic, Put, Get) over TCP
and RDMA-CORE interconnects that don't implement full RDMA semantics
- Non-blocking API for all one-sided operations. All blocking communication APIs marked
as deprecated
- New client/server connection establishment API
- Added CUDA support for stream API
Bugfixes:
- Multiple bugfixes (full list on github)
v1.4.0
Features:
- Improved support for installation with latest ROCm
- Improved support for latest rdma-core
- Added support for CUDA IPC for intra-node GPU communication
- Added support for CUDA memory allocation cache for mem-type detection
- Added support for latest Mellanox devices
- Added support for Nvidia GPU managed memory
- Added support for multiple connections between the same pair of workers
- Added support for large worker address for client/server connection establishment
and INADDR_ANY
- Added support for bitwise atomic operations (see the sketch after this list)
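A minimal sketch of one of the new bitwise atomic operations, assuming an already-connected endpoint, a remote 64-bit word registered at remote_addr, and its unpacked rkey (names are illustrative; error and completion handling are omitted):

```c
#include <ucp/api/ucp.h>
#include <stdint.h>

/* Atomically OR a flag bit into a remote 8-byte word using a posted
 * (non-fetching) atomic; the operation is remotely complete after the
 * endpoint or worker is flushed. */
static void set_remote_flag(ucp_ep_h ep, uint64_t remote_addr,
                            ucp_rkey_h rkey, uint64_t flag_bit)
{
    ucs_status_t status;

    status = ucp_atomic_post(ep, UCP_ATOMIC_POST_OP_OR, flag_bit,
                             sizeof(uint64_t), remote_addr, rkey);
    (void)status; /* real code should check for UCS_OK */
}
```

Fetching variants of the same bitwise operations are available through ucp_atomic_fetch_nb().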
Bugfixes:
- Performance fixes for rendezvous protocol
- Memory hook fixes
- Clang support fixes
- Self tl multi-rail fix
- Thread safety fixes in IB/RDMA transport
- Compilation fixes with upstream rdma-core
- Multiple minor bugfixes (full list on github)
- Segfault fix for code generated by the armclang compiler
- UCP memory-domain index fix for zero-copy active messages
Tested configurations:
- InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2
Known issues:
- #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
intra-node RMA transport. As a workaround, the user can disable CMA support at
compile time with --disable-cma. Alternatively, the user can remove CMA from the
UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.
v1.4.0 RC2
Features:
- Improved support for installation with latest ROCm
- Improved support for latest rdma-core
- Added support for CUDA IPC for intra-node GPU communication
- Added support for CUDA memory allocation cache for mem-type detection
- Added support for latest Mellanox devices
- Added support for Nvidia GPU managed memory
- Added support for multiple connections between the same pair of workers
- Added support for large worker address for client/server connection establishment
and INADDR_ANY
- Added support for bitwise atomic operations
Bugfixes:
- Performance fixes for rendezvous protocol
- Memory hook fixes
- Clang support fixes
- Self tl multi-rail fix
- Thread safety fixes in IB/RDMA transport
- Compilation fixes with upstream rdma-core
- Multiple minor bugfixes (full list on github)
- Segfault fix for code generated by the armclang compiler
- UCP memory-domain index fix for zero-copy active messages
Tested configurations:
- InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2
Known issues:
- #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
intra-node RMA transport. As a workaround, the user can disable CMA support at
compile time with --disable-cma. Alternatively, the user can remove CMA from the
UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.
v1.4.0 RC1
Features:
- Improved support for installation with latest ROCm
- Improved support for latest rdma-core
- Added support for CUDA IPC for intra-node GPU communication
- Added support for CUDA memory allocation cache for mem-type detection
- Added support for latest Mellanox devices
- Added support for Nvidia GPU managed memory
- Added support for multiple connections between the same pair of workers
- Added support for large worker address for client/server connection establishment
and INADDR_ANY
- Added support for bitwise atomic operations
Bugfixes:
- Performance fixes for rendezvous protocol
- Memory hook fixes
- Clang support fixes
- Self tl multi-rail fix
- Thread safety fixes in IB/RDMA transport
- Compilation fixes with upstream rdma-core
- Multiple minor bugfixes (full list on github)
Tested configurations:
- InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2
Known issues:
- #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
intra-node RMA transport. As a workaround, the user can disable CMA support at
compile time with --disable-cma. Alternatively, the user can remove CMA from the
UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.