Releases: openucx/ucx

v1.12.0-rc3

11 Jan 15:47
d74fd54
v1.12.0-rc3 Pre-release

1.12.0 RC3 (January 11, 2022)

Bugfixes

  • Fixes in tag_send datatype processing
  • Fixed keep-alive protocol for intra-node transports (sm, cuda)

v1.12.0-rc2

08 Jan 20:30
9fe66a5
v1.12.0-rc2 Pre-release

1.12.0 RC2 (January 8, 2022)

Features:

  • Added detection of IB NDR

v1.12.0-rc1

14 Dec 16:07
b98911f
v1.12.0-rc1 Pre-release

1.12.0 RC1 (December 14, 2021)

Features:

Core

  • Added beta-level support for Go language bindings
  • Added new objects to VFS (md, component, log_level, etc.)
  • Added configuration variable to specify which loadable modules are allowed
  • Added build-time configuration to disable sigaction overriding

UCP

  • Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
  • Added ucp_worker_address_query() API
  • Updated ucp_ep_query() API for getting local and remote addresses
  • Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
  • Added new client/server connection establishment packet header format
  • Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
  • Added iov zcopy support to RMA operations
  • Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
  • Added support for modifying UCT and UCS configs through the ucp_config_modify() API
  • Optimized unpacked rkeys memory consumption
  • Added request flag to influence latency vs. bandwidth protocol
  • Reduced memory management overhead with new protocols
  • Improved performance calculations for new protocols
  • Added AMO support with GPU memory target using new protocols
  • Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
  • Added support for user-defined alignment in Active Messages
  • Added support for offload tag sync in new protocols
  • Updated ucp_atomic_post() to use NBX flow

UCT

  • Added API - uct_iface_is_reachable_v2()
  • Added IPv6 address support in TCP
  • Added latency estimation to uct_iface_estimate_perf()
  • Adjusted knem and cma overhead cost
  • Increased built-in TCP keep-alive interval to 2 seconds

RDMA CORE (IB, ROCE, etc.)

  • Added check for CQ overrun in assert mode
  • Added bitmap usage for releasing detached DCIs
  • Added configuration for requests ack frequency with DevX
  • Added remote QP info to tx error CQE traces

UCS

  • Added API for a per-process aggregate-sum statistics report
  • Added memory pool set data structure
  • Added new ptr_array API for bulk allocation
  • Added ucs_string_buffer_append_flags() for string buffer
  • Added ucs_ffs32()
  • Added ucs_vsnprintf_safe() which always adds '\0'
  • Added thread-safe put to ptr_map
  • Improved accuracy of the topology distance estimation
  • Added prints of leaked callbacks from the callback queue
  • Removed a diagnostic message when fuse thread is stopped
  • Added configurable limit for the memory consumed by rcache
  • Added configuration for VFS(FUSE) thread affinity
  • Added memory limit support to memtrack
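
Two of the UCS helpers listed above are small enough to illustrate with standalone stand-ins. The sketch below does not include any UCX headers. The names `sketch_ffs32`, `sketch_vsnprintf_safe` and `sketch_snprintf_safe` are hypothetical, and the semantics shown (zero-based index of the lowest set bit; output that is always NUL-terminated) are assumptions drawn from the wording of the notes, not from the UCX source.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 32-bit "find first set" helper: returns the
 * zero-based index of the least significant set bit.
 * Assumption: the argument must be non-zero. */
unsigned sketch_ffs32(uint32_t value)
{
    unsigned index = 0;

    assert(value != 0);
    while ((value & 1u) == 0) {
        value >>= 1;
        ++index;
    }
    return index;
}

/* Hypothetical stand-in for a "safe" formatted print: formats into buf and
 * guarantees a terminating '\0' whenever size > 0, truncating if needed. */
void sketch_vsnprintf_safe(char *buf, size_t size, const char *fmt, va_list ap)
{
    if (size == 0) {
        return; /* nothing can be written, not even the terminator */
    }
    vsnprintf(buf, size, fmt, ap);
    buf[size - 1] = '\0'; /* explicit terminator, independent of libc behavior */
}

/* Variadic convenience wrapper around sketch_vsnprintf_safe(). */
void sketch_snprintf_safe(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    sketch_vsnprintf_safe(buf, size, fmt, ap);
    va_end(ap);
}
```

With a 4-byte buffer, formatting a longer string through `sketch_snprintf_safe` keeps the first three characters and a terminator, so the result can always be passed to string functions.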

CUDA

  • Added global memtype cache to allow UCT transports to query memory attributes
  • Added automatic registration of whole CUDA allocations to avoid repeated registration costs
  • Added capability to select CUDA stream based on source and destination memory type
    (required for device memory based pipelining)
  • Added selection of CUDA-IPC capabilities based on NVLINK topology
    (to prefer writes vs. reads for specific platforms using NVML)
  • Added option to set cuda_copy bandwidth
  • Added profiling of CUDA runtime function calls
  • Added option to limit GPUDirectRDMA size in rendezvous protocol

Java

  • Added ucp_listener_reject functionality
  • Added support for setting worker id and querying it from the connection request
  • Added support to bind on a free port in UcpListener

Packaging

  • Added cmake config files for better integration with external cmake based projects

Tests

  • Removed memcpy from AM eager flow in io_demo
  • Added check_qps.sh script to detect stuck QPs
  • Improved diagnostic in test_init_mt
  • Added iov support in ucp_client_server
  • Added option to use epoll in io_demo
  • Added registration of memory allocated by io_demo in memtrack
  • Extended statistics in io_demo
  • Improved logging in io_demo
  • Replaced rand by urand in io_demo
  • More improvements in io_demo
  • Generalized median calculation to support any percentile in ucx_perftest
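
The last item above, generalizing a median to an arbitrary percentile, can be sketched in a few lines. This is a minimal illustration and not the ucx_perftest implementation: the function name `sketch_percentile` is hypothetical, and the nearest-rank definition of a percentile is an assumption chosen for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Ordering callback for qsort(): ascending order of doubles. */
static int compare_doubles(const void *a, const void *b)
{
    double lhs = *(const double *)a;
    double rhs = *(const double *)b;

    return (lhs > rhs) - (lhs < rhs);
}

/* Hypothetical helper: returns the given percentile (0..100) of the samples
 * using the nearest-rank method. Sorts the samples array in place. */
double sketch_percentile(double *samples, size_t count, unsigned percentile)
{
    size_t rank;

    assert(count > 0);
    assert(percentile <= 100);

    qsort(samples, count, sizeof(*samples), compare_doubles);

    /* rank = ceil(count * percentile / 100), computed in integers */
    rank = (count * percentile + 99) / 100;
    if (rank == 0) {
        rank = 1; /* percentile 0 maps to the smallest sample */
    }
    return samples[rank - 1];
}
```

Passing 50 reproduces the median for an odd number of samples. For an even number of samples this definition returns the lower of the two middle values instead of their average.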

Tools

  • Added loop-back transport support in ucx_perftest
  • Split ucx_perftest into separate modules
  • Added process placement option for ucx_info
  • Extended parameters correctness check in ucx_perftest
  • Added support for GPU memory RMA and atomics in ucx_perftest

CI

  • Updated gtest 1.7 to 1.10
  • Increased uptime in network corrupter (used for io_demo)
  • Enabled set of gtests for new protocols
  • Added running CI in docker containers
  • Increased thresholds for test_ucp_wait_mem
  • Added test for ucx binary compatibility between OS versions
  • Increased test job timeout to 6 hours
  • Reduced testing time under valgrind
  • Added suppressions for glibc and libnl leaks
  • Relaxed performance requirements in perf test

Bugfixes

Core

  • Fixed invalid remote memory access after connection error
  • Fixed creating more than 64K endpoints between the same peers
  • Fixed simultaneous endpoint close with ucp_hello_world

UCP

  • Fixes and improvements in new protocols infrastructure
  • Fixes in AM flows
  • Fixed tag short threshold selection
  • Multiple fixes in keep-alive protocol
  • Multiple fixes in wire-up protocol
  • Fixes in error flow during rendezvous protocol
  • Multiple fixes in general error flow
  • Fixed fallback to PUT pipeline in rendezvous protocol
  • Reduced default value of keep-alive interval to 20 seconds

UCT

  • Fixed deadlock in TCP
  • Suppressed EHOSTUNREACH error in TCP sockcm
  • Restricted connecting loop-back to other devices in TCP

RDMA CORE (IB, ROCE, etc.)

  • Fixed pkey_index initialization when creating RC QP with DEVX
  • Disabled MP_SRQ by default
  • Fixed TX WQ overflow check
  • Fixed dci->pool_index initialization when HAVE_DC_DV is false
  • Fixed syndrome value for creating rdmacm reserved qpn
  • Fixed error code on rdma_establish failure
  • Fixed uct_ep_am_short_iov for UD verbs
  • Fixed handling of error CQE after rc_ep is destroyed
  • Fixes in flow control when error CQE is polled
  • Multiple fixes in RC and DC error flows
  • Fixed deadlock between DCIs and RDMA_READ credits
  • Removed AM handler invocation for PURE_GRANT messages
  • Fixed endpoint arbiter_group leak in DC
  • Fixed resource check in flush for DC

UCS

  • Fixed segmentation fault for ucs_stats_parser
  • Fixed potential crash on cleanup when using UCX profiling
  • Fixed read_profile print of new request
  • Fixed uninitialized variable access in VFS
  • Changed log level of inotify_init failure to diag
  • Fixed integer overflow in mpool chunk allocation

Packaging

  • Fixed with-fuse arg for RPM build

Documentation

  • Fixes in UCP, UCT, UCS, FAQ and README documentation

Tests

  • Multiple fixes in io_demo

CI

  • Fixed snapshot docker name
  • Fixed hipMallocManaged hook gtest
  • Fixes in Azure release pipeline
  • Fixes in Coverity CI
  • Fixed test_uct_query gtest for ROCm
  • Fixes in jenkins test script
  • Fixed release commit title check

v1.11.2

30 Sep 20:35
ef2bbcf

1.11.2 (September 30, 2021)

Bugfixes

  • Fixes in Java release pipeline
  • Fixes in handling large number of devices
  • Fixes in UD out-of-order processing
  • Fixes in switching transports during client/server connection setup
  • Fixes in transport-level error reporting

v1.11.2-rc1

23 Sep 20:52
1ae6d0c
v1.11.2-rc1 Pre-release

Bugfixes

  • Fixes in Java release pipeline
  • Fixes in handling large number of devices
  • Fixes in UD out-of-order processing
  • Fixes in switching transports during client/server connection setup
  • Fixes in transport-level error reporting

v1.11.1

31 Aug 14:45
c58db6b

Features:

UCS

  • Added API to read boot ID value or use machine_guid

Bugfixes:

  • Fixes in CUDA memory hooks
  • Fixes in setting traffic class for DCT RoCE transport
  • Fixes in TCP endpoint flush
  • Fixes in TCP pending operations progress
  • Fixes in release pipelines
  • Fixes in error handling flow
  • Fixes in multi-threaded tag probe
  • Fixes in TCP disconnect flow
  • Fixes in RPM post-install script
  • Fixes in UCT common keepalive

v1.11.1-rc3

27 Aug 10:24
6f809ad
v1.11.1-rc3 Pre-release

1.11.1-rc3 (August 26, 2021)

Bugfixes:

  • Fixes in RPM post-install script
  • Fixes in UCT common keepalive

v1.11.1-rc2

19 Aug 14:40
5d8c109
v1.11.1-rc2 Pre-release

1.11.1-rc2 (August 19, 2021)

Bugfixes:

  • Fixes in multi-threaded tag probe #7231
  • Fixes in TCP disconnect flow #7251

v1.11.1-rc1

10 Aug 22:07
06bdf26
v1.11.1-rc1 Pre-release

1.11.1-rc1 (August 10, 2021)

Features:

UCS

  • Added API to read boot ID value or use machine_guid

Bugfixes:

  • Fixes in CUDA memory hooks
  • Fixes in setting traffic class for DCT RoCE transport
  • Fixes in TCP endpoint flush
  • Fixes in TCP pending operations progress
  • Fixes in release pipelines
  • Fixes in error handling flow

v1.11.0

26 Jul 21:35
fa84605

1.11.0 (July 26, 2021)

Features:

Core

  • Added support for UCX monitoring using virtual file system (VFS)/FUSE
  • Added support for applications with static CUDA runtime linking
  • Added support for a configuration file
  • Updated clang format configuration

UCP

  • Added rendezvous API for active messages
  • Added user-defined name to context, worker, and endpoint objects
  • Added flag to silence request leak check
  • Added API for endpoint performance evaluation
  • Added API - ucp_request_query
  • Added API - ucp_lib_query
  • Ported connection manager to a new UCT API
  • Added bandwidth optimizations for new protocols multi-lane
  • Added support for multi-rail over lanes with BW ratio >= 1/4
  • Added support for tracking outstanding requests and aborting those in case of connection failure
  • Refactored keep-alive protocol
  • Added device id to wireup protocol
  • Added support for up to 128 transport layer resources in UCP context
  • Added support for CUDA memory allocations with ucp_mem_map
  • Increased UCP_WORKER_MAX_EP_CONFIG to 64
  • Adjusted memory type zcopy threshold when UCX_ZCOPY_THRESH set
  • Refactored wireup protocols, rendezvous, get, zcopy protocols
  • Added put zcopy multi-rail
  • Improved logging for new protocols
  • Added system topology information
  • Added new protocols for eager offload

UCT

  • Extended connection establishment API
  • Added active message (AM) alignment in iface params
  • Added active message short IOV API
  • Added support for interface query by operation and memory type
  • Added API to get allocation base address and length
  • Added md_dereg_v2 API

UCS

  • Added log filter by source file name
  • Added checking for last element in fraglist queue
  • Added a method to get IP address from sockaddr
  • Added memory usage limits to registration cache

UCM

  • Improved x86 parser to recognize some mov flavors

CUDA

  • Added registration for whole CUDA allocations
  • Added CUDA-IPC keepalive
  • Adjusted performance estimations
  • Improved logging
  • Added allocation methods for CUDA pinned/managed memory
  • Added support for a global cuda_ipc cache

RDMA CORE (IB, ROCE, etc.)

  • Added report of QP info in case of completion with error
  • Refactored FC send operations
  • Added support for DevX unique QPN allocation
  • Optimized endpoint lookup for DCI
  • Added support for RDMA sub-function (SF)
  • Added support for DCI via DEVX
  • Added DCI pool per LAG port
  • Added support for RoCE IP reachability check using a subnet mask
  • Added active message short IOV for UD/DC/RC mlx, UD/RC verbs
  • Added endpoint keep alive check for UD
  • Suppressed warning if device can't be opened
  • Added support for multiple flush cancel without completion
  • Added ignoring of devices with an invalid GID
  • Added support for SRQ linked list reordering
  • Added flush by flow control on old devices
  • Added support for configurable rdma_resolve_addr/route timeout

Shared memory

  • Added active message short IOV support for posix, sysv, and self transports

TCP

  • Added support for peer failure in case of CONNECT_TO_EP
  • Added support for active message short IOV

Java

  • Added full support for UCP Java API

Tests

  • Added length/mem_type for UCP client server example
  • Added port sockaddr tests for a new API
  • Added send-recv test between client and server with different UCX_IB_NUM_PATHS values
  • Added support for CUDA and CUDA managed memory in io_demo
  • Added support for a custom watchdog timeout from command line
  • Extended memtype hook tests

Tools

  • Added UCP active message support to perftest
  • Added error handling option to perftest
  • Added wakeup option
  • Added performance tests for am short iov

CI

  • Added RHEL 7.6 with MOFED 4.7
  • Added Fedora 34, RHEL 7.2, 7.4
  • Added PGI support from HPC-SDK module
  • Added docker image with CUDA 11.2
  • Added IODEMO test
  • Added Ubuntu 20.04
  • Added test for connection manager fallback in client-server testing
  • Added loopback interface for tcp testing

Bugfixes:

Build

  • Fixes in libnuma detection macro
  • Fixes for cross compilation support
  • Fixes for --without-dc compilation

Continuous Integration

  • Fixes in Azure pipeline build system
  • Fixes in Coverity CI
  • Fixes in Azure release pipeline

Packaging

  • Fixes in DEB package: added essential system dependencies

Documentation

  • Fixes in UCP, UCT, Readme, FAQ, and Read-the-docs documentation

Tests

  • Fixes in CMA peer failure test
  • Fixes in SRQ tests
  • Fixes in the usage of requests_wait
  • Fixes in test_uct_query
  • Fixes addressing race conditions on client user data in test_uct_sockaddr
  • Fixes in IODEMO app
  • Fixes in error handling flow for perftest
  • Fixes in perftest batch tests
  • Fixes addressing hang issues for rendezvous protocol in UCP client server example

UCP

  • Fixes in endpoint error handling
  • Fixes in error reporting for failed CM lanes
  • Fixes in progress worker flush
  • Fixes in rendezvous pipeline flow
  • Fixes in recursive protocol selection
  • Fixes in error handling for AM_ZCOPY
  • Fixes in length check condition in RMA PUT short
  • Fixes in failure handling for rendezvous offload send
  • Fixes in offload completion with inlined data
  • Fixes in statistics calculations for rendezvous protocol
  • Fixes in ucp_worker_query() thread mode for SERIALIZED
  • Fixes preventing leaks of UCP requests

ROCM

  • Fixes in device memory registration and de-registration
  • Fixes in missing mem_query definition for rocm_copy
  • Fixes addressing build failure due to const violation
  • Fixes in sockaddr_accessibility test for rocm_copy and rocm_ipc
  • Fixes in bandwidth estimation for rocm_ipc

RDMA CORE (IB, ROCE, etc.)

  • Fixes addressing deadlock between DCI resources and RDMA_READ credits
  • Fixes in DSCP for RoCE DCT
  • Fixes in flush(cancel) flow
  • Fixes preventing segfault in uct_rdmacm_cm_ep_str
  • Fixes in scatter-gather entries logging
  • Fixes for compilation with experimental verbs
  • Fixes in UD dgid filtering
  • Fixes in destroying domain resources
  • Fixes in PCIe bandwidth calculation
  • Fixes addressing CQ creation failure using legacy ibv API
  • Fixes in iov2sge converter
  • Fixes in port width check on HDR100
  • Fixes in SL selection
  • Fixes in hardware tag matching compilation
  • Fixes in uct_rdmacm_cm_cqs hash key
  • Fixes for compilation with rdma-core 20

Java

  • Fixes in tag sender mask

UCT

  • Fixes in reachability of loopback ifaces
  • Fixes addressing possible uninitialized memory accesses
  • Fixes in error flow for endpoints created upon receiving connection request
  • Fixes in TCP keepalive to avoid false-positive error detection

UCM

  • Fixes addressing heap corruption caused by ucp_set_event_handler()
  • Fixes in mmap events test