Releases: ROCm/rccl

RCCL 2.21.5 for ROCm 6.3.1

20 Dec 16:12
4ab67f5

Changed

  • Enhanced user documentation

Resolved issues

  • Corrected user help strings in install.sh

RCCL 2.21.5 for ROCm 6.3.0

03 Dec 19:49
eef7b29

Added

  • MSCCL++ integration for specific contexts
  • Performance collection to rccl_replayer
  • Tuner Plugin example for MI300
  • Tuning table for large number of nodes
  • Support for amdclang++
  • New Rome model

Changed

  • Compatibility with NCCL 2.21.5
  • Increased channel count for MI300X multi-node
  • Enabled MSCCL for single-process multi-threaded contexts
  • Enabled gfx12
  • Enabled CPX mode for MI300X
  • Enabled tracing with rocprof
  • Improved version reporting
  • Enabled GDRDMA for Linux kernel 6.4.0+
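
The kernel requirement in the last item can be checked before a deployment. The following is a minimal sketch, assuming the note means kernels at version 6.4.0 or newer; the helper names parse_kernel_release and kernel_at_least are hypothetical and not part of RCCL. It only compares version numbers and does not confirm that GDRDMA is actually active on the system.

```python
import platform
import re


def parse_kernel_release(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as '6.4.0-150600.23-default'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch component (for example '6.8') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def kernel_at_least(release: str, minimum: tuple[int, int, int] = (6, 4, 0)) -> bool:
    """Return True if the kernel release is at or above the minimum version."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # platform.release() reports the running kernel, e.g. the output of `uname -r`.
    running = platform.release()
    print(running, kernel_at_least(running))
```

Distribution kernels often backport features, so a version comparison like this is a first filter rather than a guarantee.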

Resolved issues

  • Fixed model matching when PXN is enabled

Known issues

  • MSCCL is temporarily disabled for AllGather collectives.
    • This can increase latency by roughly 2x for in-place messages smaller than 2 MB.
    • Older RCCL versions are not impacted.
    • This issue will be addressed in a future ROCm release.
  • Unit tests do not exit gracefully when running on a single GPU.
    • This issue will be addressed in a future ROCm release.

RCCL 2.20.5 for ROCm 6.2.4

06 Nov 19:55
612add2

RCCL code for ROCm 6.2.4 did not change. The library was rebuilt for the updated ROCm 6.2.4 stack.

RCCL 2.20.5 for ROCm 6.2.2

27 Sep 16:01
d380693

RCCL code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.

RCCL 2.20.5 for ROCm 6.2.1

20 Sep 19:58
d380693

RCCL code for ROCm 6.2.1 did not change. The library was rebuilt for the updated ROCm 6.2.1 stack.

RCCL 2.20.5 for ROCm 6.2.0

02 Aug 16:15
45b618a

Changed

  • Compatibility with NCCL 2.20.5
  • Compatibility with NCCL 2.19.4
  • Performance tuning for some collective operations on MI300
  • Enabled NVTX code in RCCL
  • Replaced rccl_bfloat16 with hip_bfloat16
  • NPKit updates:
    • Warm-up iterations are no longer removed by default; removal is now opt-in
    • Doubled the size of buffers to accommodate more channels
  • Modified rings to be rail-optimized topology friendly
  • Replaced ROCmSoftwarePlatform links with ROCm links

Added

  • Support for fp8 and rccl_bfloat8
  • Support for using HIP contiguous memory
  • Implemented ROC-TX for host-side profiling
  • Enabled static build
  • New Rome model
  • Added fp16 and fp8 cases to unit tests
  • New unit test for main kernel stack size
  • New -n option for topo_expl to override the number of nodes
  • Improved debug messages of memory allocations
  • Channel shuffling for IB systems

Fixed

  • Bug when configuring RCCL for only LL128 protocol
  • Scratch memory allocation after API change for MSCCL
  • Incorrect minNchannels in multi-node

RCCL 2.18.6 for ROCm 6.1.2

04 Jun 16:53
2fbe387

Changed

  • Reduced NCCL_TOPO_MAX_NODES to limit stack usage and avoid overflow

RCCL 2.18.6 for ROCm 6.1.1

08 May 18:00
164c955

RCCL code for ROCm 6.1.1 did not change. The library was rebuilt for the updated ROCm 6.1.1 stack.

RCCL 2.18.6 for ROCm 6.1.0

16 Apr 19:09
164c955

Changed

  • Compatibility with NCCL 2.18.6
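
When checking which NCCL level a build is compatible with, applications usually see the version as a single integer (for example from ncclGetVersion or the NCCL_VERSION_CODE macro) rather than as a dotted string. The sketch below decodes that integer, assuming RCCL follows the upstream NCCL encoding: X*1000 + Y*100 + Z for versions up to 2.8, and X*10000 + Y*100 + Z afterwards. The function name decode_nccl_version is hypothetical.

```python
def decode_nccl_version(code: int) -> tuple[int, int, int]:
    """Split an NCCL-style integer version code into (major, minor, patch).

    Assumes the upstream NCCL scheme: codes below 10000 use the older
    X*1000 + Y*100 + Z layout, larger codes use X*10000 + Y*100 + Z.
    """
    if code < 10000:
        return code // 1000, (code % 1000) // 100, code % 100
    return code // 10000, (code % 10000) // 100, code % 100


if __name__ == "__main__":
    # 21806 corresponds to 2.18.6 under the assumed encoding.
    major, minor, patch = decode_nccl_version(21806)
    print(f"{major}.{minor}.{patch}")
```

Comparing the decoded tuple against a required minimum, such as (2, 18, 6), avoids string comparisons that mis-order versions like 2.9 and 2.18.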

RCCL 2.18.3 for ROCm 6.0.2

31 Jan 20:12
2f6d59e

RCCL code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.