Releases: ROCm/rccl
RCCL 2.21.5 for ROCm 6.3.1
Changed
- Enhanced user documentation
Resolved issues
- Corrected user help strings in install.sh
RCCL 2.21.5 for ROCm 6.3.0
Added
- MSCCL++ integration for specific contexts
- Performance collection to rccl_replayer
- Tuner Plugin example for MI300
- Tuning table for a large number of nodes
- Support for amdclang++
- New Rome model
Changed
- Compatibility with NCCL 2.21.5
- Increased channel count for MI300X multi-node
- Enabled MSCCL for single-process multi-threaded contexts
- Enabled gfx12
- Enabled CPX mode for MI300X
- Enabled tracing with rocprof
- Improved version reporting
- Enabled GDRDMA for Linux kernel 6.4.0+
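The kernel threshold in the last item can be checked on a host with a short script. The following is only a sketch: the helper names `parse_kernel_release` and `kernel_meets_minimum` are hypothetical, the 6.4.0 threshold is taken from the note above, and the check says nothing about the drivers or network hardware that GDRDMA may also depend on.

```python
import platform
import re

# Threshold taken from the release note above (Linux kernel 6.4.0+).
MIN_KERNEL = (6, 4, 0)


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a string such as '6.5.0-41-generic'.

    A missing patch component (for example '6.8') is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def kernel_meets_minimum(release, minimum=MIN_KERNEL):
    """Return True when the parsed release is at or above the minimum."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # platform.release() returns the running kernel release string on Linux.
    running = platform.release()
    print(running, kernel_meets_minimum(running))
```

Tuple comparison keeps the check correct for multi-digit components, which a plain string comparison would not.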
Resolved issues
- Fixed model matching when PXN is enabled
Known issues
- MSCCL is temporarily disabled for AllGather collectives.
  - In-place messages smaller than 2 MB can see roughly 2x the latency.
  - Older RCCL versions are not impacted.
  - This issue will be addressed in a future ROCm release.
- Unit tests do not exit gracefully when running on a single GPU.
  - This issue will be addressed in a future ROCm release.
RCCL 2.20.5 for ROCm 6.2.4
RCCL code for ROCm 6.2.4 did not change. The library was rebuilt for the updated ROCm 6.2.4 stack.
RCCL 2.20.5 for ROCm 6.2.2
RCCL code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.
RCCL 2.20.5 for ROCm 6.2.1
RCCL code for ROCm 6.2.1 did not change. The library was rebuilt for the updated ROCm 6.2.1 stack.
RCCL 2.20.5 for ROCm 6.2.0
Changed
- Compatibility with NCCL 2.20.5
- Compatibility with NCCL 2.19.4
- Performance tuning for some collective operations on MI300
- Enabled NVTX code in RCCL
- Replaced rccl_bfloat16 with hip_bfloat16
- NPKit updates:
  - Warm-up iteration removal is no longer enabled by default and must now be opted into
  - Doubled the size of buffers to accommodate more channels
- Modified rings to be rail-optimized topology friendly
- Replaced ROCmSoftwarePlatform links with ROCm links
Added
- Support for fp8 and rccl_bfloat8
- Support for using HIP contiguous memory
- ROC-TX for host-side profiling
- Static build
- New Rome model
- fp16 and fp8 cases in unit tests
- New unit test for main kernel stack size
- New -n option for topo_expl to override the number of nodes
- Improved debug messages for memory allocations
- Channel shuffling for IB systems
Fixed
- Bug when configuring RCCL to use only the LL128 protocol
- Scratch memory allocation after API change for MSCCL
- Incorrect minNchannels in multi-node
RCCL 2.18.6 for ROCm 6.1.2
Changed
- Reduced NCCL_TOPO_MAX_NODES to limit stack usage and avoid overflow
RCCL 2.18.6 for ROCm 6.1.1
RCCL code for ROCm 6.1.1 did not change. The library was rebuilt for the updated ROCm 6.1.1 stack.
RCCL 2.18.6 for ROCm 6.1.0
Changed
- Compatibility with NCCL 2.18.6
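The NCCL compatibility level named in these notes can be compared with the integer version code that an NCCL-style library reports at run time. The sketch below decodes such a code. It follows the `NCCL_VERSION` macro from NCCL's public header (major * 10000 + minor * 100 + patch, with an older major * 1000 form for releases up to 2.8). That RCCL reports the same encoding is an assumption here, and `decode_nccl_version` is a hypothetical helper name.

```python
def decode_nccl_version(code):
    """Split an NCCL-style integer version code into (major, minor, patch).

    Assumption: the code follows NCCL's NCCL_VERSION macro.
    """
    if code < 10000:
        # Older encoding used up to 2.8.x: major * 1000 + minor * 100 + patch.
        major, rest = divmod(code, 1000)
    else:
        # Current encoding: major * 10000 + minor * 100 + patch.
        major, rest = divmod(code, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def format_nccl_version(code):
    """Render a version code as a dotted string such as '2.18.6'."""
    return "%d.%d.%d" % decode_nccl_version(code)


if __name__ == "__main__":
    # Codes corresponding to compatibility levels listed in these notes.
    for code in (21806, 22005, 22105):
        print(code, format_nccl_version(code))
```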
RCCL 2.18.3 for ROCm 6.0.2
RCCL code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.