arunkarthik-akkart released this 11 Dec 17:58
v1.13.2-aws (2024-12-06)
This release is intended only for use on AWS P* instances. A general release that supports other libfabric networks may be made in the near future.
With this release, building with platform-aws requires libfabric 1.22.0amzn4.0 or greater. We generally recommend that AWS customers track the latest available EFA Installer to pick up performance improvements and bug fixes.
The 1.13.x release series supports NCCL 2.23.4-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.17.1 and later).
Bug Fixes:
- Tuner Improvements:
  - Fixed algorithm selection for larger ranks and message sizes.
  - Re-calibrated the tuner for AllGather and ReduceScatter regions for 0x7 bitmask on P5en, optimizing performance for larger messages.
  - Added tuner support for AllGather and ReduceScatter regions for 0x0 bitmask on P5en.
- Resolved a performance issue by preventing the eager protocol when RDMA writes are in flight, improving small AllReduce collective performance.
Note: dmabuf support is now turned off by default. Users can enable it explicitly using OFI_NCCL_DISABLE_DMABUF=0 if needed.
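As a minimal sketch of the note above, a job launcher written in Python could opt back in to dmabuf by passing OFI_NCCL_DISABLE_DMABUF=0 to the child process environment. The helper name `env_with_dmabuf_enabled` is hypothetical and used only for illustration; only the variable name and value come from this release note.

```python
import os


def env_with_dmabuf_enabled(base=None):
    """Return a copy of the environment with dmabuf explicitly enabled.

    `base` defaults to the current process environment. The input mapping
    is never modified; a new dict is returned.
    """
    env = dict(os.environ if base is None else base)
    # Per the release note, dmabuf is off by default in this release;
    # setting the variable to "0" turns it back on.
    env["OFI_NCCL_DISABLE_DMABUF"] = "0"
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting the training job, so the setting applies only to that job rather than to the whole shell session.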
Checksum (sha512) for the release tarball:
4c0ac3144f178062fda9e86b50bb1784822e8fdbdffadf41cdbb30839456c4e912254ff12a5b0a8c63abbe910597fd14211a42572a451d10e01932100013971e aws-ofi-nccl-1.13.2-aws.tar.gz
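One way to check a downloaded tarball against the checksum above is sketched here in Python, using only the standard library. The function names `sha512_of` and `verify` are illustrative, not part of the plugin; the expected digest is copied from this release note.

```python
import hashlib
import hmac

# SHA-512 digest published in this release note for aws-ofi-nccl-1.13.2-aws.tar.gz
EXPECTED_SHA512 = "4c0ac3144f178062fda9e86b50bb1784822e8fdbdffadf41cdbb30839456c4e912254ff12a5b0a8c63abbe910597fd14211a42572a451d10e01932100013971e"


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA512):
    """Return True if the file's SHA-512 digest matches `expected` (case-insensitive)."""
    return hmac.compare_digest(sha512_of(path), expected.strip().lower())
```

Calling `verify("aws-ofi-nccl-1.13.2-aws.tar.gz")` from the directory holding the download returns `True` only when the file matches the published digest.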