v1.6.0 release

Anerudhan released this on 12 Aug 23:17.
Commit: 23511ba

Release notes:

New API

  • Graph Slice Operation: Introduced the graph.slice operation for slicing input tensors. See docs/operations/Slice.md for detailed documentation and samples/cpp/misc/slice.cpp for a C++ sample. Python bindings for this operation have also been added.
  • SM Carveout Feature: Added the set_sm_count(int32_t type) graph property to support the SM carveout feature on Ampere and Hopper GPUs, which lets an engine target a limited number of SMs. Engines that do not support SM_COUNT return NOT_SUPPORTED.
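To make slicing semantics concrete, here is a minimal C++ sketch of a strided slice view. It does not use the cudnn_frontend API: `TensorView` and `slice_view` are hypothetical names, and the sketch assumes half-open `[start, end)` ranges with unit step, which may differ from the exact parameters graph.slice accepts (docs/operations/Slice.md is the authority on those).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical view descriptor for illustration only; NOT a cudnn_frontend type.
struct TensorView {
    std::vector<int64_t> dims;
    std::vector<int64_t> strides;
    int64_t offset = 0;  // element offset of the first element in the underlying buffer
};

// Slice each dimension d to the half-open range [ranges[d].first, ranges[d].second)
// with unit step (an assumption). Strides are unchanged; only the dims and the
// base offset move, so no data is copied.
TensorView slice_view(const TensorView& in,
                      const std::vector<std::pair<int64_t, int64_t>>& ranges) {
    if (ranges.size() != in.dims.size()) {
        throw std::invalid_argument("one range per dimension required");
    }
    TensorView out{{}, in.strides, in.offset};
    for (std::size_t d = 0; d < in.dims.size(); ++d) {
        auto [start, end] = ranges[d];
        if (start < 0 || end > in.dims[d] || start >= end) {
            throw std::out_of_range("invalid slice range");
        }
        out.dims.push_back(end - start);          // new extent of dimension d
        out.offset += start * in.strides[d];      // skip the leading elements
    }
    return out;
}
```

For a row-major 4x6 tensor (strides {6, 1}), slicing rows [1, 3) and columns [2, 5) gives dims {2, 3}, unchanged strides {6, 1}, and a base offset of 1*6 + 2*1 = 8.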

Bug Fixes

  • Convolution Mode Attribute: Added the missing set_convolution_mode attribute to the convolution attributes for forward propagation (fprop), data gradient (dgrad), and weight gradient (wgrad). Previously, the mode was hardcoded to CUDNN_CROSS_CORRELATION in the 1.x API.
  • SDPA FP8 Backward Node: Fixed an issue with the deserialization of the sdpa_fp8_backward node.
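For context on the convolution mode fix: in cuDNN, CUDNN_CONVOLUTION flips the filter before the multiply-accumulate, while CUDNN_CROSS_CORRELATION does not. The 1D sketch below uses hypothetical names (`ConvMode`, `conv1d_valid`) that are not part of cudnn_frontend, and covers only a single-channel, unpadded, unit-stride case.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical enum that mirrors the two cuDNN modes conceptually.
enum class ConvMode { Convolution, CrossCorrelation };

// "Valid" 1D convolution/cross-correlation: no padding, unit stride.
std::vector<float> conv1d_valid(const std::vector<float>& x,
                                const std::vector<float>& w,
                                ConvMode mode) {
    const std::size_t K = w.size();
    std::vector<float> y;
    if (K == 0 || x.size() < K) return y;
    for (std::size_t i = 0; i + K <= x.size(); ++i) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < K; ++k) {
            // Convolution reads the filter back to front; cross-correlation does not.
            const float wk = (mode == ConvMode::Convolution) ? w[K - 1 - k] : w[k];
            acc += x[i + k] * wk;
        }
        y.push_back(acc);
    }
    return y;
}
```

With x = {1, 2, 3, 4} and w = {1, 0, -1}, cross-correlation yields {-2, -2} while convolution yields {2, 2}. For an asymmetric filter the two modes differ, which is why a hardcoded mode mattered.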

Enhancements

  • Graph Execution Overhead: Reduced the overhead of graph.execute() by optimizing sub-node tree traversal and the handling of collected UIDs, workspace modifications, and workspace size.
  • Graph Validation Performance: Improved the performance of graph.validate() by roughly 10x by deferring graph expansion to a later stage (build_operation_graph).
  • Optional Running Stats for BatchNorm: Made the running statistics for the batch normalization operation optional; this requires cuDNN backend version 9.3.0 or later.
  • Shape and Stride Inferencing: Enhanced shape and stride inferencing to preserve the stride order of the input.
  • Diagnostic Error Message: Added a diagnostic error message to create_execution_plans when it is called before build_operation_graph.
  • JSON Schema and Deserialization: Improved the JSON schema and deserialization logic with additional checks.
  • Logging Overhead: Reduced logging overhead, resulting in faster graph.build() calls.
  • CMake Integration: Replaced CMAKE_SOURCE_DIR with PROJECT_SOURCE_DIR in the CMake files so that paths resolve correctly when the project is included as a subproject of another CMake build. See the relevant pull request for more details.
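One way to read the stride-order enhancement: when an output's dimensions differ from its input's, give the output packed strides in the same order as the input's strides, so an NHWC-laid-out input yields an NHWC-laid-out output. The following is a minimal sketch assuming fully packed outputs of the same rank. `packed_strides_like` is a hypothetical helper, not the library's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper: packed strides for out_dims that preserve the stride order of in_strides.
std::vector<int64_t> packed_strides_like(const std::vector<int64_t>& in_strides,
                                         const std::vector<int64_t>& out_dims) {
    const std::size_t rank = out_dims.size();
    if (in_strides.size() != rank) {
        throw std::invalid_argument("input and output ranks must match");
    }
    std::vector<std::size_t> order(rank);
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Innermost (smallest input stride) first; ties are broken so later dims are
    // treated as more inner, matching row-major convention.
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        if (in_strides[a] != in_strides[b]) return in_strides[a] < in_strides[b];
        return a > b;
    });
    std::vector<int64_t> out(rank);
    int64_t running = 1;
    for (std::size_t idx : order) {
        out[idx] = running;          // this dim steps over everything more inner
        running *= out_dims[idx];
    }
    return out;
}
```

For example, an input with dims {2, 3, 4, 5} (N, C, H, W) and NHWC strides {60, 1, 15, 3}, paired with an output whose dims are {2, 8, 4, 5}, gives output strides {160, 1, 40, 8}, which is still NHWC. A row-major input keeps producing row-major outputs.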

Samples

  • Jupyter Notebooks: Added Jupyter notebooks for RMSNorm, InstanceNorm, and LayerNorm. Refer to the samples/python folder for more information.