## New API

- Kernel Cache support for dynamic graphs

  Added new APIs to enable kernel cache support for graphs with dynamic shapes. Please refer to the [documentation](docs/dynamic_kernel_cache.md) for API details.

  Added the examples `Convolution fprop dynamic shape`, `CSBR Graph dynamic shape`, `Matmul dynamic shape`, and `Bias + Matmul dynamic shape` to showcase the use of dynamic shapes and the kernel cache. A hedged usage sketch also appears under "Usage sketches" below.

- Two new APIs are introduced to describe a plan in terms of its engine number and knobs:

  ```
  error_t get_plan_name(std::string &name) const;
  error_t get_plan_name_at_index(int64_t plan_index, std::string &name) const;
  ```

  Note: the returned name can later be passed to `deselect_plan_by_name` if the plan runs into errors (see the sketch under "Usage sketches" below).

- Added an API to query a tensor's attributes from its UID in a graph (see the sketch under "Usage sketches" below):

  ```
  query_tensor_with_uid(int64_t const uid, Tensor_attributes &tensor) const;
  ```

## Improvements

- The sdpa fp16 bprop node can now compute dbias when the padding mask is enabled.
- The sdpa fp8 nodes (forward and bprop) now support optional bias, dropout, and padding mask.
- The matmul fp8 node can now accept M, N, K overrides.
- Added new Python notebooks implementing BatchNorm and BatchNorm bprop with cuDNN.
- Updated [benchmark numbers](benchmark) with cuDNN 9.4.0 for the fp16 and fp8 data types.
- Fixed compilation issues when `NV_CUDNN_DISABLE_EXCEPTION` is enabled.

## Bug fixes

- Fixed a crash when the output dimension of a dgrad node is not specified; an error message is now returned instead.
- Fixed incorrect SDPA stats stride inference.
- Fixed a bug in the sdpa test when sliding window attention is enabled and the query sequence length (s_q) is greater than the key sequence length (s_kv). This case is now explicitly unsupported.
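## Usage sketches

A minimal sketch of building a dynamic-shape matmul graph against a shared kernel cache. The `KernelCache` type and the `set_dynamic_shape_enabled` / `set_kernel_cache` builder calls are assumptions taken from the linked documentation and the new examples; confirm the exact names and signatures in docs/dynamic_kernel_cache.md before relying on them.

```
// Hedged sketch: dynamic-shape matmul built against a shared kernel cache.
// KernelCache, set_dynamic_shape_enabled, and set_kernel_cache are assumed
// from docs/dynamic_kernel_cache.md; verify there.
#include <cudnn_frontend.h>

#include <memory>

namespace fe = cudnn_frontend;

fe::error_t
build_dynamic_matmul(cudnnHandle_t handle,
                     std::shared_ptr<fe::KernelCache> kernel_cache,  // assumed type
                     int64_t b, int64_t m, int64_t k, int64_t n,
                     fe::graph::Graph &graph) {
    graph.set_io_data_type(fe::DataType_t::HALF)
        .set_compute_data_type(fe::DataType_t::FLOAT);

    // Mark the graph as dynamic and attach the shared kernel cache so that
    // rebuilding with new m/k/n can reuse previously compiled kernels.
    graph.set_dynamic_shape_enabled(true).set_kernel_cache(kernel_cache);

    auto A = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("A")
                              .set_dim({b, m, k})
                              .set_stride({m * k, k, 1}));
    auto B = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("B")
                              .set_dim({b, k, n})
                              .set_stride({k * n, n, 1}));

    auto C = graph.matmul(A, B, fe::graph::Matmul_attributes().set_name("matmul"));
    C->set_output(true);

    // Standard staged build; each stage returns an error_t.
    if (auto s = graph.validate(); !s.is_good()) return s;
    if (auto s = graph.build_operation_graph(handle); !s.is_good()) return s;
    if (auto s = graph.create_execution_plans({fe::HeurMode_t::A}); !s.is_good()) return s;
    if (auto s = graph.check_support(handle); !s.is_good()) return s;
    return graph.build_plans(handle);
}
```

Calling this twice with the same `kernel_cache` but different m/k/n is the pattern the new examples exercise: the second build can hit the cache instead of recompiling.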
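A sketch of the plan-name APIs quoted above. `get_plan_name_at_index` uses the signature from these notes; `deselect_plan_by_name` is named in these notes, but the signature shown here is an assumption, and `plan_count` is a hypothetical parameter the caller supplies (for example, from the number of plans available after `create_execution_plans`).

```
// Hedged sketch: record candidate plan names, then skip a known-bad plan
// by name on a fresh build. deselect_plan_by_name's signature is assumed.
#include <cudnn_frontend.h>

#include <cstdio>
#include <string>

namespace fe = cudnn_frontend;

// After check_support(), log each candidate plan's engine/knob name.
void log_candidate_plans(fe::graph::Graph &graph, int64_t plan_count) {
    for (int64_t i = 0; i < plan_count; ++i) {
        std::string name;
        if (graph.get_plan_name_at_index(i, name).is_good()) {
            std::printf("plan %ld: %s\n", static_cast<long>(i), name.c_str());
        }
    }
}

// If a named plan later fails at execution time, rebuild the graph and
// deselect that plan by name before build_plans().
void skip_bad_plan(fe::graph::Graph &fresh_graph, std::string const &bad_name) {
    fresh_graph.deselect_plan_by_name(bad_name);  // assumed signature
}
```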
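A sketch of `query_tensor_with_uid` using the signature quoted above; the `get_dim()` accessor on `Tensor_attributes` is assumed from the existing frontend headers.

```
// Hedged sketch: recover a tensor's attributes from its UID, e.g., to size
// device buffers for variant-pack execution.
#include <cudnn_frontend.h>

#include <cstdint>
#include <vector>

namespace fe = cudnn_frontend;

bool dims_for_uid(fe::graph::Graph const &graph, int64_t uid, std::vector<int64_t> &dims) {
    fe::graph::Tensor_attributes attrs;
    if (!graph.query_tensor_with_uid(uid, attrs).is_good()) {
        return false;  // no tensor with this UID in the graph
    }
    dims = attrs.get_dim();  // accessor assumed; confirm in the headers
    return true;
}
```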