
1.7.0-rc #111

Merged · 1 commit · Sep 23, 2024

Conversation


@Anerudhan Anerudhan commented Sep 19, 2024

cuDNN FE 1.7.0 release notes:

## New API

- Kernel cache support for dynamic graphs. Added new APIs to enable kernel cache support for graphs with dynamic shapes. Please refer to the [documentation](docs/dynamic_kernel_cache.md) for API details.

Added examples `Convolution fprop dynamic shape`, `CSBR Graph dynamic shape`, `Matmul dynamic shape`, and `Bias + Matmul dynamic shape` to showcase the use of dynamic shapes and the kernel cache. A minimal sketch follows below.
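Below is a minimal sketch of wiring up the kernel cache, assuming the `KernelCache` object and the `set_dynamic_shape_enabled`/`set_kernel_cache` graph setters used by the new dynamic-shape examples; the tensor shapes and matmul topology are illustrative only.

```
#include <memory>
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

// One cache shared across rebuilds of the same graph topology with
// different shapes, so compiled kernels are reused between builds.
// (Type and setter names assumed from the dynamic-shape examples.)
static auto kernel_cache = std::make_shared<fe::KernelCache>();

std::shared_ptr<fe::graph::Graph>
build_matmul_graph(int64_t b, int64_t m, int64_t n, int64_t k) {
    auto graph = std::make_shared<fe::graph::Graph>();
    graph->set_io_data_type(fe::DataType_t::HALF)
        .set_compute_data_type(fe::DataType_t::FLOAT);

    // Opt the graph into dynamic shapes and attach the shared cache.
    graph->set_dynamic_shape_enabled(true).set_kernel_cache(kernel_cache);

    auto A = graph->tensor(fe::graph::Tensor_attributes()
                               .set_name("A")
                               .set_dim({b, m, k})
                               .set_stride({m * k, k, 1}));
    auto B = graph->tensor(fe::graph::Tensor_attributes()
                               .set_name("B")
                               .set_dim({b, k, n})
                               .set_stride({k * n, n, 1}));

    auto C = graph->matmul(A, B, fe::graph::Matmul_attributes().set_name("GEMM"));
    C->set_output(true);

    return graph;
}
```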

- Two new APIs to describe a plan in terms of its engine number and knobs:

```
error_t
get_plan_name(std::string &name) const;

error_t
get_plan_name_at_index(int64_t plan_index, std::string &name) const;
```

Note: this name can later be passed to `deselect_plan_by_name` if you run into any errors with a particular plan.
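For example, a minimal sketch (assuming `graph` has already been built and candidate plans created):

```
// Record a candidate plan's name so a problematic plan can later be
// deselected by name and the remaining plans used instead.
std::string plan_name;
if (graph.get_plan_name_at_index(0, plan_name).is_good()) {
    // ... if executing plan 0 later fails:
    graph.deselect_plan_by_name(plan_name);
}
```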

- Added an API to query a tensor's attributes from its UID in a graph:
`query_tensor_with_uid(int64_t const uid, Tensor_attributes &tensor) const;`
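A usage sketch, assuming `graph` is a built graph and `uid` was assigned to the tensor with `set_uid()` at construction time:

```
// Recover dims/strides/data type for the tensor registered under `uid`,
// e.g. to size device buffers before binding the variant pack.
fe::graph::Tensor_attributes attrs;
if (graph.query_tensor_with_uid(uid, attrs).is_good()) {
    auto dims = attrs.get_dim();
}
```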

## Improvements

- The SDPA fp16 bprop node can now compute dbias when the padding mask is enabled (requires cuDNN 9.4.0 and above).

- The SDPA fp8 (forward and bprop) nodes now support optional bias, dropout, and padding mask (requires cuDNN 9.4.0 and above).

- The Matmul fp8 node can now accept M, N, and K overrides (see the sketch after this list).

- Added new Python notebooks implementing BatchNorm and BatchNorm bprop using cuDNN.

- Updated [benchmark numbers](benchmark) with cuDNN 9.4.0 for fp16 and fp8 data types.

- Fixed compilation issues when `NV_CUDNN_DISABLE_EXCEPTION` is enabled.
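Regarding the fp8 matmul overrides above: a hedged sketch of what passing them might look like. The `set_m_override`-style setter names and the int32 override-tensor layout are assumptions based on the existing matmul attribute pattern, not confirmed by these notes; `graph` and the batch count `b` come from surrounding setup.

```
// Per-batch effective GEMM extents, so one fp8 matmul graph can serve
// ragged batches. Setter names below are assumed, not documented here.
auto m_override = graph->tensor(fe::graph::Tensor_attributes()
                                    .set_name("m_override")
                                    .set_dim({b, 1, 1})
                                    .set_stride({1, 1, 1})
                                    .set_data_type(fe::DataType_t::INT32));

auto mm_attrs = fe::graph::Matmul_attributes()
                    .set_name("fp8_matmul")
                    .set_m_override(m_override);
```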

## Bug fixes

- Fixed a crash when the output dimension of the dgrad node is not specified; this now returns an error message instead.

- Fixed incorrect stride inference for the SDPA stats tensor.

- Fixed a bug in the SDPA test when sliding window attention is enabled and the query sequence length (s_q) is greater than the key/value sequence length (s_kv); this case is now explicitly unsupported.

@Anerudhan force-pushed the 1.7.0-rc branch 2 times, most recently from faa82bf to ece77ca on September 19, 2024 at 21:22.