cudnn frontend v1.3 release notes. (#72)

[New API] Added new operations `sdpa_fp8_forward` and `sdpa_fp8_backward` to perform scaled dot product attention of fp8 tensors. See more details in `docs/operations/Attention.md` and the cpp sample in `samples/cpp/mha.cpp`. Pybinds for the fp8 nodes are also added.

[New API] Added a new operation for resample forward. Added a new sample `samples/cpp/resample.cpp` to show its usage.

[New API] Added a new API `deselect_engines(std::vector<std::string> const &engine_names)` which blocks certain engine configs from running.

[New API] Added new APIs `select_numeric_notes` and `select_behavior_notes` to allow users to select engine configs which have the selected numeric and behavior notes respectively.

[Python API] Added a custom exception `cudnnGraphNotSupportedException` to the python API to distinguish graphs that are actually not supported from programming errors.

[Python API] Added a new `backend_version_string` which returns the backend version in canonical form (e.g. 9.1.0) instead of a version number.

[Bug Fix] Updated the workspace computation for the sdpa fprop node. Previously, workspace was calculated for alibi slopes irrespective of whether the alibi mask was turned on or not.

[Bug Fix] Fixed deserialization of pass by values of half precision.
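
To illustrate the difference between a version number and the canonical form mentioned for `backend_version_string`, here is a minimal sketch. It is not the frontend's implementation. `canonical_version` is a hypothetical helper, and the encoding it decodes (major*10000 + minor*100 + patch for cuDNN 9 and later, major*1000 + minor*100 + patch for earlier releases) is an assumption about how the backend packs its integer version.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of cudnn_frontend.
// Assumption: cuDNN 9+ packs the version as major*10000 + minor*100 + patch,
// while earlier releases pack it as major*1000 + minor*100 + patch.
std::string canonical_version(long version) {
    assert(version >= 0);
    // Choose the multiplier that was used for the major component.
    long const major_scale = (version >= 90000) ? 10000 : 1000;
    long const major       = version / major_scale;
    long const minor       = (version % major_scale) / 100;
    long const patch       = version % 100;
    return std::to_string(major) + "." + std::to_string(minor) + "." + std::to_string(patch);
}
```

Under these assumptions, `canonical_version(90100)` yields `"9.1.0"`, the form quoted in the note above.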
NVIDIA · Apr 10, 2024 · 1b0b5ea
1 parent af9bc9e

Showing 50 changed files with 3,466 additions and 599 deletions.
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -1,6 +1,6 @@
 cmake_minimum_required(VERSION 3.17)
 
-project(cudnn_frontend VERSION 1.2.1)
+project(cudnn_frontend VERSION 1.3.0)
 
 option(CUDNN_FRONTEND_SKIP_NLOHMANN_JSON "Defines whether FE should not include nlohmann/json.hpp." OFF)
 option(CUDNN_FRONTEND_BUILD_SAMPLES "Defines if samples are built or not." ON)
@@ -39,6 +39,10 @@ target_link_libraries(
 
 target_compile_features(cudnn_frontend INTERFACE cxx_std_17)
 
+# Make PCH for targets to link against
+add_library(_cudnn_frontend_pch INTERFACE)
+target_precompile_headers(_cudnn_frontend_pch INTERFACE ${PROJECT_SOURCE_DIR}/include/cudnn_frontend.h)
+
 if (CUDNN_FRONTEND_BUILD_SAMPLES)
     add_subdirectory(samples)
 endif()

diff --git a/README.FE.1.0.md b/README.FE.1.0.md
@@ -29,21 +29,24 @@ The steps involved in building and running a cudnn graph are as follows:
 ## APIs
 FE v1.0 API follows a functional style of building a graph. Operations take in input tensors and return output tensors. This also allows composition of operations. 
 
-| Purpose                 | C++ API                                                   | Python API   |
-| ---                     | ---                                                       | ---          |
-| Create tensor           | tensor                                                    | tensor       |
-| [Convolution Fprop](docs/operations/Convolutions.md)       | conv_fprop <br>Conv_fprop_attributes                      | conv_fprop   |
-| [Convolution Dgrad](docs/operations/Convolutions.md)       | conv_dgrad <br>Conv_dgrad_attributes                      | conv_dgrad   |
-| [Convolution Wgrad](docs/operations/Convolutions.md)       | conv_wgrad <br>Conv_wgrad_attributes                      | conv_wgrad   |
-| [Matrix Multiplication](docs/operations/Matmul.md)   | matmul <br> Matmul_attributes                             | matmul       |
-| [Pointwise Operations](docs/operations/Pointwise.md)    | pointwise <br> Pointwise_attributes                       | - add<br>- bias<br>- rqsrt<br>- sub<br>- mul<br>- scale<br>- relu<br>- elu<br>- gelu<br>- cmp_gt       |
-| [Batch Normalization](docs/operations/Normalizations.md)     | batchnorm <br>Batchnorm_attributes                        | batchnorm    |
-| [Batch Norm bprop](docs/operations/Normalizations.md)        | batchnorm_backward <br>Batchnorm_backward_attributes      | batchnorm_backward    |
-| Generate stats of output| genstats <br>Genstats_attributes                          | genstats     |
-| BN Finalize of stats    | bn_finalize <br>BN_finalize_attributes                    | bn_finalize  |
-| Dbn weight              | dbn_weight <br>DBN_weight_attributes                      | dbn_weight   |
-| [Scale dot product attention](docs/operations/Attention.md) | sdpa<br> SDPA_attributes | sdpa |
-| [Scale dot product attention backward](docs/operations/Attention.md) | sdpa_backward<br> SDPA_backward_attributes | sdpa_backward |
+| Purpose                                                                  | C++ API                                              | Python API                                                                                       |
+|--------------------------------------------------------------------------|------------------------------------------------------|--------------------------------------------------------------------------------------------------|
+| Create tensor                                                            | tensor                                               | tensor                                                                                           |
+| [Convolution Fprop](docs/operations/Convolutions.md)                     | conv_fprop <br>Conv_fprop_attributes                 | conv_fprop                                                                                       |
+| [Convolution Dgrad](docs/operations/Convolutions.md)                     | conv_dgrad <br>Conv_dgrad_attributes                 | conv_dgrad                                                                                       |
+| [Convolution Wgrad](docs/operations/Convolutions.md)                     | conv_wgrad <br>Conv_wgrad_attributes                 | conv_wgrad                                                                                       |
+| [Matrix Multiplication](docs/operations/Matmul.md)                       | matmul <br> Matmul_attributes                        | matmul                                                                                           |
+| [Pointwise Operations](docs/operations/Pointwise.md)                     | pointwise <br> Pointwise_attributes                  | - add<br>- bias<br>- rqsrt<br>- sub<br>- mul<br>- scale<br>- relu<br>- elu<br>- gelu<br>- cmp_gt |
+| [Batch Normalization](docs/operations/Normalizations.md)                 | batchnorm <br>Batchnorm_attributes                   | batchnorm                                                                                        |
+| [Batch Norm bprop](docs/operations/Normalizations.md)                    | batchnorm_backward <br>Batchnorm_backward_attributes | batchnorm_backward                                                                               |
+| Generate stats of output                                                 | genstats <br>Genstats_attributes                     | genstats                                                                                         |
+| BN Finalize of stats                                                     | bn_finalize <br>BN_finalize_attributes               | bn_finalize                                                                                      |
+| Dbn weight                                                               | dbn_weight <br>DBN_weight_attributes                 | dbn_weight                                                                                       |
+| [Resampling](docs/operations/Resampling.md)                              | resample <br>Resample_attributes                     | resample                                                                                         |
+| [Scale dot product attention](docs/operations/Attention.md)              | sdpa<br> SDPA_attributes                             | sdpa                                                                                             |
+| [Scale dot product attention backward](docs/operations/Attention.md)     | sdpa_backward<br> SDPA_backward_attributes           | sdpa_backward                                                                                    |
+| [Scale dot product attention FP8](docs/operations/Attention.md)          | sdpa_fp8<br> SDPA_fp8_attributes                     | sdpa_fp8                                                                                         |
+| [Scale dot product attention backward FP8](docs/operations/Attention.md) | sdpa_fp8_backward<br> SDPA_fp8_backward_attributes   | sdpa_fp8_backward                                                                                |
 
 ### Create Graph
 Instantiate an object of class `cudnn_frontend::graph::Graph` which will house tensors and operations.  
@@ -141,9 +144,12 @@ cudnn_frontend::Graph::build_plan_at_index(
 
 
 ### Filter plans (optional)
-Users can filter out plans against numerical, behavioral notes, or plans that do not provide desired functional correctness.
+Users can filter plans on numerical, behavioral notes, or plans that do not provide desired functional correctness.
 
 ```
+cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::select_numeric_notes(std::vector<cudnn_frontend::NumericalNote_t> const&);
+cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::select_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> const&);
+
 cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_numeric_notes(std::vector<cudnn_frontend::NumericalNote_t> const&);
 cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> const&);
 cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_workspace_greater_than(int64_t const workspace);