Release notes for cudnn-frontend 1.5.0: (#81)
[New feature] With cudnn backend 9.2.0 and above, `Graph::check_support`
can determine support for runtime engines without invoking the nvrtc
compiler. This allows users to query the support surface of cudnn without
paying the cost of nvrtc compilation.
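
For orientation, a minimal C++ sketch of where this check sits in the build pipeline; the method and enum names (`create_execution_plans`, `HeurMode_t::A`, `is_good`/`is_bad`) follow the frontend v1 samples and should be checked against your headers:

```cpp
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

// Sketch: query support for an already-described graph. With cudnn backend
// 9.2.0 and above, the check_support call below no longer needs to invoke
// nvrtc for runtime-compiled engines.
bool is_graph_supported(fe::graph::Graph& graph, cudnnHandle_t handle) {
    if (graph.validate().is_bad()) return false;                     // catch dangling tensors, infer shapes
    if (graph.build_operation_graph(handle).is_bad()) return false;  // lower to the cudnn backend
    if (graph.create_execution_plans({fe::HeurMode_t::A}).is_bad()) return false;  // query heuristics
    return graph.check_support(handle).is_good();                    // support query only, no plan build
}
```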

[New feature] The Python pip wheel now contains the necessary C++
development headers.

[New feature] Sliding window attention is now supported as an attribute
on the sdpa forward and bprop nodes. Usage:
`sdpa_attributes.set_sliding_window_length(window_length)`
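
A hedged usage sketch expanding on the one-liner above; everything other than the named setter (the helper, the inference flag, the elided scale/masking/dropout attributes) is assumed scaffolding:

```cpp
#include <cudnn_frontend.h>
#include <memory>

namespace fe = cudnn_frontend;

// Sketch: attach sliding window attention to an SDPA forward node. Other
// required attributes (attention scale, masking, dropout) are elided here.
auto add_sdpa_with_sliding_window(fe::graph::Graph& graph,
                                  std::shared_ptr<fe::graph::Tensor_attributes> Q,
                                  std::shared_ptr<fe::graph::Tensor_attributes> K,
                                  std::shared_ptr<fe::graph::Tensor_attributes> V,
                                  int64_t window_length) {
    auto sdpa_attributes = fe::graph::SDPA_attributes()
                               .set_name("sdpa_with_sliding_window")
                               .set_is_inference(true);
    sdpa_attributes.set_sliding_window_length(window_length);  // new in 1.5.0
    return graph.sdpa(Q, K, V, sdpa_attributes);                // returns {O, stats}
}
```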

[New feature] Bottom right aligned causal masking is now supported as an
attribute on the sdpa forward and bprop nodes. Usage:
`sdpa_attributes.use_causal_mask_bottom_right(true)`
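
Continuing the attribute setup from the sketch above, a short illustration of opting in:

```cpp
// Sketch: additionally request bottom right aligned causal masking on the
// same SDPA attributes object (the bprop attributes expose the same setter).
sdpa_attributes.use_causal_mask_bottom_right(true);
```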

[New feature] SDPA bprop attributes can select a deterministic algorithm
through the `use_deterministic_algorithm` API.
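
A short sketch of the backward-pass counterpart; only the setter named in this note comes from the release, the rest is assumed scaffolding:

```cpp
// Sketch: request the deterministic algorithm on the SDPA backward node's
// attributes; remaining inputs and attributes are assumed to be set elsewhere.
auto sdpa_bprop_attributes = fe::graph::SDPA_backward_attributes()
                                 .set_name("sdpa_bprop_deterministic");
sdpa_bprop_attributes.use_deterministic_algorithm(true);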

[New feature] Users can now filter a graph's candidate execution plans by
their shared memory usage with cudnn 9.2.0 and later.
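
A hypothetical sketch of such a filter; `deselect_shared_mem_greater_than` is an assumed name patterned on the existing `deselect_workspace_greater_than` filter and must be verified against the actual headers:

```cpp
// Hypothetical sketch: drop candidate plans whose shared memory usage exceeds
// a budget before building them (requires cudnn 9.2.0 or later).
int64_t const max_shared_mem_bytes = 48 * 1024;                // illustrative budget
graph.deselect_shared_mem_greater_than(max_shared_mem_bytes);  // assumed API name
graph.check_support(handle);
graph.build_plans(handle);
```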

[Bug fix] Fixed a runtime error that occurred when the chosen execution
plan candidate was incorrectly set in the backend. This happened when
`check_support` did not correctly filter plans by workspace size.

[Bug fix] Selecting/deselecting plans by behavioral and numerical notes has
been fixed and now works as intended.

[Debugging] A new tool for easy reproduction of a failure using the json
representation of the graph can be found [here](tools/json_reproducer).

[Samples] Restructured the cpp samples into categories for easier
navigation.

[Samples] Added a sample to showcase how different plans can be built in
parallel in separate threads.
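
A rough sketch of the pattern, using the per-index plan APIs documented later in this diff (`get_execution_plan_count`, `build_plan_at_index`); the per-thread handle creation and the omission of error handling are assumptions, and the shipped sample remains the reference:

```cpp
#include <cudnn_frontend.h>
#include <thread>
#include <vector>

// Sketch: build each candidate plan index in its own thread, giving every
// thread its own cudnn handle. Errors and the library's exact thread-safety
// guarantees are glossed over here.
void build_plans_in_parallel(cudnn_frontend::graph::Graph& graph) {
    std::vector<std::thread> workers;
    int64_t const plan_count = graph.get_execution_plan_count();
    for (int64_t index = 0; index < plan_count; ++index) {
        workers.emplace_back([&graph, index]() {
            cudnnHandle_t handle;
            cudnnCreate(&handle);
            graph.build_plan_at_index(handle, index);  // build one candidate plan
            cudnnDestroy(handle);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
}
```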

[Compilation enhancement] Added a new macro
`CUDNN_FRONTEND_SKIP_JSON_LIB` (see the CMakeLists.txt change below) as a
compilation flag to drop nlohmann::json as a compilation dependency. Users
lose access to certain API functions such as `print`, `key`, `serialize`,
and `deserialize` that depend on the library.
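
Because the frontend is header-only, one hedged way to exercise the flag is to define it ahead of the include; the CMake option shown in the diff below, or an equivalent `-D` compile definition, achieves the same thing:

```cpp
// Sketch: opt out of the nlohmann::json dependency when consuming the
// header-only frontend; serialization-related APIs become unavailable.
#define CUDNN_FRONTEND_SKIP_JSON_LIB
#include <cudnn_frontend.h>
```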

[Enhancement] Serialization of resample operation is now supported.

[Enhancement] A bug template has been added for new GitHub issues.
Anerudhan authored Jun 13, 2024
1 parent d7ccb5b commit 47d800c
Showing 112 changed files with 5,033 additions and 2,443 deletions.
6 changes: 3 additions & 3 deletions CMakeLists.txt
@@ -1,8 +1,8 @@
cmake_minimum_required(VERSION 3.17)

-project(cudnn_frontend VERSION 1.4.0)
+project(cudnn_frontend VERSION 1.5.0)

-option(CUDNN_FRONTEND_SKIP_NLOHMANN_JSON "Defines whether FE should not include nlohmann/json.hpp." OFF)
+option(CUDNN_FRONTEND_SKIP_JSON_LIB "Defines whether FE should not include nlohmann/json.hpp." OFF)
option(CUDNN_FRONTEND_BUILD_SAMPLES "Defines if samples are built or not." ON)
option(CUDNN_FRONTEND_BUILD_UNIT_TESTS "Defines if unittests are built or not." ON)

@@ -18,7 +18,7 @@ add_library(cudnn_frontend INTERFACE)

target_compile_definitions(
cudnn_frontend INTERFACE
-$<$<BOOL:${CUDNN_FRONTEND_SKIP_NLOHMANN_JSON}>:CUDNN_FRONTEND_SKIP_NLOHMANN_JSON>
+$<$<BOOL:${CUDNN_FRONTEND_SKIP_JSON_LIB}>:CUDNN_FRONTEND_SKIP_JSON_LIB>
)

target_include_directories(
40 changes: 21 additions & 19 deletions README.FE.1.0.md
@@ -12,6 +12,11 @@
FE v1.0 API is aimed to extend functionality and usage exposed by the [cuDNN C backend API](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnn-backend-api). Both C++ and python APIs are provided, and both have functional parity.
For a general introduction to FE, please start with README.md.

+In the frontend v1 API, you can describe multiple operations that form subgraphs through a persistent cudnn_frontend::graph::Graph object. Unlike the frontend v0.x API, you don't have to worry about specifying shapes and sizes of the intermediate virtual tensors. The frontend v1 API extends the groundwork of earlier versions and introduces a new set of APIs to further simplify the workflow.
+
+Additionally, the frontend v1 API provides Python bindings to all API. Refer to samples/cpp and samples/python for more details on its usage.
+With the release of v1, we are bumping up the minimum supported cuDNN version to 8.5.0.

## Workflow
The steps involved in building and running a cudnn graph are as follows:
1. Create a cudnn graph and specify the global properties. The global properties like compute precision and input/output data type help infer properties that are not explicitly mentioned.
@@ -20,10 +25,10 @@ The steps involved in building and running a cudnn graph are as follows:
4. Validate the operation graph. This step makes sure the graph is well built and does not have hanging tensors or node.
5. Build the cudnn operation graph. This step lowers the graph into cudnn dialect.
6. Create the execution plan, based on the heuristics type of your choice.
-7. [Optional] Check support of the operation graph.
+7. Check support of the operation graph.
8. [Optional] Filter out the plans by your custom criteria (Optional).
9. Build (one or all) the execution plans.
-10. [Optional] Run autotuning on the filter plan (Optional).
+10. [Optional] Run autotuning on the filtered plan (Optional).
11. Execute the graph with the relevant data pointers.

## APIs
@@ -48,7 +53,7 @@ FE v1.0 API follows a functional style of building a graph. Operations take in i
| [Scale dot product attention FP8](docs/operations/Attention.md) | sdpa_fp8<br> SDPA_fp8_attributes | sdpa_fp8 |
| [Scale dot product attention backward FP8](docs/operations/Attention.md) | sdpa_fp8_backward<br> SDPA_fp8_backward_attributes | sdpa_fp8_backward |

-### Create Graph
+### Creating the Graph
Instantiate an object of class `cudnn_frontend::graph::Graph` which will house tensors and operations.

Optional graph level attributes can be set on the object:
@@ -71,53 +76,53 @@ Tensor attributes is a lightweight structure with setters for each attribute.
- `cudnn_frontend::graph::Tensor_attributes& set_reordering_type(cudnn_frontend::TensorReordering_t)`
- `cudnn_frontend::graph::Tensor_attributes& set_name(std::string&)`

-### Define Operations
+### Defining Operations
Operations take in mandatory input tensor via positional arguments. Optional input tensors are provided using corresponding setters in operation attributes.

Operations return an ordered array of output tensors. Any optional outputs if not present will have their shared pointers pointing to `std::nullptr`.

Please looks at [operations](#Operations) section for more details.

-### Validate graph
+### Validating the Graph
Validate API ensures API usage is sound, checks against dangling tensors, etc.
Internally, any unspecified properties like dimensions, strides, etc are inferred.

```
cudnn_frontend::error_t cudnn_frontend::graph::Graph::validate()
```

-### Build cudnn backend graph
+### Building the Backend Graph
This method creates cudnn backend descriptors for all constituents of the graph.

```
cudnn_frontend::error_t cudnn_frontend::graph::Graph::build_operation_graph(cudnnHandle_t handle)
```

-### Create Execution plans
+### Creating the Execution Plan
This method internally queries the heuristics for engine configs for the given heuristics modes.

```
cudnn_frontend::error_t cudnn_frontend::graph::Graph::get_execution_plans(std::vector<heur_mode_t>)
```

-### Get execution plan count
+### Getting the Execution Plan Count
This method returns the number of execution plans returned by cudnn heuristics. Each plan gets an index from 0 to #plans-1, with 0 having top priority.

```
cudnn_frontend::int64_t
cudnn_frontend::Graph::get_execution_plan_count() const;
```

-### Check graph support
+### Checking Graph Support
This method guarantees that executing the graph using plans queried will succeed.

```
cudnn_frontend::error_t cudnn_frontend::graph::Graph::check_support(cudnnHandle_t h);
```

-### Build plans
+### Building the Execution Plan

-This function builds execution plans queried with `create_execution_plan(...)`` API.
+This function builds execution plans queried with `create_execution_plan(...)` API.

There are two flavours of this API:

@@ -140,10 +145,7 @@ cudnn_frontend::Graph::build_plan_at_index(
int64_t plan_index
);
```



-### Filter plans (optional)
+### Filtering Plans (Optional)
Users can filter plans on numerical, behavioral notes, or plans that do not provide desired functional correctness.

```
@@ -155,15 +157,15 @@ cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_behavior_no
cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_workspace_greater_than(int64_t const workspace);
```

-### Autotune
+### Autotuning

Autotuning provides a way to execute different execution plans for a given graph and measure their relative performance under run time conditions.
This generally helps validate and improve upon the results provided by the heuristics. Please refer to [samples](samples/cpp/autotuning.cpp)

-### Execute
-Executing graph requires device pointers to all input output tensors and a user allocated device workspace pointer.
+### Executing the Graph
+Executing the graph requires device pointers to all input output tensors and a user allocated device workspace pointer.

-Two flavours of execute exists, corresponding to `build_plans(...)`` API.
+Two flavours of execute exists, corresponding to `build_plans(...)` API.

This API already has a candidate execution plan set. Candidate execution plan get internally set either:
- if build_policy_t::HEURISTIC_CHOICE is used, or
32 changes: 15 additions & 17 deletions README.md
@@ -15,7 +15,9 @@ In FE v1.0 API, users can describe multiple operations that form subgraph throug
Additionally, FE v1.0 API provides python bindings to all API through pybind11. It is recommended that new users of cuDNN start with the frontend v1.0 API. See `samples/cpp` and `samples/python` for more details on its usage.

## Usage
-In order to include the entire library, include the cudnn_frontend header file `include/cudnn_frontend.h` into your compilation unit.
+For c++ users, in order to include the entire library, include the cudnn_frontend header file `include/cudnn_frontend.h` into your compilation unit.
+
+For Python users, run `import cudnn`

## Build:

@@ -31,33 +33,30 @@ cudnn can be installed from
Minimum python version needed 3.6
The python binding compilation requires development package which can be installed by running `apt-get install python-dev`.

-To run the python samples, additionally, you will need the following python packages:
-- pytest
-- torch
-- jupyter
+To run the Python samples, you will need the dependencies mentioned in `requirements.txt`. This can be be installed by running:
+`pip install -r requirements.txt`

### Python API

+#### pip wheel installation
+
+Download the pip wheel corresponding to your python installation.
+
+```
+pip install nvidia_cudnn_frontend
+```

#### Source installation:
Install FE python API by running:
```
-pip install git+https://github.com/NVIDIA/cudnn-frontend.git
+pip install -v git+https://github.com/NVIDIA/cudnn-frontend.git
```

Above command picks cuda and cudnn from default system paths.

To provide a custom CUDA installation path, use environment variable: `CUDAToolkit_ROOT`.
To provide a custom CUDNN installation path, use environment variable: `CUDNN_PATH`.

-#### pip wheel installation
-
-Download the pip wheel corresponding to your python installation.
-
-```
-pip install nvidia_cudnn_frontend-1.2.0-*.whl
-```

#### Checking the installation
To test whether installation is successful, run:
```
@@ -66,15 +65,14 @@ pytest test/python_fe

NOTE: Only v1.0 API is exposed via python bindings.


### C++ API

C++ API is header only library.

The root CMakeLists.txt can be used as reference to include the cudnn_frontend in your project's build system.

#### Building samples
-The following compilation steps are only required for building the samples and/or python bindings.
+The following compilation steps are only required for building the samples.

Provide CUDA installation path according to: https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html

