Commit
[New API] A new overloaded variant of execute has been added which allows the variant pack to be specified as pairs of "uid, device pointer". To use this, the user is expected to provide the uid for each tensor created. (#60)

```
error_t cudnn_frontend::graph::Graph::execute(cudnnHandle_t handle, std::unordered_map<int64_t, void*>& tensor_to_pointer_map, void *workspace) const;
```

[New API] Serialization: The Graph class can now be serialized once the final plan is built. Deserializing the plan requires a handle created on the same device the original graph was created with. Serialization is only supported on Runtime compiled engines. This support may be extended to other engines in the future. New samples showcasing this have been added in `samples/cpp/serialization.cpp`.

```
error_t cudnn_frontend::graph::Graph::serialize(std::vector<uint8_t>& data) const;
error_t cudnn_frontend::graph::Graph::deserialize(cudnnHandle_t handle, std::vector<uint8_t> const& data);
```

[New API] Autotuning: If the graph allows multiple engine configs for a given topology, each of these can now be built and executed in parallel. The expected flow is that the user queries the number of plans present and spawns a new thread for each plan to be finalized in parallel. The set of APIs to support this is as follows:

```
int64_t Graph::get_execution_plan_count() const;
error_t Graph::build_plan_at_index(cudnnHandle_t const &handle, int64_t index);
error_t Graph::execute_plan_at_index(cudnnHandle_t const &handle, std::unordered_map<int64_t, void*>& , void* workspace, int64_t plan_index) const;
int64_t get_workspace_size_plan_at_index(int64_t plan_index) const;
```

[New feature] sdpa_node now allows ragged offset to be set in the input and output tensors.

[Bug Fix] Certain parts of the FE code used to throw exceptions even with the `DISABLE_EXCEPTION` flag set. This has been cleaned up.

[Bug Fix] For sdpa node, cudnn now correctly returns `NOT_SUPPORTED` when s_q is not a multiple of 64 and padding mask is on.

[Bug Fix] For sdpa backward node, cudnn now correctly returns `NOT_SUPPORTED` when s_q is less than 64.

[Bug Fix] Fixed an issue with pointwise Modulo operation.

[Bug Fix] Fixed an issue in sdpa node, where the intermediate data types were wrong.

[Samples] Added a sample to showcase matmul with int8 and FP8 precisions.

[Cleanup] Python samples have moved from `samples/python` to `tests/python_fe`.

[Cleanup] Removed the `cudnn_frontend::throw_if` function.
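
[Example] The autotuning flow described above (query the plan count, then finalize each plan on its own thread) might look roughly like the sketch below. It does not use the real library: `MockGraph`, `handle_t`, and `status_t` are hypothetical stand-ins for `Graph`, `cudnnHandle_t`, and `error_t`, and only the method names `get_execution_plan_count` and `build_plan_at_index` are taken from the list above. Whether a real handle can be shared across threads this way is an assumption this sketch does not settle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-ins (NOT the cudnn_frontend types).
using handle_t = void*;              // stands in for cudnnHandle_t
enum class status_t { OK, FAILED };  // stands in for error_t

// Hypothetical mock that mirrors two method names from the commit message.
struct MockGraph {
    explicit MockGraph(int64_t plan_count)
        : built_(static_cast<std::size_t>(plan_count), 0) {}

    int64_t get_execution_plan_count() const {
        return static_cast<int64_t>(built_.size());
    }

    status_t build_plan_at_index(handle_t const& /*handle*/, int64_t index) {
        if (index < 0 || index >= get_execution_plan_count()) {
            return status_t::FAILED;
        }
        // Each thread writes a distinct element, so no lock is needed here.
        built_[static_cast<std::size_t>(index)] = 1;
        return status_t::OK;
    }

    bool is_built(int64_t index) const {
        return built_[static_cast<std::size_t>(index)] != 0;
    }

    std::vector<char> built_;  // char, not bool: elements must be distinct objects
};

// Builds every candidate plan on its own thread and returns how many succeeded.
int64_t build_all_plans_in_parallel(MockGraph& graph, handle_t handle) {
    int64_t const count = graph.get_execution_plan_count();
    assert(count >= 0);

    std::vector<status_t> results(static_cast<std::size_t>(count), status_t::FAILED);
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(count));

    for (int64_t i = 0; i < count; ++i) {
        workers.emplace_back([&graph, &results, handle, i] {
            results[static_cast<std::size_t>(i)] = graph.build_plan_at_index(handle, i);
        });
    }
    for (auto& worker : workers) {
        worker.join();  // wait for every plan to finish building
    }

    int64_t succeeded = 0;
    for (status_t r : results) {
        if (r == status_t::OK) ++succeeded;
    }
    return succeeded;
}
```

With a real graph, a caller would presumably go on to time each successfully built plan via `execute_plan_at_index` and keep the fastest one. That step is left out because it depends on device buffers and workspace sizes that a mock cannot model.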
Showing 62 changed files with 3,904 additions and 1,479 deletions.