# [ORT 1.18.1 Release] Cherry pick 3rd round #21129
Merged
Conversation
This PR includes the weight-stripped engine feature (thanks @moraxu for #20214), which is the major feature for TRT 10 integration. Two TRT EP options are added:

- `trt_weight_stripped_engine_enable`: enables weight-stripped engine build and refit.
- `trt_onnx_model_folder_path`: in the quick-load case using an embedded engine model (EPContext mode), the original ONNX filename is stored in the node's attribute; this option specifies the directory of that ONNX file if needed.

Normal weight-stripped engine workflow:
![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f314865-cbda-4979-a7ac-b31c7a553b56)

Weight-stripped engine and quick-load workflow:
![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f31db51-a7a8-495b-ba25-54c7f904cbad)

See the doc [here](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#tensorrt-ep-caches) for more information about EPContext models.

---------

Co-authored-by: yf711 <yifanl@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: pengwa <pengwa@microsoft.com>
Co-authored-by: wejoncy <wejoncy@163.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Yi Zhang <your@email.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: cao lei <jslhcl@gmail.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: inisis <46103969+inisis@users.noreply.github.com>
Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
Co-authored-by: mo-ja <60505697+mo-ja@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Sumit Agarwal <sumitagarwal330@gmail.com>
Co-authored-by: Atanas Dimitrov <70822030+neNasko1@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Dhruv Matani <dhruvbird@gmail.com>
Co-authored-by: Dhruv Matani <dhruv.matani@grammarly.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Xiaoyu <85524621+xiaoyu-work@users.noreply.github.com>
Co-authored-by: Xu Xing <xing.xu@intel.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com>
Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Jian Chen <cjian@microsoft.com>
Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Andrew Fantino <15876180+afantino951@users.noreply.github.com>
Co-authored-by: Thomas Boby <thomas@boby.uk>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Michal Guzek <mguzek@nvidia.com>
Co-authored-by: George Wu <jywu@microsoft.com>
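For illustration, a minimal sketch of how the two new options might be passed to the TensorRT EP through the Python API. The model filename and cache directory below are hypothetical, and a TensorRT-enabled onnxruntime build would be required to actually create a session:

```python
# Sketch: TRT EP provider options for the weight-stripped engine workflow.
# Paths are placeholders, not part of the PR.
trt_ep_options = {
    "trt_engine_cache_enable": True,
    # Enable weight-stripped engine build and refit (new in this PR).
    "trt_weight_stripped_engine_enable": True,
    # Directory holding the original ONNX file for the quick-load
    # (embedded engine model / EPContext) case (new in this PR).
    "trt_onnx_model_folder_path": "./models",
}
providers = [("TensorrtExecutionProvider", trt_ep_options)]

# With a TensorRT-enabled build, the session would then be created as:
# import onnxruntime as ort
# session = ort.InferenceSession("model_ctx.onnx", providers=providers)
```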
### Description

- Introduce option `trt_engine_hw_compatible` to support engine hardware compatibility for Ampere+ GPUs.
  - When enabled, the `nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS` flag is set when generating engines.
- This option has been validated on sm80/sm86 GPUs; an engine can be reused across different Ampere+ architectures.
  - The client side needs to enable this option as well to leverage existing sm80+ engines.
  - If a user enables this option with TRT < 8.6 or sm < 80, a warning is logged that the option is not supported.

Engine naming:

| When | `trt_engine_hw_compat=false` | `trt_engine_hw_compat=true` |
| -------------- | ---------------------------- | --------------------------- |
| A100 (sm80) | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**80**.engine | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**80+**.engine |
| RTX3080 (sm86) | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**86**.engine | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**80+**.engine |

### Motivation and Context

Reference: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#hardware-compat

---------

Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
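The naming rule in the table above can be sketched as a small helper. This is a hypothetical illustration (`engine_suffix` is not a function in the EP), showing only how the compute-capability suffix would collapse to a shared `sm80+` name when hardware compatibility is enabled on an Ampere+ GPU:

```python
def engine_suffix(sm: int, hw_compat: bool) -> str:
    """Return the compute-capability suffix of the engine cache name.

    With hardware compatibility enabled on an Ampere+ GPU (sm >= 80),
    engines share the 'sm80+' suffix and can be reused across
    different Ampere+ architectures; otherwise the exact arch is used.
    """
    if hw_compat and sm >= 80:
        return "sm80+"
    return f"sm{sm}"

# A100 and RTX3080 produce distinct names without the flag...
assert engine_suffix(80, hw_compat=False) == "sm80"
assert engine_suffix(86, hw_compat=False) == "sm86"
# ...but share one hardware-compatible engine with it.
assert engine_suffix(86, hw_compat=True) == "sm80+"
```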
### Description

- Partially revert the [previous change](#19804).
- Redo the concurrency_test_result parser outside of post.py.
- Add support for syncing the memtest result to the database.

### Motivation and Context

Fixes the error when CI runs on two model groups:

- When running on two model groups, the [previous change](#19804) wrongly navigated two directory levels up after running one model group, while only one level is needed. After that, the script cannot find the other model group.
- Running on a single model group does not reproduce the issue.
…TRT 10 (#20738) TRT 10 now natively supports int64 tensors, so the code that binds the ORT kernel output to the DDS int64 output needs updating.
The 10.0-GA branch was updated with several issues fixed: https://github.com/onnx/onnx-tensorrt/commits/10.0-GA/
### Motivation and Context #20765
snnn reviewed Jun 21, 2024
microsoft/STL#3824 introduced a constexpr mutex. An older version of msvcp140.dll will lead to `A dynamic link library (DLL) initialization routine failed`. This error can be encountered when using conda Python, since conda packages the MSVC DLLs and those are currently older. This PR disables the constexpr mutex so that the ORT package can work with older MSVC DLLs. Thanks @snnn for the discovery.
jywu-msft approved these changes Jun 22, 2024
chilo-ms approved these changes Jun 24, 2024
### Description

Adding critical TensorRT EP support.