# [ORT 1.18.1 Release] Cherry pick 3rd round #21129
Merged
Conversation
This PR includes the weight-stripped engine feature (thanks @moraxu for #20214), which is the major feature for TRT 10 integration. Two TRT EP options are added:

- `trt_weight_stripped_engine_enable`: enables weight-stripped engine build and refit.
- `trt_onnx_model_folder_path`: in the quick-load case using an embedded engine model (EPContext mode), the original ONNX filename is stored in the node's attribute; this option specifies the directory of that ONNX file if needed.

Normal weight-stripped engine workflow:
![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f314865-cbda-4979-a7ac-b31c7a553b56)

Weight-stripped engine and quick-load workflow:
![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f31db51-a7a8-495b-ba25-54c7f904cbad)

See the doc [here](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#tensorrt-ep-caches) for more information about EPContext models.

---------

Co-authored-by: yf711 <yifanl@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: pengwa <pengwa@microsoft.com>
Co-authored-by: wejoncy <wejoncy@163.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Yi Zhang <your@email.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: cao lei <jslhcl@gmail.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: inisis <46103969+inisis@users.noreply.github.com>
Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
Co-authored-by: mo-ja <60505697+mo-ja@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Sumit Agarwal <sumitagarwal330@gmail.com>
Co-authored-by: Atanas Dimitrov <70822030+neNasko1@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Dhruv Matani <dhruvbird@gmail.com>
Co-authored-by: Dhruv Matani <dhruv.matani@grammarly.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Xiaoyu <85524621+xiaoyu-work@users.noreply.github.com>
Co-authored-by: Xu Xing <xing.xu@intel.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com>
Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Jian Chen <cjian@microsoft.com>
Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Andrew Fantino <15876180+afantino951@users.noreply.github.com>
Co-authored-by: Thomas Boby <thomas@boby.uk>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Michal Guzek <mguzek@nvidia.com>
Co-authored-by: George Wu <jywu@microsoft.com>
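For illustration, a minimal sketch of how the two new options might be passed to the TensorRT EP through the Python API. The model filename and cache directory below are hypothetical, and a TensorRT-enabled onnxruntime build would be required to actually create a session:

```python
# Sketch: TRT EP provider options for the weight-stripped engine workflow.
# Paths are placeholders, not part of the PR.
trt_ep_options = {
    "trt_engine_cache_enable": True,
    # Enable weight-stripped engine build and refit (new in this PR).
    "trt_weight_stripped_engine_enable": True,
    # Directory holding the original ONNX file for the quick-load
    # (embedded engine model / EPContext) case (new in this PR).
    "trt_onnx_model_folder_path": "./models",
}
providers = [("TensorrtExecutionProvider", trt_ep_options)]

# With a TensorRT-enabled build, the session would then be created as:
# import onnxruntime as ort
# session = ort.InferenceSession("model_ctx.onnx", providers=providers)
```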
### Description

- Introduce option `trt_engine_hw_compatible` to support engine hardware compatibility for Ampere+ GPUs.
  - When enabled, the `nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS` flag is set when generating engines.
- This option has been validated on sm80/sm86 GPUs; an engine can be reused across different Ampere+ architectures.
  - The client side needs to enable this option as well to leverage existing sm80+ engines.
  - If a user enables this option with TRT < 8.6 or sm < 80, a warning is logged that the option is not supported.

Engine naming:

| When | `trt_engine_hw_compat=false` | `trt_engine_hw_compat=true` |
| -------------- | ---------------------------- | --------------------------- |
| A100 (sm80) | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**80**.engine | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**80+**.engine |
| RTX3080 (sm86) | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**86**.engine | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm**80+**.engine |

### Motivation and Context

Reference: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#hardware-compat

---------

Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
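The naming rule in the table above can be sketched as a small helper. This is a hypothetical illustration (`engine_suffix` is not a function in the EP), showing only how the compute-capability suffix would collapse to a shared `sm80+` name when hardware compatibility is enabled on an Ampere+ GPU:

```python
def engine_suffix(sm: int, hw_compat: bool) -> str:
    """Return the compute-capability suffix of the engine cache name.

    With hardware compatibility enabled on an Ampere+ GPU (sm >= 80),
    engines share the 'sm80+' suffix and can be reused across
    different Ampere+ architectures; otherwise the exact arch is used.
    """
    if hw_compat and sm >= 80:
        return "sm80+"
    return f"sm{sm}"

# A100 and RTX3080 produce distinct names without the flag...
assert engine_suffix(80, hw_compat=False) == "sm80"
assert engine_suffix(86, hw_compat=False) == "sm86"
# ...but share one hardware-compatible engine with it.
assert engine_suffix(86, hw_compat=True) == "sm80+"
```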
### Description

- Partially revert the [previous change](#19804).
- Redo the concurrency_test_result parser outside of post.py.
- Add support for syncing the memtest result to the database.

### Motivation and Context

Fixes the error when CI runs on two model groups:

- When running on two model groups, the [previous change](#19804) wrongly navigated two directory levels up after running one model group, while only one level is needed. After that, the script cannot find the other model group.
- Running on a single model group does not reproduce the issue.
…TRT 10 (#20738) TRT 10 now natively supports int64 tensors, so the code that binds the ORT kernel output to the DDS int64 output needs updating.
The 10.0-GA branch was updated with several issues fixed: https://github.com/onnx/onnx-tensorrt/commits/10.0-GA/
### Motivation and Context #20765
snnn reviewed Jun 21, 2024
microsoft/STL#3824 introduced a constexpr mutex. An older version of msvcp140.dll will lead to `A dynamic link library (DLL) initialization routine failed`. This error can be encountered when using conda Python, since conda packages the MSVC DLLs and those are currently older. This PR disables the constexpr mutex so that the ORT package can work with older MSVC DLLs. Thanks @snnn for the discovery.
jywu-msft approved these changes Jun 22, 2024
chilo-ms approved these changes Jun 24, 2024
### Description

Adding critical TensorRT EP support.