[TensorRT EP] Weightless API integration #20412

chilo-ms · 2024-04-22T21:38:54Z

This PR adds the weight-stripped engine feature (thanks to @moraxu for #20214), which is the major feature for TRT 10 integration.

Two TRT EP options are added:

- `trt_weight_stripped_engine_enable`: Enables weight-stripped engine build and refit.
- `trt_onnx_model_folder_path`: In the quick-load case using an embedded engine model (EPContext mode), the original ONNX filename is stored in the node's attribute; this option specifies the directory of that ONNX file if needed.
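As a rough sketch of how these two options might be passed from Python, the snippet below builds a provider-options dictionary for `TensorrtExecutionProvider`. The helper names `build_trt_options` and `create_session` are hypothetical. `trt_engine_cache_enable` and `trt_engine_cache_path` are existing TRT EP options, included on the assumption that the stripped engine should be cached to disk. Creating the session needs an onnxruntime build with the TensorRT EP, so the import is deferred until that function is called.

```python
from typing import Optional


def build_trt_options(cache_dir: str, onnx_model_folder: Optional[str] = None) -> dict:
    """Provider options for a weight-stripped engine build (illustrative helper)."""
    options = {
        # Existing TRT EP options; assumed here so the engine is written to disk.
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": cache_dir,
        # Option added by this PR: build a weight-stripped engine and refit it.
        "trt_weight_stripped_engine_enable": True,
    }
    if onnx_model_folder is not None:
        # Option added by this PR: directory holding the original ONNX file.
        options["trt_onnx_model_folder_path"] = onnx_model_folder
    return options


def create_session(model_path: str, options: dict):
    """Create an InferenceSession; needs onnxruntime built with the TensorRT EP."""
    import onnxruntime as ort  # imported lazily because it is an optional dependency

    providers = [("TensorrtExecutionProvider", options), "CPUExecutionProvider"]
    return ort.InferenceSession(model_path, providers=providers)
```

A call such as `create_session("model.onnx", build_trt_options("./trt_cache"))` would then request the weight-stripped build, assuming TensorRT 10 is available on the machine.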

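Normal weight-stripped engine workflow:

![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f314865-cbda-4979-a7ac-b31c7a553b56)

Weight-stripped engine and quick load workflow:

![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f31db51-a7a8-495b-ba25-54c7f904cbad)

See the doc [here](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#tensorrt-ep-caches) for more information about the EPContext model.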
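A minimal sketch of the quick-load case, under stated assumptions: a first run dumps an EPContext model using the existing `trt_dump_ep_context_model` and `trt_ep_context_file_path` options, and a later run loads that model while `trt_onnx_model_folder_path` points at the directory of the original ONNX file so the weights can be refit. All three function names are hypothetical, and `resolve_onnx_path` only illustrates joining the folder with the filename read from the node's attribute; it is not the EP's actual lookup code.

```python
from pathlib import Path


def dump_options(ctx_model_path: str, cache_dir: str) -> dict:
    """Options for the first run: build a stripped engine and dump an EPContext model."""
    return {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": cache_dir,
        "trt_weight_stripped_engine_enable": True,
        # Existing TRT EP options for EPContext dumping (not added by this PR).
        "trt_dump_ep_context_model": True,
        "trt_ep_context_file_path": ctx_model_path,
    }


def quick_load_options(onnx_model_folder: str) -> dict:
    """Options for the later run that loads the EPContext model."""
    # Assumption: the refit needs the original weights, located via this folder.
    return {"trt_onnx_model_folder_path": onnx_model_folder}


def resolve_onnx_path(onnx_filename: str, onnx_model_folder: str = "") -> Path:
    """Hypothetical helper: join the folder option with the attribute's filename."""
    return Path(onnx_model_folder) / onnx_filename
```

Each dictionary would be passed as the options of `TensorrtExecutionProvider` when creating the corresponding session.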

moraxu

Thank you for taking on this work!

This PR includes the weight-stripped engine feature (thanks @moraxu for the #20214) which is the major feature for TRT 10 integration.

Two TRT EP options are added:

- `trt_weight_stripped_engine_enable`: Enable weight-stripped engine build and refit.
- `trt_onnx_model_folder_path`: In the quick load case using embedded engine model / EPContext mode, the original onnx filename is in the node's attribute, and this option specifies the directory of that onnx file if needed.

Normal weight-stripped engine workflow:

![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f314865-cbda-4979-a7ac-b31c7a553b56)

Weight-stripped engine and quick load workflow:

![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f31db51-a7a8-495b-ba25-54c7f904cbad)

see the doc [here](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#tensorrt-ep-caches) for more information about EPContext model.

---------

Co-authored-by: yf711 <yifanl@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: pengwa <pengwa@microsoft.com>
Co-authored-by: wejoncy <wejoncy@163.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Yi Zhang <your@email.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: cao lei <jslhcl@gmail.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: inisis <46103969+inisis@users.noreply.github.com>
Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
Co-authored-by: mo-ja <60505697+mo-ja@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Sumit Agarwal <sumitagarwal330@gmail.com>
Co-authored-by: Atanas Dimitrov <70822030+neNasko1@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Dhruv Matani <dhruvbird@gmail.com>
Co-authored-by: Dhruv Matani <dhruv.matani@grammarly.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Xiaoyu <85524621+xiaoyu-work@users.noreply.github.com>
Co-authored-by: Xu Xing <xing.xu@intel.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com>
Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Jian Chen <cjian@microsoft.com>
Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Andrew Fantino <15876180+afantino951@users.noreply.github.com>
Co-authored-by: Thomas Boby <thomas@boby.uk>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Michal Guzek <mguzek@nvidia.com>
Co-authored-by: George Wu <jywu@microsoft.com>

yf711 and others added 30 commits March 11, 2024 13:34

Init new yml & dockerfile to update TRT CI

89f6d75

update

6006682

update

71c817d

add cuda 12.4 support

9d755df

Update win/linux trt yml to cu123 and latest trt

f42505f

test trt CIs with 10.0.0.2

9dc2990

Update win trt ver for EA

aca88a9

fix

7212de7

fix

f830fed

fix

8fc5dd2

update

040d27f

fix

decbb47

Make TRT EP supports INT64 for TRT 10

5815bd6

Fix compile warning

4166b83

merge main

f373f67

update

494c970

update

adb4d3c

clean

0959856

update ep perf ci dockerfile

b188108

update

295dd33

update linux trt ci dockerfile for new trt10

dfdc36a

update ep perf ci dockerfile with latest trt10

eef015d

Merge

8cb808d

switch condition of linux trt ci dockerfiles

3d3a604

temp fix

9ab0f41

fix on ep perf ci dockerfiles

b59aa8a

fix

7234573

update on ep perf trt bin dockerfile

5157df7

debug

fdde93a

test

33f36cc

chilo-ms added 3 commits May 24, 2024 06:13

lintrunner -a

421e59c

minor update

d666ce7

add more comments

70fc577

kevinch-nv reviewed May 24, 2024

onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc (2 review threads, resolved)

moraxu approved these changes May 24, 2024

onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc (2 review threads, outdated, resolved)

jywu-msft requested a review from HectorSVC May 24, 2024 21:39

chilo-ms and others added 6 commits May 24, 2024 21:40

update and modify per reviewer's comment

a5d2085

update contrib op doc

b6c275d

refactor

38dafc1

modify contrib op doc

ddc2ec8

code refactor

39526f1

fix format

db2fce6

jywu-msft previously approved these changes May 25, 2024

Check weight-stripped engine cache automatically in the case EPContext model

a8b9662

chilo-ms dismissed jywu-msft’s stale review via a8b9662 May 25, 2024 21:24

jywu-msft reviewed May 26, 2024

onnxruntime/core/providers/tensorrt/onnx_ctx_model_helper.cc (review thread, resolved)

jywu-msft reviewed May 26, 2024

onnxruntime/core/providers/tensorrt/onnx_ctx_model_helper.cc (review thread, resolved)

jywu-msft reviewed May 26, 2024

onnxruntime/core/providers/tensorrt/onnx_ctx_model_helper.cc (review thread, outdated, resolved)

add some verbose logging

14765fa

jywu-msft previously approved these changes May 26, 2024

Add comments and change function name per reviewer's comment

aad9f86

chilo-ms dismissed jywu-msft’s stale review via aad9f86 May 26, 2024 15:30

fix compiler error

8284c8c

jywu-msft approved these changes May 26, 2024

jywu-msft merged commit 454fcdd into main May 26, 2024
93 of 96 checks passed

jywu-msft deleted the yifanl/chi_trt10+dockerfile branch May 26, 2024 19:24

jywu-msft added the ep:TensorRT label (issues related to TensorRT execution provider) May 26, 2024

sophies927 added the triage:approved label (Approved for cherrypicks for release) Jun 11, 2024

jywu-msft added the 1.18.1 essential label Jun 20, 2024
