[TensorRT EP] Weightless API integration #20412

Merged
merged 124 commits into from
May 26, 2024
Commits
89f6d75
Init new yml & dockerfile to update TRT CI
yf711 Mar 11, 2024
6006682
update
yf711 Mar 14, 2024
71c817d
update
yf711 Mar 14, 2024
9d755df
add cuda 12.4 support
yf711 Mar 15, 2024
f42505f
Update win/linux trt yml to cu123 and latest trt
yf711 Mar 18, 2024
9dc2990
test trt CIs with 10.0.0.2
yf711 Mar 18, 2024
aca88a9
Update win trt ver for EA
yf711 Mar 18, 2024
7212de7
fix
yf711 Mar 19, 2024
f830fed
fix
yf711 Mar 20, 2024
8fc5dd2
fix
yf711 Mar 20, 2024
040d27f
update
yf711 Mar 20, 2024
decbb47
fix
yf711 Mar 20, 2024
5815bd6
Make TRT EP supports INT64 for TRT 10
chilo-ms Mar 22, 2024
4166b83
Fix compile warning
chilo-ms Mar 25, 2024
f373f67
merge main
yf711 Mar 27, 2024
494c970
update
yf711 Mar 27, 2024
adb4d3c
update
yf711 Mar 27, 2024
0959856
clean
yf711 Mar 27, 2024
b188108
update ep perf ci dockerfile
yf711 Mar 27, 2024
295dd33
update
yf711 Mar 27, 2024
dfdc36a
update linux trt ci dockerfile for new trt10
yf711 Mar 29, 2024
eef015d
update ep perf ci dockerfile with latest trt10
yf711 Mar 29, 2024
8cb808d
Merge
yf711 Apr 1, 2024
3d3a604
switch condition of linux trt ci dockerfiles
yf711 Apr 1, 2024
9ab0f41
temp fix
yf711 Apr 1, 2024
b59aa8a
fix on ep perf ci dockerfiles
yf711 Apr 1, 2024
7234573
fix
yf711 Apr 1, 2024
5157df7
update on ep perf trt bin dockerfile
yf711 Apr 1, 2024
fdde93a
debug
yf711 Apr 2, 2024
33f36cc
test
yf711 Apr 2, 2024
652b27d
fix
yf711 Apr 2, 2024
a233e86
disable trtexec
yf711 Apr 2, 2024
15054fe
update onnx-tensorrt to 10.0-EA
yf711 Apr 2, 2024
a420e73
Merge branch 'yifanl/trtep_update_ci_dockerfile' into yifanl/chi_trt1…
yf711 Apr 2, 2024
6988df8
revert
yf711 Apr 2, 2024
3ef6fa8
revert
yf711 Apr 2, 2024
d2d7e90
Merge branch 'main' into yifanl/trtep_update_ci_dockerfile
yf711 Apr 2, 2024
bf70e3e
Fix py package pipeline (#20065)
wangyems Mar 27, 2024
40cdbfa
fix
yf711 Apr 2, 2024
d2edf5b
fix
yf711 Apr 2, 2024
cb8ece1
fix
yf711 Apr 2, 2024
d1f2af9
fix
yf711 Apr 2, 2024
133e05c
fix
yf711 Apr 2, 2024
91b7091
slim test
yf711 Apr 2, 2024
75531b6
fix
yf711 Apr 2, 2024
1714675
test win trt ci with trt10-cu118
yf711 Apr 3, 2024
9084e85
Merge branch 'main' into yifanl/chi_trt10
yf711 Apr 3, 2024
d7dd1e3
Merge branch 'main' into yifanl/trtep_update_ci_dockerfile
yf711 Apr 3, 2024
5622cba
set default, revert extra changes
yf711 Apr 3, 2024
7dcac7e
update setup_env_trt.bat
yf711 Apr 3, 2024
6cc9068
Merge branch 'yifanl/trtep_update_ci_dockerfile' into yifanl/chi_trt1…
yf711 Apr 3, 2024
1e6efba
test skipping failed tests
yf711 Apr 4, 2024
5957f14
update onnx-tensorrt to 10.0-EA
yf711 Apr 2, 2024
fb4443f
fix
yf711 Apr 4, 2024
2420716
update skipped tests
yf711 Apr 8, 2024
af5851a
update skipped test
yf711 Apr 8, 2024
912c14c
test
yf711 Apr 8, 2024
52eba2a
test
yf711 Apr 8, 2024
d243fb1
test
yf711 Apr 9, 2024
fb4d491
test
yf711 Apr 10, 2024
ea62afa
test ubi8-cuda12.4
yf711 Apr 10, 2024
4077b29
test ubi8-cuda12.4
yf711 Apr 10, 2024
f863251
fix
yf711 Apr 10, 2024
e75de20
fix
yf711 Apr 10, 2024
ed2df37
test
yf711 Apr 10, 2024
d532715
test cuda11.8-trt10
yf711 Apr 10, 2024
4ba8b55
version correction
yf711 Apr 10, 2024
385c155
test
yf711 Apr 10, 2024
2a3dad2
test ep perf with cuda 12.4 dockerenv
yf711 Apr 10, 2024
f0392bf
fix
yf711 Apr 10, 2024
eca26ec
fix in yml
yf711 Apr 10, 2024
1536468
test filter
yf711 Apr 11, 2024
75eb7aa
nit
yf711 Apr 11, 2024
918cbd4
revert
yf711 Apr 12, 2024
7a3f44e
test
yf711 Apr 12, 2024
c20d295
Merge main into "yifanl/chi_trt10_cuda12"
yf711 Apr 12, 2024
9ecafcf
test
yf711 Apr 15, 2024
55e28e2
update
yf711 Apr 17, 2024
8187f5e
Merge branch 'yifanl/debug_resnet50' into yifanl/chi_trt10_cuda12
yf711 Apr 17, 2024
d8aa5c3
Merge branch 'main' into yifanl/chi_trt10_cuda12
yf711 Apr 17, 2024
e37ccf5
Merge branch 'main' into yifanl/chi_trt10_cuda12
yf711 Apr 18, 2024
7f3f16e
Fix to EP Perf when choosing OSS parser
yf711 Apr 18, 2024
639889b
enable dds ops with trt10ea
yf711 Apr 19, 2024
d51c9d6
fix on trtexec
yf711 Apr 19, 2024
100c9f9
Compile trtexec only if not installed
yf711 Apr 19, 2024
ada40d4
merge main
chilo-ms Apr 19, 2024
1638e94
TensorRT EP: Weightless API integration in ONNX Runtime (#20214)
moraxu Apr 22, 2024
1d5648a
Merge branch 'main' into yifanl/chi_trt10+dockerfile
chilo-ms Apr 22, 2024
01d5835
update onnx
chilo-ms Apr 22, 2024
c5980eb
win-trt10ga
yf711 Apr 24, 2024
5ec8203
onnx-tensorrt 10.0-GA
yf711 Apr 26, 2024
00b5e35
Merge branch 'yifanl/chi_trt10_cuda12' into yifanl/chi_trt10+dockerfile
yf711 Apr 26, 2024
2e92a7a
Revert "onnx-tensorrt 10.0-GA"
yf711 Apr 26, 2024
e0a122b
revert
yf711 Apr 26, 2024
1567d86
update
yf711 Apr 26, 2024
f145d79
10.0-GA
yf711 Apr 26, 2024
af2694f
revert ubi8 dockerfile
yf711 Apr 26, 2024
077e98a
fix compile error
chilo-ms Apr 29, 2024
9968c84
TRT10 GA
yf711 Apr 27, 2024
eec1942
filter tests
yf711 Apr 27, 2024
33c4136
update for GetTensorrtLogger
chilo-ms Apr 29, 2024
6841dce
Merge branch 'main' into yifanl/chi_trt10+dockerfile
yf711 May 1, 2024
bab40ac
Merge main
yf711 May 1, 2024
2bc5b87
revert
yf711 May 1, 2024
9d88daf
lintrunner -a
chilo-ms May 1, 2024
bf8e6bc
change naming from weightless to weight-stripped
chilo-ms May 23, 2024
b711ed7
Merge branch 'main' into yifanl/chi_trt10+dockerfile
chilo-ms May 23, 2024
cd5eba2
rename cache for weight-stripped engine
chilo-ms May 23, 2024
89c8b0f
serialize refitted engine
chilo-ms May 23, 2024
fba65ae
engine refit for non quick load path as well
chilo-ms May 24, 2024
102addf
remove commented code
chilo-ms May 24, 2024
421e59c
lintrunner -a
chilo-ms May 24, 2024
d666ce7
minor update
chilo-ms May 24, 2024
70fc577
add more comments
chilo-ms May 24, 2024
a5d2085
update and modify per reviewer's comment
chilo-ms May 24, 2024
b6c275d
update contrib op doc
chilo-ms May 24, 2024
38dafc1
refactor
chilo-ms May 24, 2024
ddc2ec8
modify contrib op doc
chilo-ms May 25, 2024
39526f1
code refactor
chilo-ms May 25, 2024
db2fce6
fix format
jywu-msft May 25, 2024
a8b9662
Check weight-stripped engine cache automatically in the case EPContex…
chilo-ms May 25, 2024
14765fa
add some verbose logging
jywu-msft May 26, 2024
aad9f86
Add comments and change function name per reviewer's comment
chilo-ms May 26, 2024
8284c8c
fix compiler error
chilo-ms May 26, 2024
10 changes: 10 additions & 0 deletions onnxruntime/core/providers/tensorrt/onnx_ctx_model_helper.cc
@@ -8,7 +8,7 @@
#include "onnx_ctx_model_helper.h"
#include "core/providers/cuda/shared_inc/cuda_call.h"
#include "core/framework/execution_provider.h"
#include "tensorrt_execution_provider.h"

Check warning on line 11 in onnxruntime/core/providers/tensorrt/onnx_ctx_model_helper.cc (GitHub Actions / Lint C++): [cpplint] reported by reviewdog: Include the directory when naming header files [build/include_subdir] [4]

namespace onnxruntime {
extern TensorrtLogger& GetTensorrtLogger(bool verbose_log);
@@ -252,6 +252,11 @@
  return refitted_engine_cache_path;
}

bool IsWeightStrippedEngineCache(std::filesystem::path& engine_cache_path) {
  // The weight-stripped engine cache has the naming of xxx.stripped.engine
  return engine_cache_path.stem().extension().string() == ".stripped";
}
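
Note on the check above: `stem()` drops only the last extension, so `xxx.stripped.engine` becomes `xxx.stripped`, and `extension()` of that stem is `.stripped`. A minimal standalone sketch of the same test follows. The name `IsWeightStrippedEngineCacheSketch`, the `const` reference parameter, the helper `CheckExamples`, and the file names are illustrative assumptions, not part of this PR.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Same test as in the diff. Taking a const reference is an illustrative
// change (not from the PR) so the function can be called on temporaries.
bool IsWeightStrippedEngineCacheSketch(const std::filesystem::path& engine_cache_path) {
  // stem():      "model.stripped.engine" -> "model.stripped"
  // extension(): "model.stripped"        -> ".stripped"
  return engine_cache_path.stem().extension().string() == ".stripped";
}

// Exercises the check on a few hypothetical file names.
void CheckExamples() {
  assert(IsWeightStrippedEngineCacheSketch("trt_model.stripped.engine"));
  assert(!IsWeightStrippedEngineCacheSketch("trt_model.engine"));
  // Only one extension here, so the stem is "trt_model" and has no extension.
  assert(!IsWeightStrippedEngineCacheSketch("trt_model.stripped"));
  // Directory components are ignored; only the filename is inspected.
  assert(IsWeightStrippedEngineCacheSketch("cache/dir.v1/trt_model.stripped.engine"));
}
```

The check is purely name-based: it never opens the file, so a renamed cache would be classified by its new name.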

Status TensorRTCacheModelHandler::GetEpContextFromGraph(const GraphViewer& graph_viewer) {
  if (!ValidateEPCtxNode(graph_viewer)) {
    return ORT_MAKE_STATUS(ONNXRUNTIME, EP_FAIL, "It's not a valid EP Context node");
@@ -288,6 +293,11 @@
    std::filesystem::path ctx_model_dir(GetPathOrParentPathOfCtxModel(ep_context_model_path_));
    auto engine_cache_path = ctx_model_dir.append(cache_path);

    // If it's a weight-stripped engine cache, it needs to be refitted even though the refit flag is not enabled
    if (!weight_stripped_engine_refit_) {
      weight_stripped_engine_refit_ = IsWeightStrippedEngineCache(engine_cache_path);
    }

    // If the serialized refitted engine is present, use it directly without refitting the engine again
    if (weight_stripped_engine_refit_) {
      const std::filesystem::path refitted_engine_cache_path = GetRefittedEnginePath(engine_cache_path.string());
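
The control flow in the last hunk can be summarized as a small decision function. This is a sketch under stated assumptions, not the code in the PR: `EngineLoadAction`, `DecideAction`, and the `refitted_exists` parameter are hypothetical names introduced here, and the "refit, then load" branch is inferred from the hunk comments and the commit messages ("serialize refitted engine") because the rest of the hunk is not shown.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical outcome labels, for illustration only.
enum class EngineLoadAction { kLoadAsIs, kLoadRefitted, kRefitThenLoad };

// refit_flag:      stands in for weight_stripped_engine_refit_ in the diff.
// refitted_exists: whether a serialized refitted engine is already on disk
//                  (passed in so the sketch does not touch the filesystem).
EngineLoadAction DecideAction(const fs::path& engine_cache_path,
                              bool refit_flag,
                              bool refitted_exists) {
  // A cache named xxx.stripped.engine forces refit even if the flag is off.
  if (!refit_flag) {
    refit_flag = engine_cache_path.stem().extension().string() == ".stripped";
  }
  if (!refit_flag) {
    return EngineLoadAction::kLoadAsIs;
  }
  // Prefer an already serialized refitted engine over refitting again.
  return refitted_exists ? EngineLoadAction::kLoadRefitted
                         : EngineLoadAction::kRefitThenLoad;
}
```

For example, `DecideAction("m.stripped.engine", false, true)` yields `kLoadRefitted`: the name alone turns refit on, and the existing refitted engine is reused.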