Support trt cuda graph. #53406
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
❌ The PR was not created using the PR template. You can refer to this Demo.
- Need to add UT.
- If use_cuda_graph is set, is the enqueue latency actually reduced? Please give the test timings.
- Add docs to explain when to use it.
if (use_cuda_graph && !all_nodes_offload_to_trt) {
  LOG_FIRST_N(WARNING, 1)
      << "You have enabled CudaGraph, but the entire graph is not offloaded "
         "to TRT; falling back to normal mode.";
Do we need to set use_cuda_graph to false here?
Done.
Sorry to inform you that the CIs for d296778 have been passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Is there any test data? I have a doubt here: when the whole graph runs in TRT there is only one enqueue() call, which looks like a single kernel launch; that does not match the scenario cudaGraph targets, namely reducing the overhead of launching many kernels.
Added.
Done.
Done, listed in description.
Add comments in
bool TensorRTEngine::Enqueue(nvinfer1::IExecutionContext *context,
                             std::vector<void *> *buffers,
                             int batch_size,
                             cudaStream_t stream) {
  if (cudagraph_inited_) {
    VLOG(1) << "CUDA graph initialized successfully, so the entire graph "
               "will be launched via the CUDA graph.";
    return cuda_graph_.Launch(stream);
  }
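For context, the capture-then-replay pattern this PR follows (as in trtexec) can be sketched roughly as below. This is an illustrative sketch, not the PR's exact code: `CaptureCudaGraph` and the `bindings` parameter are hypothetical names, and the code assumes CUDA >= 10.2 with fixed input shapes after capture.

```cpp
#include <cuda_runtime_api.h>
#include <NvInfer.h>

// Sketch: record one TensorRT enqueue into a CUDA graph, then instantiate
// it so later runs can replay all kernels with a single launch.
bool CaptureCudaGraph(nvinfer1::IExecutionContext *context,
                      void **bindings,
                      cudaStream_t stream,
                      cudaGraphExec_t *graph_exec) {
  cudaGraph_t graph;
  // Begin capturing all work submitted to `stream`.
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeThreadLocal);
  // enqueueV2 records its kernel launches into the capture instead of
  // executing them immediately; bindings must already be set up.
  bool ok = context->enqueueV2(bindings, stream, nullptr);
  cudaStreamEndCapture(stream, &graph);
  if (!ok) {
    cudaGraphDestroy(graph);
    return false;
  }
  // Instantiate once; afterwards a single cudaGraphLaunch replays the
  // whole captured kernel sequence, avoiding per-kernel launch overhead.
  cudaGraphInstantiate(graph_exec, graph, nullptr, nullptr, 0);
  cudaGraphDestroy(graph);
  return true;
}

// Replay on each inference run (shapes must not change):
//   cudaGraphLaunch(*graph_exec, stream);
//   cudaStreamSynchronize(stream);
```

This is why a single enqueue() call still benefits: the one enqueue() internally launches many kernels, and the replay collapses all of those launches into one cudaGraphLaunch.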
TODO: shape check and error reporting.
LGTM
PR types
Others
PR changes
Others
Description
Background: for some models whose entire graph is offloaded to TRT, enqueue takes a long time; cudaGraph is used to reduce this enqueue latency.
The implementation mainly follows trtexec. The integration works as follows:
- The EnableTensorRtEngine interface gains a new bool use_cuda_graph option, defaulting to false. (A log message reminds the user that if cudaGraph is enabled, the input shapes must stay unchanged.)
- TODO: add a CUDA version guard in the code. The feature was introduced in CUDA 10.0, and the minimum CUDA version we release with is 10.2, so the code does not need to be gated by compile-time macros.
- Test data was collected on two typical models: cudaGraph reduces enqueue latency and improves the asynchrony of the interface, but has almost no effect on the actual kernel execution time.
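A user would enable the new option roughly as follows. This is a hypothetical usage sketch: the exact parameter order and the position of the new flag in EnableTensorRtEngine are illustrative, not verbatim from the PR.

```cpp
#include "paddle/include/paddle_inference_api.h"

paddle_infer::Config config;
config.SetModel("model.pdmodel", "model.pdiparams");
config.EnableUseGpu(256 /*initial GPU memory, MB*/, 0 /*device id*/);
// Assumed: the PR adds a trailing `bool use_cuda_graph` flag (default
// false). CUDA graph only takes effect when the whole graph is offloaded
// to TRT, and requires input shapes to stay fixed across runs.
config.EnableTensorRtEngine(1 << 30 /*workspace*/, 1 /*max_batch*/,
                            3 /*min_subgraph_size*/,
                            paddle_infer::PrecisionType::kFloat32,
                            false /*use_static*/, false /*use_calib_mode*/,
                            true /*use_cuda_graph*/);
auto predictor = paddle_infer::CreatePredictor(config);
```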