
Add basic support for CUDA Graph #36190

Merged
merged 7 commits into from
Sep 29, 2021

Conversation

sneaxiy
Collaborator

@sneaxiy sneaxiy commented Sep 28, 2021

PR types

New features

PR changes

APIs

Describe

Add basic support for CUDA Graph, including:

  • Memory pool for CUDA Graph capturing. CUDA Graph capture needs to cache all Tensor addresses without calling any cudaFree.
  • Basic APIs such as CUDAGraph.capture_begin/capture_end/replay/reset. Note that these APIs are experimental and may change in the future.
  • Use CUDAGraphCaptureModeGuard to switch to cudaStreamCaptureModeRelaxed, so that cudaMalloc calls that are unsupported in the default capture mode do not fail during capturing.

Usage:

from paddle.device.cuda.graphs import CUDAGraph

input = ...  # define the input tensor; it must be created before `CUDAGraph.capture_begin()` is called

graph = CUDAGraph()
graph.capture_begin()
output = ...  # do some GPU operations here
graph.capture_end()

for _ in range(BATCH_NUM):
    input.copy_(input_tensor, False)  # input_tensor holds the model input for this batch, e.g. from a DataLoader
    graph.replay()
    print(output)

graph.reset()  # not required, but calling it releases the cached memory as soon as possible

TODO when using CUDA Graph:

  • Cache CuDNN descriptors. Otherwise, errors would be raised during capturing.
  • Disable the CuDNN exhaustive search. Otherwise, errors would be raised during capturing.
  • Disable FLAGS_sync_all_reduce when using distributed training. FLAGS_sync_all_reduce calls cudaStreamSynchronize, which is not supported during capturing.
  • Modify ParallelExecutor to support CUDA Graph.

@CLAassistant

CLAassistant commented Sep 28, 2021

CLA assistant check
All committers have signed the CLA.

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


void RemoveMemoryPoolOfCUDAGraph(CUDAGraphID id) {
  auto iter = cuda_graph_allocator_map_.find(id);
  PADDLE_ENFORCE_EQ(iter != cuda_graph_allocator_map_.end(), true,
Contributor

Can use PADDLE_ENFORCE_NE directly

Collaborator Author

Done.


 public:
  explicit CUDAGraphCaptureModeGuard(cudaStreamCaptureMode new_mode) {
    old_mode_ = new_mode;
Contributor

why old_mode_ = new_mode?

Collaborator Author

Changed the variable name and added some comments for better understanding.

}

~CUDAGraphCaptureModeGuard() PADDLE_MAY_THROW {
PADDLE_ENFORCE_CUDA_SUCCESS(
Contributor

Is it OK to raise an exception in the destructor?

Collaborator Author

Yes. Although it is not recommended to raise an exception in a destructor, I think we should not hide the exception. If an exception is raised in the destructor, std::terminate would be called to stop the process immediately.

Contributor

Yes. Although it is not recommended to raise an exception in a destructor, I think we should not hide the exception. If an exception is raised in the destructor, std::terminate would be called to stop the process immediately.

Maybe we need a PADDLE_WARN_CUDA_SUCCESS, etc.

Collaborator Author

Yes. Although it is not recommended to raise an exception in a destructor, I think we should not hide the exception. If an exception is raised in the destructor, std::terminate would be called to stop the process immediately.

Maybe we need a PADDLE_WARN_CUDA_SUCCESS, etc.

That is hard to do. Suppose we have a common method void func(). It may be called anywhere, inside or outside a destructor, but we can only write one of PADDLE_ENFORCE_CUDA_SUCCESS or PADDLE_WARN_CUDA_SUCCESS inside its implementation.

@@ -557,6 +558,7 @@ class RecordedCudaMallocHelper {
#ifdef PADDLE_WITH_HIP
auto result = hipMalloc(ptr, size);
#else
CUDAGraphCaptureModeGuard capture_mode_guard{cudaStreamCaptureModeRelaxed};
Contributor

IMHO, use this guard only when IsCUDAGraphCapturing is true.

Collaborator Author

Done.

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Sep 29, 2021
@PaddlePaddle PaddlePaddle unlocked this conversation Sep 29, 2021
@sneaxiy sneaxiy closed this Sep 29, 2021
@sneaxiy sneaxiy reopened this Sep 29, 2021
Contributor

@zhiqiu zhiqiu left a comment

LGTM

@sneaxiy sneaxiy merged commit 21b93c3 into PaddlePaddle:develop Sep 29, 2021
@sneaxiy sneaxiy deleted the add_cuda_graph_basic_support branch September 29, 2021 09:12