-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic support for CUDA Graph #36190
Add basic support for CUDA Graph #36190
Conversation
Thanks for your contribution! |
e340a8f
to
d9af897
Compare
|
||
void RemoveMemoryPoolOfCUDAGraph(CUDAGraphID id) { | ||
auto iter = cuda_graph_allocator_map_.find(id); | ||
PADDLE_ENFORCE_EQ(iter != cuda_graph_allocator_map_.end(), true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use PADDLE_ENFORCE_NE directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
paddle/fluid/platform/cuda_graph.h
Outdated
|
||
public: | ||
explicit CUDAGraphCaptureModeGuard(cudaStreamCaptureMode new_mode) { | ||
old_mode_ = new_mode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why old_mode_ = new_mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the variable name and add some comments for better understanding.
paddle/fluid/platform/cuda_graph.h
Outdated
} | ||
|
||
~CUDAGraphCaptureModeGuard() PADDLE_MAY_THROW { | ||
PADDLE_ENFORCE_CUDA_SUCCESS( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok to raise exception in the destructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Although it is not recommended to raise exception in the destructor, I think that we should not hide the exception. If exception is raised in the destructor, std::terminate
would be called to stop the process immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Although it is not recommended to raise exception in the destructor, I think that we should not hide the exception. If exception is raised in the destructor,
std::terminate
would be called to stop the process immediately.
Maybe we need a PADDLE_WARN_CUDA_SUCCESS
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Although it is not recommended to raise exception in the destructor, I think that we should not hide the exception. If exception is raised in the destructor,
std::terminate
would be called to stop the process immediately.Maybe we need a
PADDLE_WARN_CUDA_SUCCESS
, etc.
Hard to do this. Suppose that we have a common method void func()
. The func
may be called anywhere, inside destructor or outside destructor, but we have to only write one of PADDLE_ENFORCE_CUDA_SUCCESS
and PADDLE_ENFORCE_WARN_SUCCESS
inside its implementation.
@@ -557,6 +558,7 @@ class RecordedCudaMallocHelper { | |||
#ifdef PADDLE_WITH_HIP | |||
auto result = hipMalloc(ptr, size); | |||
#else | |||
CUDAGraphCaptureModeGuard capture_mode_guard{cudaStreamCaptureModeRelaxed}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho, call this func when IsCUDAGraphCapturing is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR types
New features
PR changes
APIs
Describe
Add basic support for CUDA Graph, including:
cudaFree
.CUDAGraph.capture_begin/capture_end/replay/reset
. Notice that this API is in the experimental stage, and may be changed in the future.CUDAGraphCaptureModeGuard
to switch tocudaStreamCaptureModeRelaxed
to skip the unsupportedcudaMalloc
during capturing.Usage:
TODO: when using CUDA Graph,
FLAGS_sync_all_reduce
when using distributed training. TheFLAGS_sync_all_reduce
would callcudaStreamSynchronize
, which is not supported during capturing.