[ROCM] fix depthwise conv in ROCM, test=develop #32117
Thanks for your contribution!
                   paddle::operators::CUDNNConvOpKernel<plat::float16>);
REGISTER_OP_KERNEL(depthwise_conv2d_grad, CUDNN, plat::CUDAPlace,
                   paddle::operators::CUDNNConvGradOpKernel<float>,
                   paddle::operators::CUDNNConvGradOpKernel<plat::float16>);
There should be no need to register depthwise_conv here either. In our ops, depthwise_conv and conv share the same logic, so the registered kernel is CUDNNConvOpKernel or CUDNNConvGradOpKernel in both cases.
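The CUDNN op kernels for depthwise_conv2d and depthwise_conv2d_grad do need to be registered here. The op type used in our Python API is depthwise_conv2d, so without this registration an error is raised saying that no CUDNN kernel can be found for the depthwise_conv2d op.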
The cuDNN kernel needs to be registered; previously depthwise_conv used the CUDA kernel by default.
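The registration point debated in this thread can be illustrated with a toy registry. This is only a sketch under stated assumptions, not Paddle's actual kernel-dispatch code: `KERNELS`, `register_op_kernel`, `choose_kernel`, the `"PLAIN"` library tag and the name `DepthwiseConvKernel` are all hypothetical. It models kernels as being looked up by the pair (op type, library), which is why an op type with no CUDNN entry fails when cuDNN is requested.

```python
# Toy model of an op-kernel registry; NOT Paddle's real implementation.
# All names below are hypothetical and exist only for illustration.
KERNELS = {}


def register_op_kernel(op_type, library, kernel_name):
    """Record which kernel serves a given (op type, library) pair."""
    KERNELS[(op_type, library)] = kernel_name


def choose_kernel(op_type, use_cudnn):
    """Look up the kernel for an op; fail if nothing was registered."""
    library = "CUDNN" if use_cudnn else "PLAIN"
    try:
        return KERNELS[(op_type, library)]
    except KeyError:
        raise RuntimeError(
            f"op {op_type} has no {library} kernel registered"
        ) from None


# State before the PR, as described in the thread: conv2d has a CUDNN
# kernel, depthwise_conv2d only has the plain CUDA kernel.
register_op_kernel("conv2d", "CUDNN", "CUDNNConvOpKernel")
register_op_kernel("depthwise_conv2d", "PLAIN", "DepthwiseConvKernel")
```

In this model, `choose_kernel("depthwise_conv2d", use_cudnn=True)` raises until `register_op_kernel("depthwise_conv2d", "CUDNN", "CUDNNConvOpKernel")` has been called, which mirrors the error described in the author's reply.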
python/paddle/fluid/layers/nn.py
Outdated
if (num_channels == groups and num_filters % num_channels == 0 and
        core.is_compiled_with_rocm()):
    l_type = 'depthwise_conv2d'
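Is the intent here to force the cuDNN implementation of depthwise_conv on ROCm? If so, use_cudnn=True should be set under this condition (and, if necessary, a warning should be issued stating that the cuDNN conv is used regardless of the value of use_cudnn). Otherwise, when use_cudnn=False, the in-house depthwise_conv CUDA kernel is selected.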
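The default value of use_cudnn here is True, so we do not modify it explicitly, and users can still set use_cudnn to False, i.e. run depthwise_conv2d with the regular CUDA kernel instead of cuDNN. When the input is not particularly large, the depthwise_conv2d CUDA kernel generally works without problems. A follow-up change will fix the thread-limit issue of the depthwise_conv2d CUDA kernel on the ROCm platform.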
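The behaviour described in this exchange can be sketched as a small pure function. This is a minimal sketch, not the code in nn.py: `select_conv_op` and its `is_rocm` parameter (standing in for `core.is_compiled_with_rocm()`) are hypothetical, and the non-ROCm branch that picks the depthwise op only when `use_cudnn` is False is an assumption based on the reviewer's remark, not something shown in the diff hunk.

```python
def select_conv_op(num_channels, groups, num_filters,
                   use_cudnn=True, is_rocm=False):
    """Return (op_type, use_cudnn) for a 2-D convolution.

    Hypothetical helper; `is_rocm` stands in for
    core.is_compiled_with_rocm().
    """
    is_depthwise = (num_channels == groups
                    and num_filters % num_channels == 0)
    if is_depthwise and is_rocm:
        # Branch shown in the diff: on ROCm the depthwise op type is used
        # and use_cudnn keeps whatever the caller passed (default True).
        return "depthwise_conv2d", use_cudnn
    if is_depthwise and not use_cudnn:
        # ASSUMPTION: outside ROCm, the depthwise op is chosen only when
        # the caller has turned cuDNN off.
        return "depthwise_conv2d", use_cudnn
    return "conv2d", use_cudnn
```

Under these assumptions, a depthwise convolution on ROCm with the default `use_cudnn=True` yields `("depthwise_conv2d", True)`, so it needs the CUDNN kernel registered for that op type, while passing `use_cudnn=False` still reaches the CUDA kernel as the author describes.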
python/paddle/nn/functional/conv.py
Outdated
if (core.is_compiled_with_cuda() and get_flags("FLAGS_conv2d_disable_cudnn")
        ["FLAGS_conv2d_disable_cudnn"]):
    use_cudnn = False
Same as above: there should be no need to register a separate cuDNN kernel for depthwise_conv. It is enough to set use_cudnn=True when core.is_compiled_with_rocm() is true.
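This code is related to PR #31836, which added a flag for disabling cuDNN in conv2d. The reason is that on ROCm, with the Faster R-CNN model, the varying input and output shapes of conv2d cause a very severe drop in MIOpen performance. The switch was therefore added to disable cuDNN and use the CUDA kernel directly, which makes performance much more stable.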
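If conv1d does not need this, it can be left unmodified. Conv1D in python/paddle/nn/layer/conv.py does not set use_cudnn, so leaving it alone avoids inconsistent behavior between nn.Conv1D and nn.functional.conv2d.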
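The flag check in the hunk above can be read as a simple override rule. The following is a sketch only: `resolve_use_cudnn` is a hypothetical helper, `compiled_with_cuda` stands in for `core.is_compiled_with_cuda()`, and the plain dict stands in for the result of `get_flags(...)`, so the example runs without Paddle installed.

```python
def resolve_use_cudnn(use_cudnn, compiled_with_cuda, flags):
    """Apply the FLAGS_conv2d_disable_cudnn override to use_cudnn.

    Hypothetical helper mirroring the condition in the diff hunk; `flags`
    is a plain dict used here in place of Paddle's get_flags() result.
    """
    disable = bool(flags.get("FLAGS_conv2d_disable_cudnn", False))
    if compiled_with_cuda and disable:
        # The flag wins over whatever the caller asked for.
        return False
    return use_cudnn
```

With the flag set, a caller who passed `use_cudnn=True` still gets the CUDA kernel path, which is the stability workaround the author describes for models whose conv2d shapes vary a lot.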
python/paddle/nn/functional/conv.py
Outdated
if core.is_compiled_with_rocm():
    use_cudnn = True
else:
    use_cudnn = False
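The conv interface under functional originally forced depthwise_conv to use the CUDA implementation (L568~L570, by specifying the op_type as depthwise_conv2d). If only cuDNN should be used on ROCm, it is enough to set l_type to 'conv2d' and use_cudnn = True. The changes below are probably similar.
The OpKernel selection works roughly like this:
I also suggest asking the API owner to help review this.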
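The two options on the table for the depthwise case in the functional API can be put side by side. This is a hedged sketch with hypothetical function names and an `is_rocm` parameter standing in for `core.is_compiled_with_rocm()`; it does not reproduce the missing diagram, and the assumption that the op type stays `depthwise_conv2d` in the diff version is inferred from the discussion rather than shown in the hunk.

```python
def depthwise_choice_as_in_diff(is_rocm):
    """Approach in the diff (ASSUMED): keep the depthwise op type and let
    the platform decide use_cudnn, relying on the newly registered CUDNN
    kernel for depthwise_conv2d."""
    return "depthwise_conv2d", bool(is_rocm)


def depthwise_choice_as_suggested(is_rocm):
    """Reviewer's alternative: on ROCm, route through conv2d with cuDNN,
    so no extra kernel registration is needed."""
    if is_rocm:
        return "conv2d", True
    return "depthwise_conv2d", False
```

Both variants leave the non-ROCm behaviour unchanged; they differ only in which op type reaches the cuDNN (MIOpen) kernel on ROCm.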
8b19c32
to
4df9177
Compare
python/paddle/nn/layer/conv.py
Outdated
if core.is_compiled_with_rocm():
    self._use_cudnn = True
else:
    self._use_cudnn = False
I suggest keeping this consistent with the check in nn.functional.conv2d above.
Done
4df9177
to
8bb526e
Compare
8bb526e
to
a881b4d
Compare
PR types
Bug fixes
PR changes
OPs
Describe
Fix depthwise conv on ROCm: use MIOpen by default for depthwise conv, because the CUDA kernel is subject to thread and block limits.
Related PR #31998 #31836