
[ROCM] fix depthwise conv in ROCM, test=develop #32117

Closed

Conversation

qili93
Contributor

@qili93 qili93 commented Apr 7, 2021

PR types

Bug fixes

PR changes

OPs

Describe

Fix depthwise conv in ROCM: use MIOPEN by default for depthwise conv, since the CUDA kernel has thread and block count limits.

Related PR #31998 #31836
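The dispatch decision described above can be sketched as follows. This is an illustrative stand-alone sketch, not Paddle's real API: the function name and return strings are hypothetical; the point is that on a ROCm build the depthwise conv defaults to the MIOPEN (cuDNN-path) kernel, and only an explicit use_cudnn=False falls back to the handwritten CUDA kernel.

```python
# Hypothetical sketch of the kernel choice this PR describes. On ROCm,
# depthwise conv goes through MIOPEN (ROCm's cuDNN-equivalent, reached via
# the same CUDNN kernel registration) because the handwritten CUDA kernel
# hits thread/block-count limits there.

def pick_depthwise_kernel(compiled_with_rocm: bool, use_cudnn: bool = True) -> str:
    """Return which backend a depthwise conv2d call would dispatch to."""
    if compiled_with_rocm and use_cudnn:
        return "miopen"
    if use_cudnn:
        return "cudnn"
    # The user explicitly opted out of cudnn: fall back to the CUDA kernel.
    return "cuda"
```

For example, `pick_depthwise_kernel(True)` yields `"miopen"`, while `pick_depthwise_kernel(True, use_cudnn=False)` yields `"cuda"`, matching the opt-out behavior discussed in the review below.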

@paddle-bot-old

paddle-bot-old bot commented Apr 7, 2021

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

REGISTER_OP_KERNEL(depthwise_conv2d, CUDNN, plat::CUDAPlace,
                   paddle::operators::CUDNNConvOpKernel<float>,
                   paddle::operators::CUDNNConvOpKernel<plat::float16>);
REGISTER_OP_KERNEL(depthwise_conv2d_grad, CUDNN, plat::CUDAPlace,
                   paddle::operators::CUDNNConvGradOpKernel<float>,
                   paddle::operators::CUDNNConvGradOpKernel<plat::float16>);
Contributor

There should be no need to register depthwise_conv here either. In our ops, depthwise_conv and conv share the same logic, so the registered kernels are all CUDNNConvOpKernel or CUDNNConvGradOpKernel.

Contributor Author

We do need to register the CUDNN OP kernels for depthwise_conv2d and depthwise_conv2d_grad here. The OP type used by our Python API is depthwise_conv2d, and without these registrations an error is thrown that no CUDNN kernel can be found for the depthwise_conv2d OP.

Contributor

The cudnn kernels need to be registered here; previously depthwise_conv used the CUDA kernel by default.

if (num_channels == groups and num_filters % num_channels == 0 and
        core.is_compiled_with_rocm()):
    l_type = 'depthwise_conv2d'

Contributor

Is the intent here to force depthwise_conv to use the cuDNN implementation on ROCm? If so, under this condition use_cudnn should be set to True (and, if necessary, a warning should be issued stating that the cuDNN conv is used regardless of the value of use_cudnn). Otherwise, when use_cudnn=False, the in-house depthwise_conv CUDA kernel will be selected.

Contributor Author

The default value of use_cudnn here is True, so it is not explicitly modified, and users are still allowed to set use_cudnn to False, i.e. to run depthwise_conv2d with the regular CUDA kernel rather than cudnn. The depthwise_conv2d CUDA kernel generally works fine when the input is not particularly large; a follow-up will fix the thread-limit issue of the depthwise_conv2d CUDA kernel on the ROCM platform.

if (core.is_compiled_with_cuda() and get_flags("FLAGS_conv2d_disable_cudnn")
        ["FLAGS_conv2d_disable_cudnn"]):
    use_cudnn = False

Contributor

Same as above: there should be no need to register a separate cuDNN kernel for depthwise_conv; it is enough to set use_cudnn=True when core.is_compiled_with_rocm().

Contributor Author

This code is related to PR #31836, which added a flag for disabling cudnn for conv2d. The reason is that on ROCM, with the Faster R-CNN model, the highly variable input/output shapes of conv2d cause MIOPEN performance to degrade severely. This switch was therefore added to disable cudnn and use the CUDA kernel directly, which gives much more stable performance.

Contributor

If conv1d is not affected, it does not need to be changed; Conv1D in python/paddle/nn/layer/conv.py does not set use_cudnn. This avoids making the behavior of nn.Conv1D and nn.functional.conv2d inconsistent.

if core.is_compiled_with_rocm():
    use_cudnn = True
else:
    use_cudnn = False
Contributor

The conv interfaces under functional originally forced depthwise_conv to use the CUDA implementation (L568~L570, by setting op_type to depthwise_conv2d). If ROCm should just use cuDNN, it is enough to set l_type to 'conv2d' and use_cudnn = True. The changes below can probably be handled the same way.

@zhangting2020
Contributor

zhangting2020 commented Apr 7, 2021

Roughly, OpKernel selection works as follows:

  • For both depthwise_conv and ordinary conv, the cuDNN implementations live in conv_cudnn_op.cu, and the op_type is conv2d in both cases, because they share the same OpKernel. When the Python API sets use_cudnn=True and op_type is conv2d, the depthwise_conv cuDNN kernel ends up being called if the configuration satisfies the depthwise_conv conditions.
  • paddle also implements CUDA kernels; you can see that conv2d and depthwise_conv2d are registered separately, with separately implemented OpKernels. To call the corresponding CUDA implementation, the Python API needs to set use_cudnn=False and set op_type to conv2d or depthwise_conv2d respectively.

I also suggest asking the API owner to help review.
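The selection rules above can be condensed into a small decision table. This is a sketch for illustration only: the function and the non-cuDNN kernel names (`DepthwiseConvKernel`, `GemmConvKernel`) are assumed stand-ins, not verified Paddle class names; the cuDNN case uses the CUDNNConvOpKernel named earlier in this thread.

```python
# Sketch of the OpKernel selection described above: with use_cudnn=True both
# the ordinary and depthwise cases share one cuDNN OpKernel registered under
# op_type 'conv2d' (the depthwise cuDNN path is chosen inside the kernel);
# with use_cudnn=False, separately registered CUDA kernels are selected by
# op_type ('conv2d' vs 'depthwise_conv2d').

def select_op_kernel(use_cudnn: bool, is_depthwise: bool) -> tuple:
    """Return the (op_type, kernel) pair the Python API would dispatch to."""
    if use_cudnn:
        # Shared cuDNN OpKernel in conv_cudnn_op.cu, op_type is always conv2d.
        return ("conv2d", "CUDNNConvOpKernel")
    if is_depthwise:
        return ("depthwise_conv2d", "DepthwiseConvKernel")  # hypothetical name
    return ("conv2d", "GemmConvKernel")  # hypothetical name
```

Note how the depthwise flag only matters on the use_cudnn=False branch, which is exactly why the review argues that setting use_cudnn=True on ROCm makes the separate depthwise registration unnecessary.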


if core.is_compiled_with_rocm():
    self._use_cudnn = True
else:
    self._use_cudnn = False
Contributor

Suggest keeping this consistent with the check in nn.functional.conv2d above.

Contributor Author

Done

@qili93 qili93 closed this Apr 9, 2021
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Apr 9, 2021
@PaddlePaddle PaddlePaddle unlocked this conversation Apr 9, 2021