
[ROCM] fix depthwise conv in ROCM, test=develop #32117

Closed

Conversation

qili93
Contributor

@qili93 qili93 commented Apr 7, 2021

PR types

Bug fixes

PR changes

OPs

Describe

Fix depthwise conv in ROCM: use MIOPEN by default for depthwise conv, since the CUDA kernel has thread and block count limits.

Related PR #31998 #31836
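The dispatch decision described above can be sketched as follows. This is an illustrative stand-alone sketch, not Paddle's real API: the function name and return strings are hypothetical; the point is that on a ROCm build the depthwise conv defaults to the MIOPEN (cuDNN-path) kernel, and only an explicit use_cudnn=False falls back to the handwritten CUDA kernel.

```python
# Hypothetical sketch of the kernel choice this PR describes. On ROCm,
# depthwise conv goes through MIOPEN (ROCm's cuDNN-equivalent, reached via
# the same CUDNN kernel registration) because the handwritten CUDA kernel
# hits thread/block-count limits there.

def pick_depthwise_kernel(compiled_with_rocm: bool, use_cudnn: bool = True) -> str:
    """Return which backend a depthwise conv2d call would dispatch to."""
    if compiled_with_rocm and use_cudnn:
        return "miopen"
    if use_cudnn:
        return "cudnn"
    # The user explicitly opted out of cudnn: fall back to the CUDA kernel.
    return "cuda"
```

For example, `pick_depthwise_kernel(True)` yields `"miopen"`, while `pick_depthwise_kernel(True, use_cudnn=False)` yields `"cuda"`, matching the opt-out behavior discussed in the review below.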

@paddle-bot-old

paddle-bot-old bot commented Apr 7, 2021

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

REGISTER_OP_KERNEL(depthwise_conv2d, CUDNN, plat::CUDAPlace,
                   paddle::operators::CUDNNConvOpKernel<float>,
                   paddle::operators::CUDNNConvOpKernel<plat::float16>);
REGISTER_OP_KERNEL(depthwise_conv2d_grad, CUDNN, plat::CUDAPlace,
                   paddle::operators::CUDNNConvGradOpKernel<float>,
                   paddle::operators::CUDNNConvGradOpKernel<plat::float16>);
Contributor

There should be no need to register depthwise_conv here either. In our ops, depthwise_conv and conv share the same logic, so the registered kernels are all CUDNNConvOpKernel or CUDNNConvGradOpKernel.

Contributor Author

We do need to register the CUDNN OP kernels for depthwise_conv2d and depthwise_conv2d_grad here. The OP type used by our Python API is depthwise_conv2d, and without these registrations an error is thrown that no CUDNN kernel can be found for the depthwise_conv2d OP.

Contributor

The cudnn kernels need to be registered here; previously depthwise_conv used the CUDA kernel by default.

if (num_channels == groups and num_filters % num_channels == 0 and
        core.is_compiled_with_rocm()):
    l_type = 'depthwise_conv2d'

Contributor

Is the intent here to force depthwise_conv to use the cuDNN implementation on ROCm? If so, under this condition use_cudnn should be set to True (and, if necessary, a warning should be issued stating that the cuDNN conv is used regardless of the value of use_cudnn). Otherwise, when use_cudnn=False, the in-house depthwise_conv CUDA kernel will be selected.

Contributor Author

The default value of use_cudnn here is True, so it is not explicitly modified, and users are still allowed to set use_cudnn to False, i.e. to run depthwise_conv2d with the regular CUDA kernel rather than cudnn. The depthwise_conv2d CUDA kernel generally works fine when the input is not particularly large; a follow-up will fix the thread-limit issue of the depthwise_conv2d CUDA kernel on the ROCM platform.

if (core.is_compiled_with_cuda() and get_flags("FLAGS_conv2d_disable_cudnn")
        ["FLAGS_conv2d_disable_cudnn"]):
    use_cudnn = False

Contributor

Same as above: there should be no need to register a separate cuDNN kernel for depthwise_conv; it is enough to set use_cudnn=True when core.is_compiled_with_rocm().

Contributor Author

This code is related to PR #31836, which added a flag for disabling cudnn for conv2d. The reason is that on ROCM, with the Faster R-CNN model, the highly variable input/output shapes of conv2d cause MIOPEN performance to degrade severely. This switch was therefore added to disable cudnn and use the CUDA kernel directly, which gives much more stable performance.

Contributor

If conv1d is not affected, it does not need to be changed; Conv1D in python/paddle/nn/layer/conv.py does not set use_cudnn. This avoids making the behavior of nn.Conv1D and nn.functional.conv2d inconsistent.

if core.is_compiled_with_rocm():
    use_cudnn = True
else:
    use_cudnn = False
Contributor

The conv interfaces under functional originally forced depthwise_conv to use the CUDA implementation (L568~L570, by setting op_type to depthwise_conv2d). If ROCm should just use cuDNN, it is enough to set l_type to 'conv2d' and use_cudnn = True. The changes below can probably be handled the same way.

@zhangting2020
Contributor

zhangting2020 commented Apr 7, 2021

Roughly, OpKernel selection works as follows:

  • For both depthwise_conv and ordinary conv, the cuDNN implementations live in conv_cudnn_op.cu, and the op_type is conv2d in both cases, because they share the same OpKernel. When the Python API sets use_cudnn=True and op_type is conv2d, the depthwise_conv cuDNN kernel ends up being called if the configuration satisfies the depthwise_conv conditions.
  • paddle also implements CUDA kernels; you can see that conv2d and depthwise_conv2d are registered separately, with separately implemented OpKernels. To call the corresponding CUDA implementation, the Python API needs to set use_cudnn=False and set op_type to conv2d or depthwise_conv2d respectively.

I also suggest asking the API owner to help review.
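The selection rules above can be condensed into a small decision table. This is a sketch for illustration only: the function and the non-cuDNN kernel names (`DepthwiseConvKernel`, `GemmConvKernel`) are assumed stand-ins, not verified Paddle class names; the cuDNN case uses the CUDNNConvOpKernel named earlier in this thread.

```python
# Sketch of the OpKernel selection described above: with use_cudnn=True both
# the ordinary and depthwise cases share one cuDNN OpKernel registered under
# op_type 'conv2d' (the depthwise cuDNN path is chosen inside the kernel);
# with use_cudnn=False, separately registered CUDA kernels are selected by
# op_type ('conv2d' vs 'depthwise_conv2d').

def select_op_kernel(use_cudnn: bool, is_depthwise: bool) -> tuple:
    """Return the (op_type, kernel) pair the Python API would dispatch to."""
    if use_cudnn:
        # Shared cuDNN OpKernel in conv_cudnn_op.cu, op_type is always conv2d.
        return ("conv2d", "CUDNNConvOpKernel")
    if is_depthwise:
        return ("depthwise_conv2d", "DepthwiseConvKernel")  # hypothetical name
    return ("conv2d", "GemmConvKernel")  # hypothetical name
```

Note how the depthwise flag only matters on the use_cudnn=False branch, which is exactly why the review argues that setting use_cudnn=True on ROCm makes the separate depthwise registration unnecessary.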


if core.is_compiled_with_rocm():
    self._use_cudnn = True
else:
    self._use_cudnn = False
Contributor

Suggest keeping this consistent with the check in nn.functional.conv2d above.

Contributor Author

Done

@qili93 qili93 closed this Apr 9, 2021
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Apr 9, 2021
@PaddlePaddle PaddlePaddle unlocked this conversation Apr 9, 2021