Optimize elementwise_add_grad op #32051

Merged

thisjiang
Contributor

PR types

Performance optimization

PR changes

OPs

Describe

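Motivation

SimpleElemwiseAddGradCUDAKernel is implemented by copying the values of dout into dx and dy. There is room for optimization here: if dout shares the same memory address with dx or dy, the extra copy is unnecessary.

Optimization approach

  1. dx_data is the same as dout_data and dy_data differs from dout_data: only copy dout_data to dy_data.
  2. dx_data differs from dout_data and dy_data is the same as dout_data: only copy dout_data to dx_data.
  3. dx_data and dy_data both differ from dout_data: call the original SimpleElemwiseAddGradCUDAKernel.
  4. dx_data and dy_data are both the same as dout_data: nothing needs to be done.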
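
The four cases above can be sketched as a pointer-aliasing dispatch. This is a minimal host-side illustration, not the code merged in this PR: the function name ElementwiseAddGradSketch and the CopyPlan enum are hypothetical, and std::memcpy stands in for whatever device copy or kernel launch the real operator uses.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical: records which copies the dispatch performed,
// so the chosen branch can be inspected by the caller.
enum class CopyPlan { kNone, kToDyOnly, kToDxOnly, kBoth };

// Hypothetical host-side stand-in for the grad dispatch.
// For z = x + y, dx = dout and dy = dout, so the only work is copying,
// and a copy can be skipped whenever the output aliases dout.
template <typename T>
CopyPlan ElementwiseAddGradSketch(const T* dout, T* dx, T* dy,
                                  std::size_t n) {
  const bool dx_aliases_dout = (dx == dout);
  const bool dy_aliases_dout = (dy == dout);

  if (dx_aliases_dout && dy_aliases_dout) {
    // Case 4: both outputs already hold dout; nothing to do.
    return CopyPlan::kNone;
  }
  if (dx_aliases_dout) {
    // Case 1: dx is already dout; only dy needs the copy.
    std::memcpy(dy, dout, n * sizeof(T));
    return CopyPlan::kToDyOnly;
  }
  if (dy_aliases_dout) {
    // Case 2: dy is already dout; only dx needs the copy.
    std::memcpy(dx, dout, n * sizeof(T));
    return CopyPlan::kToDxOnly;
  }
  // Case 3: no aliasing; copy into both outputs
  // (the real op calls its original fused kernel here).
  std::memcpy(dx, dout, n * sizeof(T));
  std::memcpy(dy, dout, n * sizeof(T));
  return CopyPlan::kBoth;
}
```

Comparing raw data pointers is enough here because the sketch assumes the buffers are either identical or fully disjoint, which matches the in-place reuse scenario described above.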

Q: Why is this placed in elementwise_add_op.cu rather than in ElementwiseAddGradKernel::Compute?
A: Because placing it there raises the error "Tensor not alloc memory".

Results

| elementwise_add_grad op time | Before optimization | After optimization |
|---|---|---|
| float16 | 147.551 us | 80.672 us |
| float | 244.958 us | 159.391 us |

paddle-bot-old bot commented Apr 2, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@Xreki Xreki left a comment

LGTM

@Xreki Xreki merged commit 1e52f32 into PaddlePaddle:develop Apr 3, 2021
@thisjiang thisjiang deleted the optimize-elementwise_add_grad branch April 6, 2021 06:43