[Hackathon No.32] Optimize the GPU performance of the expand_as forward & backward ops for Paddle #52684

Closed
wants to merge 6 commits into from

Timber-Ye
Contributor

PR types

Performance optimization

PR changes

OPs

Describe

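Currently, the GPU implementation of the expand_as forward and backward operators in Paddle is built by composing Eigen expressions and has no dedicated GPU kernel, so its performance is relatively poor. This PR aims to implement high-performance GPU kernels to optimize the expand_as op on GPU.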

  • Development environment
  1. Device: Tesla V100-32G
  2. CUDA 11.2, cuDNN v8.1.1
  • Optimization method

[Operator performance optimization design document]

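Because the forward pass of expand_as resembles broadcasting and the backward pass resembles a sum reduction, the expand_as operator is optimized by directly using Paddle's internal BroadcastKernel and ReduceKernel.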
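The broadcast/sum-reduction relationship can be illustrated with a minimal pure-Python sketch. It is not the Paddle kernel code from this PR: the function names (`expand_as_forward`, `expand_as_backward`, `_source_offsets`) are hypothetical and tensors are modelled as flat row-major lists. The forward pass reads each source element through a stride of 0 on the broadcast axes, and the backward pass accumulates the output gradient back through the same index mapping.

```python
from itertools import product
from math import prod


def _source_offsets(src_shape, dst_shape):
    """For each destination element (row-major), yield its flat source offset."""
    # Left-pad the source shape with 1s so both shapes have the same rank.
    src = [1] * (len(dst_shape) - len(src_shape)) + list(src_shape)
    for s, d in zip(src, dst_shape):
        if s != 1 and s != d:
            raise ValueError("shapes are not broadcast-compatible")
    # Row-major strides of the source; axes of size 1 get stride 0.
    strides, step = [], 1
    for s in reversed(src):
        strides.append(0 if s == 1 else step)
        step *= s
    strides.reverse()
    # itertools.product advances the last index fastest, i.e. row-major order.
    for idx in product(*(range(d) for d in dst_shape)):
        yield sum(i * st for i, st in zip(idx, strides))


def expand_as_forward(x, src_shape, dst_shape):
    """Forward: broadcast flat list `x` (src_shape) to dst_shape."""
    return [x[off] for off in _source_offsets(src_shape, dst_shape)]


def expand_as_backward(grad_out, src_shape, dst_shape):
    """Backward: sum `grad_out` (dst_shape) over the broadcast axes."""
    grad_x = [0.0] * prod(src_shape)
    for flat, off in enumerate(_source_offsets(src_shape, dst_shape)):
        grad_x[off] += grad_out[flat]
    return grad_x


if __name__ == "__main__":
    y = expand_as_forward([1.0, 2.0], [2, 1], [2, 3])
    g = expand_as_backward([1.0] * 6, [2, 1], [2, 3])
    print(y, g)
```

Each source element receives the sum of the gradients of every output position it was copied to, which is why a generic reduction kernel can serve as the backward pass.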

  • Optimization results

Performance of Paddle after optimization (Optimized) compared with Paddle before optimization (Baseline):

| Case | Data type | src_shape | dst_shape | Paddle Baseline (ms) | Optimized (ms) | Diff |
|---|---|---|---|---|---|---|
| 0 | float16 | [1785, 1] | [1785, 128] | 0.1971 | 0.1166 | 40.835% faster |
| 1 | float16 | [5, 1, 1] | [5, 128, 128] | 3.0909 | 0.1269 | 95.895% faster |
| 2 | float16 | [32, 807, 1] | [32, 807, 807] | 1.3884 | 0.3940 | 71.620% faster |
| 3 | float32 | [1785, 1] | [1785, 128] | 0.2244 | 0.1150 | 48.760% faster |
| 4 | float32 | [5, 1, 1] | [5, 128, 128] | 3.6155 | 0.1179 | 96.738% faster |
| 5 | float32 | [32, 807, 1] | [32, 807, 807] | 1.4826 | 0.6428 | 56.646% faster |
| 6 | float64 | [32, 1, 1] | [32, 807, 807] | 288.7776 | 1.2293 | 99.570% faster |
| 7 | float64 | [1, 1, 64, 5] | [64, 128, 64, 5] | 3.1326 | 0.2746 | 91.645% faster |
| 8 | float64 | [5, 1, 1, 1, 1] | [5, 1, 713, 1, 889] | 240.8861 | 0.2960 | 99.877% faster |

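Across the 9 cases above, performance improves after optimization, and the gain becomes more pronounced as the number of elements in the expanded Tensor grows. In case 8, the run time of the optimized operator drops to about 1/814 of the baseline.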
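As a quick check on the figures, the Diff column appears to be the relative reduction in run time, and the speedup factor is the ratio of the two timings. The sketch below assumes that reading; the helper names are hypothetical, and the inputs are the case 8 timings from the table.

```python
def time_reduction_percent(baseline_ms, optimized_ms):
    """Relative reduction in run time, as a percentage of the baseline."""
    return (baseline_ms - optimized_ms) / baseline_ms * 100.0


def speedup(baseline_ms, optimized_ms):
    """How many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


if __name__ == "__main__":
    # Case 8 timings from the table: float64, [5, 1, 1, 1, 1] -> [5, 1, 713, 1, 889]
    baseline_ms, optimized_ms = 240.8861, 0.2960
    print(f"reduction: {time_reduction_percent(baseline_ms, optimized_ms):.3f}%")
    print(f"speedup:   {speedup(baseline_ms, optimized_ms):.1f}x")
```

With the case 8 timings this gives a reduction of about 99.877% and a speedup of roughly 814x. The same calculation on case 5 gives a speedup of only about 2.3x.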

paddle-bot bot commented Apr 8, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added contributor External developers status: proposed labels Apr 8, 2023
Co-authored-by: Timber-Ye <ye_hanqiao@163.com>
Co-authored-by: BrianQian1999 <brianqianhitsz@gmail.com>
@Timber-Ye Timber-Ye closed this Apr 8, 2023
@Timber-Ye Timber-Ye deleted the expand_as_perf branch April 8, 2023 14:27
paddle-bot bot commented Apr 8, 2023

We are sorry to inform you that, after repeated discussion, your PR does not yet meet the merging standard. Please read the Paddle native operator development specification (Reference: Paddle Custom Operator Design Doc). We are closing this PR for now; you may submit a new one. Thank you for your contribution.
