【PaddlePaddle Hackathon 4 No.36】Optimize the GPU performance of the tile op for Paddle #52482

Merged (12 commits) on Apr 10, 2023

@zeroRains zeroRains commented Apr 3, 2023

PR types

Performance optimization

PR changes

OPs

Describe

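The tile operator in Paddle currently uses the same computation logic on GPU and CPU. It has no dedicated CUDA implementation, so there is room for optimization.
Design document: https://github.com/PaddlePaddle/community/blob/master/rfcs/OPs-Perf/20230319_tile_op_optimization.md

  • Development environment

    1. Device: Tesla V100
    2. Environment: CUDA 11.2, cuDNN 8
  • Optimization method

    • Combine phi::funcs::BroadcastKernel with kps::IdentityFunctor<T>() to accelerate the copy operations that tile performs.
      Performance of the optimized Paddle compared with Paddle before the optimization: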

| Case No. | Device | repeat_times | input_shape | input_type | Paddle Perf (ms) | Old Paddle Perf (ms) | Diff |
|---|---|---|---|---|---|---|---|
| 1 | Tesla V100 | [1,10,128,128] | [16L,100L,2L,2L] | float32 | 5.1831 | 10.1888 | 96.58% faster |
| 2 | Tesla V100 | [1,10,128,128] | [16L,100L,2L,2L] | float16 | 3.5461 | 16.7348 | 372% faster |
| 3 | Tesla V100 | [4,1,807] | [32L, 807L, 1L] | float32 | 0.3885 | 0.7381 | 89.99% faster |
| 4 | Tesla V100 | [4,1,807] | [32L, 807L, 1L] | float16 | 0.2465 | 0.9850 | 300% faster |

Performance of the optimized Paddle compared with PyTorch:

| Case No. | Device | repeat_times | input_shape | input_type | Paddle Perf (ms) | PyTorch Perf (ms) | Diff |
|---|---|---|---|---|---|---|---|
| 1 | Tesla V100 | [1,10,128,128] | [16L,100L,2L,2L] | float32 | 5.1831 | 8.0796 | 55.88% faster |
| 2 | Tesla V100 | [1,10,128,128] | [16L,100L,2L,2L] | float16 | 3.5461 | 7.7898 | 120% faster |
| 3 | Tesla V100 | [4,1,807] | [32L, 807L, 1L] | float32 | 0.3885 | 0.5342 | 37.50% faster |
| 4 | Tesla V100 | [4,1,807] | [32L, 807L, 1L] | float16 | 0.2465 | 0.3768 | 52.86% faster |
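
The diff column in both tables appears consistent with a relative speedup of (baseline time / optimized time - 1) × 100. The sketch below shows that arithmetic; the function name SpeedupPercent is hypothetical and the formula is inferred from the reported numbers, not taken from the PR's benchmark scripts.

```cpp
#include <cassert>

// Hypothetical helper (not part of Paddle): relative speedup in percent.
// Assumption: "diff" means how much faster the optimized run is than the
// baseline, i.e. (baseline / optimized - 1) * 100.
double SpeedupPercent(double baseline_ms, double optimized_ms) {
  assert(optimized_ms > 0.0);  // avoid division by zero
  return (baseline_ms / optimized_ms - 1.0) * 100.0;
}
```

For case 1, SpeedupPercent(10.1888, 5.1831) gives roughly 96.58 against the old Paddle kernel, and SpeedupPercent(8.0796, 5.1831) gives roughly 55.88 against PyTorch, matching the tables.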

Performance improves to varying degrees across all four cases.
Thanks to @AndPuQing and @Asthestarsfalll for their help while I was debugging.
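
The optimization method named in this description treats tile as a broadcast whose per-element operation is an identity copy. A minimal CPU sketch of that idea follows. It is illustrative only and is not the PR's CUDA code: BroadcastCopy and TileViaBroadcast are hypothetical names, the local IdentityFunctor is a stand-in for kps::IdentityFunctor<T>, and the interleaved-dimension layout is an assumption about how the tile-to-broadcast mapping can be done. The input of shape [d0, d1, ...] is viewed as [1, d0, 1, d1, ...] and broadcast to [r0, d0, r1, d1, ...], which has the same memory layout as the tiled output of shape [r0*d0, r1*d1, ...].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an identity functor: the broadcast "computation" is a copy.
template <typename T>
struct IdentityFunctor {
  T operator()(const T& v) const { return v; }
};

// Hypothetical CPU reference for a broadcast: each output element reads the
// input element at the same coordinates, with size-1 input axes pinned to 0.
template <typename T, typename Functor>
std::vector<T> BroadcastCopy(const std::vector<T>& in,
                             const std::vector<std::size_t>& in_dims,
                             const std::vector<std::size_t>& out_dims,
                             Functor op) {
  std::size_t out_numel = 1;
  for (std::size_t d : out_dims) out_numel *= d;
  std::vector<T> out(out_numel);
  for (std::size_t linear = 0; linear < out_numel; ++linear) {
    std::size_t rest = linear, in_offset = 0, in_stride = 1;
    // Decompose the row-major linear index from the innermost axis outward.
    for (std::size_t axis = out_dims.size(); axis-- > 0;) {
      const std::size_t coord = rest % out_dims[axis];
      rest /= out_dims[axis];
      if (in_dims[axis] != 1) in_offset += coord * in_stride;
      in_stride *= in_dims[axis];
    }
    out[linear] = op(in[in_offset]);
  }
  return out;
}

// Tile via broadcast. Assumes x_dims and repeat_times have the same rank;
// a full implementation would first pad the shorter one with 1s.
template <typename T>
std::vector<T> TileViaBroadcast(const std::vector<T>& x,
                                const std::vector<std::size_t>& x_dims,
                                const std::vector<std::size_t>& repeat_times) {
  assert(x_dims.size() == repeat_times.size());
  std::vector<std::size_t> in_dims, out_dims;
  for (std::size_t i = 0; i < x_dims.size(); ++i) {
    in_dims.push_back(1);          // broadcast axis on the input side
    out_dims.push_back(repeat_times[i]);
    in_dims.push_back(x_dims[i]);  // data axis, copied as-is
    out_dims.push_back(x_dims[i]);
  }
  return BroadcastCopy(x, in_dims, out_dims, IdentityFunctor<T>());
}
```

For example, tiling a 2×3 input {1, 2, 3, 4, 5, 6} with repeat_times {2, 2} yields a 4×6 result whose rows are 1 2 3 1 2 3 and 4 5 6 4 5 6, repeated twice. On a GPU the same mapping lets a generic, already tuned broadcast kernel do the copy, which is presumably where the speedup comes from.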

paddle-bot bot commented Apr 3, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot bot commented Apr 3, 2023

Sorry to inform you that, after repeated discussion, your PR does not yet meet the merging standard (reference: Paddle Custom Operator Design Doc). We are closing this PR for now; you can submit a new one. Thank you for your contribution.

@zeroRains
Contributor Author

CI has passed. Could you please review this, @JamesLim-sy?

@JamesLim-sy JamesLim-sy left a comment

LGTM

@JamesLim-sy JamesLim-sy merged commit 61fe219 into PaddlePaddle:develop Apr 10, 2023
@zeroRains zeroRains deleted the tile branch April 10, 2023 03:27