[PaddlePaddle Hackathon 4 No.40] Optimize the GPU compute performance of the kthvalue op for Paddle #51835

Merged
merged 22 commits into from
Mar 24, 2023


thunder95
Contributor

@thunder95 thunder95 commented Mar 19, 2023

PR types

Performance optimization

PR changes

OPs

Describe

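The GPU implementation of Paddle's kthvalue operator is currently built on the cub library, and it still has clear room for performance improvement.
Design document: PaddlePaddle/community#452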
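
For context, the baseline approach amounts to ordering each row and reading off position k. The sketch below is a plain-Python illustration of that idea only; `kthvalue_by_sort` is a hypothetical name, not Paddle's API and not the cub-based kernel itself. It assumes k is 1-based and that the k-th smallest value and its index are wanted.

```python
from typing import List, Tuple


def kthvalue_by_sort(row: List[float], k: int) -> Tuple[float, int]:
    """Return (value, index) of the k-th smallest element of row (k is 1-based)."""
    if not 1 <= k <= len(row):
        raise ValueError("k must be in [1, len(row)]")
    # Sort (value, index) pairs. The whole row gets ordered even though
    # only one position of the result is actually needed.
    ordered = sorted((value, index) for index, value in enumerate(row))
    return ordered[k - 1]


if __name__ == "__main__":
    print(kthvalue_by_sort([3.0, 1.0, 2.0, 5.0, 4.0], 2))
```

The point of the sketch is the wasted work: a full ordering is computed to answer a single-rank query, which is what a selection-based method avoids.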

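  • Development environment:
  1. Device: RTX 2070s
  2. Environment: CUDA 10.2, cuDNN 7
  • Optimization method
    Replace the current cub-based sort with the RadixSearch routine that is already implemented inside Paddle.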
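
To illustrate why a radix-based search can beat a full sort, here is a minimal CPU sketch of radix selection. It is not Paddle's RadixSearch code: the function names, the 4-bit digit width, and the sequential counting loop are all assumptions made for illustration, and inputs are assumed to be exactly representable as float32.

```python
import struct
from typing import List


def float_to_ordered_key(x: float) -> int:
    """Map a float32 value to a uint32 whose unsigned order matches float order."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Negative floats: flip every bit. Non-negative floats: set the sign bit.
    return (bits ^ 0xFFFFFFFF) if (bits & 0x80000000) else (bits | 0x80000000)


def ordered_key_to_float(key: int) -> float:
    """Inverse of float_to_ordered_key."""
    bits = (key & 0x7FFFFFFF) if (key & 0x80000000) else (key ^ 0xFFFFFFFF)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def radix_select_kth_smallest(values: List[float], k: int, radix_bits: int = 4) -> float:
    """Return the k-th smallest value (k is 1-based) without sorting."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be in [1, len(values)]")
    if 32 % radix_bits != 0:
        raise ValueError("radix_bits must divide 32")
    keys = [float_to_ordered_key(v) for v in values]
    digit_mask = (1 << radix_bits) - 1
    prefix = 0       # key bits of the answer that are already fixed
    prefix_mask = 0  # which bit positions of the key are fixed
    remaining = k    # rank of the answer among the surviving candidates
    # Walk the key from the most significant digit to the least significant.
    for shift in range(32 - radix_bits, -1, -radix_bits):
        counts = [0] * (1 << radix_bits)
        for key in keys:
            if (key & prefix_mask) == prefix:
                counts[(key >> shift) & digit_mask] += 1
        # Pick the digit bucket that contains the remaining-th candidate.
        for digit, count in enumerate(counts):
            if remaining <= count:
                prefix |= digit << shift
                prefix_mask |= digit_mask << shift
                break
            remaining -= count
    return ordered_key_to_float(prefix)


if __name__ == "__main__":
    data = [0.5, -1.25, 3.0, 2.0, -7.5, 3.0]
    print(radix_select_kth_smallest(data, 3))
```

Each pass only builds a small histogram and narrows the candidate set by one digit, so the work is a fixed number of linear scans rather than a full ordering of the row.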

Performance of the optimized Paddle compared with Paddle before the optimization:

| Case No. | Device | input_shape | input_type | k | Paddle Perf (ms) | old_paddle Perf (ms) | diff |
|---|---|---|---|---|---|---|---|
| 1 | RTX 2070s | [16L, 10000L] | float32 | 5 | 0.066955 | 0.29134 | 335.12% faster |
| 2 | RTX 2070s | [16L, 3000L] | float32 | 1 | 0.029379 | 0.13398 | 356.04% faster |
| 3 | RTX 2070s | [16L, 10000L] | float16 | 5 | 0.054159 | 0.1502 | 177.33% faster |
| 4 | RTX 2070s | [16L, 3000L] | float16 | 1 | 0.019049 | 0.06901 | 262.28% faster |

Performance of the optimized Paddle compared with PyTorch:

| Case No. | Device | input_shape | input_type | k | Paddle Perf (ms) | PyTorch Perf (ms) | diff |
|---|---|---|---|---|---|---|---|
| 1 | RTX 2070s | [16L, 10000L] | float32 | 5 | 0.066955 | 0.08037 | 20.04% faster |
| 2 | RTX 2070s | [16L, 3000L] | float32 | 1 | 0.029379 | 0.041758 | 42.14% faster |
| 3 | RTX 2070s | [16L, 10000L] | float16 | 5 | 0.054159 | 0.070236 | 29.68% faster |
| 4 | RTX 2070s | [16L, 3000L] | float16 | 1 | 0.019049 | 0.027326 | 43.45% faster |

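All four cases are faster after the optimization: by roughly 177% to 356% over the previous Paddle implementation and by roughly 20% to 43% over PyTorch.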
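
The diff percentages appear consistent with (baseline time / optimized time - 1) × 100. That formula is an inference from the reported numbers, not something stated in the PR; the short check below recomputes it from the timings listed above, with `percent_faster` being a hypothetical helper name.

```python
def percent_faster(optimized_ms: float, baseline_ms: float) -> float:
    """Relative speedup of the optimized run over the baseline, in percent."""
    return (baseline_ms / optimized_ms - 1.0) * 100.0


# (optimized Paddle ms, old Paddle ms, PyTorch ms), copied from the reported results.
timings = [
    (0.066955, 0.29134, 0.08037),
    (0.029379, 0.13398, 0.041758),
    (0.054159, 0.1502, 0.070236),
    (0.019049, 0.06901, 0.027326),
]

if __name__ == "__main__":
    for case, (new_ms, old_ms, torch_ms) in enumerate(timings, start=1):
        print(
            f"case {case}: {percent_faster(new_ms, old_ms):.2f}% vs old Paddle, "
            f"{percent_faster(new_ms, torch_ms):.2f}% vs PyTorch"
        )
```

The recomputed values agree with the reported ones to within about 0.01 percentage points, which suggests the reported figures were truncated or derived from unrounded timings.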

@thunder95
Contributor Author

@luotao1 @JamesLim-sy Could you please review this?

JamesLim-sy
JamesLim-sy previously approved these changes Mar 22, 2023
Contributor

@JamesLim-sy JamesLim-sy left a comment


LGTM

@luotao1
Contributor

luotao1 commented Mar 22, 2023

image

The compilation failure in the ROCm CI pipeline needs to be fixed.

Contributor

@JamesLim-sy JamesLim-sy left a comment


LGTM, good work.

@JamesLim-sy JamesLim-sy merged commit e18f533 into PaddlePaddle:develop Mar 24, 2023
Labels
contributor External developers
4 participants