【Hackathon No.32】为 Paddle 优化 expand_as 前向&反向 op 在 GPU 上的计算性能 #52684
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
PR types
Performance optimization
PR changes
OPs
Describe
目前 Paddle 内
expand_as
前向和反向算子的 GPU 实现采用 Eigen 组合的模式,缺少 GPU Kernel,性能相对不足,希望实现高性能的 GPU 计算 Kernel,为 Paddle 优化expand_as
op 在 GPU 上的计算性能。【算子性能优化设计文档】
由于expand_as前向的过程与广播机制类似,后向的过程与求和约归类似,因此直接通过使用飞桨内部的
BroadcastKernel
和ReduceKernel
来对expand_as算子进行优化。完成优化后,Paddle(Optimized)与优化前的Paddle(Baseline)的性能对比:
针对以上9种不同的case, 优化后的性能有所提升,并且要扩展的Tensor元素数量越多,性能提升越明显,优化后的算子在case 5上的用时更是直接缩短至baseline的1/814。