Support more scenes for fused_fast_ln #42282

Merged
merged 2 commits into from
Apr 28, 2022

Conversation

ZzSean (Contributor) commented Apr 26, 2022

PR types

Others

PR changes

OPs

Describe

Support more scenes for fused_fast_ln
fused_fast_ln_kernel now supports the following scenes:

  • the number of columns is 768 or 4096;
  • bias_ptr is not nullptr.

Performance improvement:

  • Fused OP:

| config | old | new | speedup |
|---|---|---|---|
| bsz=1, in_seq_len=128, max_dec_len=8 | 10.01 | 7.45 | 1.34x |
| bsz=4, in_seq_len=128, max_dec_len=8 | 11.30 | 8.21 | 1.38x |
| bsz=1, in_seq_len=60, max_dec_len=20 | 22.27 | 17.07 | 1.30x |
| bsz=4, in_seq_len=60, max_dec_len=20 | 23.05 | 17.54 | 1.31x |

  • Model: the average performance of the Transformer model improves by 1% to 1.5%.
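The speedup column matches the ratio old / new, which suggests that the old and new columns are timings where lower is better (the PR does not state the unit). A minimal sketch that recomputes the column from the table values; the names `rows` and `speedup` are illustrative and not part of the PR:

```python
# Rows copied from the PR table: (config, old, new, reported speedup).
rows = [
    ("bsz=1, in_seq_len=128, max_dec_len=8", 10.01, 7.45, 1.34),
    ("bsz=4, in_seq_len=128, max_dec_len=8", 11.30, 8.21, 1.38),
    ("bsz=1, in_seq_len=60, max_dec_len=20", 22.27, 17.07, 1.30),
    ("bsz=4, in_seq_len=60, max_dec_len=20", 23.05, 17.54, 1.31),
]


def speedup(old: float, new: float) -> float:
    """Speedup, assuming old and new are timings (lower is better)."""
    return old / new


for config, old, new, reported in rows:
    computed = speedup(old, new)
    print(f"{config}: {computed:.2f}x (reported {reported:.2f}x)")
```

Each recomputed ratio agrees with the reported value to two decimal places.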

paddle-bot-old commented

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

limin2021 (Contributor) left a comment

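  1. Please add to the PR description the performance improvement of this op when the number of columns is 768 and 4096.
  2. This fused kernel uses a lot of registers. Is there a risk of register spilling?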

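ZzSean (Contributor, Author) commented Apr 26, 2022

> 1. Please add to the PR description the performance improvement of this op when the number of columns is 768 and 4096.
> 2. This fused kernel uses a lot of registers. Is there a risk of register spilling?

Compared with the code before this change, register usage grows by at most Vec bias[LDGS]. LDGS is at most 4 and each block still has 128 threads, so there should be no risk of register spilling.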
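The estimate in this reply can be made concrete with a rough calculation. The sketch below assumes that `Vec` is a 16-byte vector occupying four 32-bit registers, which the PR does not state, and uses the 64K 32-bit registers per SM that is typical of recent NVIDIA GPUs. The function name and its defaults are hypothetical. Actual usage depends on the compiler and can be checked with `nvcc -Xptxas -v`, which reports registers per thread and any spill stores or loads.

```python
def extra_bias_registers(ldgs=4, vec_bytes=16, threads_per_block=128):
    """Return (extra registers per thread, extra registers per block).

    Assumes every element of `Vec bias[LDGS]` is held in 32-bit registers.
    vec_bytes=16 is an assumption; the PR does not give the width of Vec.
    """
    regs_per_vec = vec_bytes // 4  # number of 32-bit registers per Vec
    per_thread = ldgs * regs_per_vec
    per_block = per_thread * threads_per_block
    return per_thread, per_block


# Typical register file size per SM on recent NVIDIA GPUs (assumption).
REGS_PER_SM = 64 * 1024

per_thread, per_block = extra_bias_registers()
share = per_block / REGS_PER_SM
print(f"extra registers per thread: {per_thread}")
print(f"extra registers per block:  {per_block}")
print(f"share of one SM's register file: {share:.1%}")
```

Under these assumptions the worst case adds 16 registers per thread and 2048 per block. Spilling is decided per thread, so the extra registers matter only if the kernel was already close to the per-thread limit (255 on recent architectures) before the change.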
