[bf16] add bf16 kernel: layer_norm p_norm reduce_sum #39843
Conversation
Thanks for your contribution!
LGTM
LGTM for op benchmark
LGTM
PR types
New features
PR changes
OPs
Describe
Add bf16 kernels for:
layer_norm
p_norm
reduce_sum
A performance test was run on LayerNorm: in the embed_dim=1024 scenario, the compute performance of call_1024_kernel with the bf16 data type was analyzed (see PR39247 for the call_1024_kernel optimization strategy).
layer_norm forward and backward time cost:
cost_time fp32: 0.01763439178466797s
cost_time bf16 use 1024 kernel: 0.007885456085205078s
cost_time bf16 no use 1024 kernel: 0.008244991302490234s
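For reference, the kind of measurement reported above can be sketched with a minimal timing harness. This is an assumption, not the PR's actual benchmark script: it implements a plain layer_norm forward pass in NumPy and times repeated runs (NumPy has no native bf16 dtype, so only the fp32 path is shown; the real benchmark would call Paddle's layer_norm with bfloat16 tensors on GPU).

```python
# Hypothetical benchmark sketch -- NOT the PR's actual test script.
import time
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the last axis (embed_dim), then scale/shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def time_layer_norm(dtype=np.float32, batch=4096, embed_dim=1024, iters=10):
    """Return total wall-clock time for `iters` forward passes."""
    x = np.random.rand(batch, embed_dim).astype(dtype)
    gamma = np.ones(embed_dim, dtype=dtype)
    beta = np.zeros(embed_dim, dtype=dtype)
    start = time.time()
    for _ in range(iters):
        layer_norm(x, gamma, beta)
    return time.time() - start

if __name__ == "__main__":
    print(f"cost_time fp32: {time_layer_norm(np.float32)}s")
```

The real comparison in this PR additionally covers the backward pass and the specialized call_1024_kernel code path, which this sketch does not model.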