[intel_hpu] initial commit for intel_hpu support #9273
Conversation
…t_intel_hpu_backend
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff            @@
##           develop    #9273     +/-  ##
===========================================
+ Coverage    52.81%   52.91%    +0.09%
===========================================
  Files          673      673
  Lines       107657   107687       +30
===========================================
+ Hits         56857    56980      +123
+ Misses       50800    50707       -93

View full report in Codecov by Sentry.
bcaf5f3 to d9a3ad6
You could create a new llm/intel_hpu directory to hold the corresponding run examples and documentation.
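A minimal sketch of what a run example under llm/intel_hpu could look like. This is an illustration only: the script name, model id, set_device() target string, and generate() arguments are assumptions on my part, not code from this PR.

# llm/intel_hpu/run_generation.py -- hypothetical example script
import paddle
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

# Select the Intel HPU custom device (assumes the device plugin is installed
# and registered under the name "intel_hpu").
paddle.set_device("intel_hpu")

model_name = "facebook/llama-7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello from Intel HPU!", return_tensors="pd")
output_ids, _ = model.generate(**inputs, max_length=64)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))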
@@ -248,7 +248,11 @@ def scaled_dot_product_attention(
    value_states = paddle.transpose(value_states, [0, 2, 1, 3])

    # matmul and divide by sqrt(head_dim)
    attn_weights = paddle.matmul(query_states / math.sqrt(head_dim), key_states.transpose([0, 1, 3, 2]))
    if get_env_device() == "intel_hpu":
        attn_weights = paddle.matmul(query_states * (1 / math.sqrt(head_dim)), key_states.transpose([0, 1, 3, 2]))
It would be best to add a comment here explaining why 1 / math.sqrt(head_dim) is needed.
Multiplication by an immediate constant performs better than division.
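A sketch of how the requested comment could look next to the intel_hpu branch; this mirrors the diff quoted above and is a suggestion, not the merged code (the helper name is mine).

import math
import paddle

def scaled_qk(query_states, key_states, head_dim, use_intel_hpu):
    # Both branches compute Q @ K^T scaled by 1/sqrt(head_dim); they differ
    # only in how the scale is applied.
    if use_intel_hpu:
        # On intel_hpu, multiplying by the constant 1/sqrt(head_dim) is faster
        # than dividing, so apply the scale as a multiplication.
        return paddle.matmul(query_states * (1 / math.sqrt(head_dim)), key_states.transpose([0, 1, 3, 2]))
    return paddle.matmul(query_states / math.sqrt(head_dim), key_states.transpose([0, 1, 3, 2]))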
if config.context_parallel_degree > 1:
    raise ValueError("Context parallel is not implemented for intel_hpu")
scaling_factor = query_states.shape[3] ** -0.5
attention_mask = attention_mask.astype("bfloat16")
Does intel_hpu support float16? facebook/llama-7b uses float16, while llama2 and llama3 use bfloat16.
This has been changed here to use q's dtype.
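If I read the follow-up correctly, the hard-coded bfloat16 cast becomes a cast to the query dtype, roughly as sketched below (the helper name is mine, not from the PR).

import paddle

def cast_mask_to_query_dtype(attention_mask, query_states):
    # Cast the mask to whatever dtype the query tensor uses, so float16
    # checkpoints (e.g. facebook/llama-7b) and bfloat16 checkpoints
    # (llama2/llama3) both work without a hard-coded bfloat16 cast.
    return attention_mask.astype(query_states.dtype)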
You can refer to:
LGTM
PR types
New features
PR changes
Models
PR Category
Custom Device
Description
Add the intel_hpu device to PaddleNLP, with support for fused RoPE, fused RMSNorm, fused SDPA, and device-specific dtype handling.
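As a rough sketch of what the device-specific dispatch looks like in practice: the pattern below routes RMSNorm through get_env_device(), with the intel_hpu fused kernel left as a placeholder because this conversation does not show the actual custom-op entry point (the import path of get_env_device is also an assumption).

import paddle
from paddlenlp.utils.tools import get_env_device  # import path assumed

def rms_norm_ref(hidden_states, weight, eps=1e-6):
    # Reference RMSNorm, computed in float32 for numerical stability.
    variance = hidden_states.astype("float32").pow(2).mean(-1, keepdim=True)
    normed = paddle.rsqrt(variance + eps) * hidden_states.astype("float32")
    return (normed * weight.astype("float32")).astype(hidden_states.dtype)

def rms_norm(hidden_states, weight, eps=1e-6):
    # Device-dispatch sketch: route to the backend's fused kernel when running
    # on intel_hpu, otherwise fall back to the reference implementation.
    if get_env_device() == "intel_hpu":
        # The PR wires in the intel_hpu fused RMSNorm here; kept as a placeholder
        # because the exact custom op is not reproduced in this thread.
        raise NotImplementedError("plug in the intel_hpu fused RMSNorm custom op")
    return rms_norm_ref(hidden_states, weight, eps)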