
[intel_hpu] initial commit for intel_hpu support #9273

Merged: 15 commits from support_intel_hpu_backend into PaddlePaddle:develop on Oct 31, 2024

Conversation

yanfeich (Contributor)

PR types: New features
PR changes: Models
PR Category: Custom Device

Description

Add the intel_hpu custom device to PaddleNLP, with support for fused RoPE, fused RMSNorm, fused SDPA, and device-specific dtype handling.
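For illustration only (not the PR's code): the fused paths above are gated on the device reported by get_env_device(), falling back to plain Paddle ops elsewhere. The helper name scaled_qk below is made up, and the import path for get_env_device is assumed to be paddlenlp.utils.tools.

import math

import paddle
from paddlenlp.utils.tools import get_env_device  # assumed import path for the device helper


def scaled_qk(query_states, key_states):
    """Hypothetical sketch of per-device gating; not the PR's actual helper."""
    head_dim = query_states.shape[-1]
    key_t = key_states.transpose([0, 1, 3, 2])
    if get_env_device() == "intel_hpu":
        # On HPU, multiply by a precomputed reciprocal instead of dividing (see review below).
        return paddle.matmul(query_states * (1.0 / math.sqrt(head_dim)), key_t)
    return paddle.matmul(query_states / math.sqrt(head_dim), key_t)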


paddle-bot bot commented Oct 15, 2024

Thanks for your contribution!


codecov bot commented Oct 15, 2024

Codecov Report

Attention: Patch coverage is 20.00000% with 32 lines in your changes missing coverage. Please review.

Project coverage is 52.91%. Comparing base (ec25cb8) to head (687c0c3).
Report is 2 commits behind head on develop.

Files with missing lines                       Patch %   Lines
paddlenlp/transformers/llama/fusion_ops.py       0.00%   21 Missing ⚠️
paddlenlp/transformers/llama/modeling.py        41.17%   10 Missing ⚠️
paddlenlp/utils/tools.py                        50.00%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9273      +/-   ##
===========================================
+ Coverage    52.81%   52.91%   +0.09%     
===========================================
  Files          673      673              
  Lines       107657   107687      +30     
===========================================
+ Hits         56857    56980     +123     
+ Misses       50800    50707      -93     


yanfeich force-pushed the support_intel_hpu_backend branch from bcaf5f3 to d9a3ad6 on October 21, 2024 10:06
ZHUI (Collaborator) commented Oct 22, 2024

You could create a new llm/intel_hpu directory to hold the corresponding run examples and documentation.

@@ -248,7 +248,11 @@ def scaled_dot_product_attention(
     value_states = paddle.transpose(value_states, [0, 2, 1, 3])

     # matmul and devide by sqrt(head_dim)
-    attn_weights = paddle.matmul(query_states / math.sqrt(head_dim), key_states.transpose([0, 1, 3, 2]))
+    if get_env_device() == "intel_hpu":
+        attn_weights = paddle.matmul(query_states * (1 / math.sqrt(head_dim)), key_states.transpose([0, 1, 3, 2]))
Collaborator

It would be better to add a comment here explaining why 1 / math.sqrt(head_dim) is needed.

Contributor Author

Multiplying by an immediate constant performs better than division.
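For illustration (not part of the PR): 1 / math.sqrt(head_dim) is computed once on the host as a Python constant, so the device only executes a tensor-scalar multiply instead of a divide, and the results agree to floating-point tolerance.

import math

import paddle

head_dim = 128
q = paddle.randn([1, 8, 16, head_dim], dtype="float32")  # arbitrary query-shaped tensor

by_div = q / math.sqrt(head_dim)           # original path: tensor-scalar divide
by_mul = q * (1.0 / math.sqrt(head_dim))   # intel_hpu path: host-side reciprocal, then multiply

print(paddle.allclose(by_div, by_mul).item())  # True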

if config.context_parallel_degree > 1:
    raise ValueError("Context parallel is not implemented for intel_hpu")
scaling_factor = query_states.shape[3] ** -0.5
attention_mask = attention_mask.astype("bfloat16")
Collaborator

Does intel_hpu support float16? facebook/llama-7b uses float16, while Llama 2 and Llama 3 use bfloat16.

Contributor Author

This has been changed to use q's dtype.
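A minimal sketch of that change, under the assumption that the mask simply follows the query's dtype (the helper name prepare_mask is hypothetical, not the PR's code): casting to query_states.dtype instead of a hard-coded bfloat16 keeps float16 checkpoints such as facebook/llama-7b working.

import paddle


def prepare_mask(attention_mask, query_states):
    # Hypothetical helper: follow the query's dtype instead of hard-coding bfloat16.
    if attention_mask is not None and attention_mask.dtype != query_states.dtype:
        attention_mask = attention_mask.astype(query_states.dtype)
    return attention_mask


q = paddle.zeros([1, 8, 16, 128]).astype("float16")
mask = paddle.zeros([1, 1, 16, 16], dtype="float32")
print(prepare_mask(mask, q).dtype)  # paddle.float16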

DrownFish19 (Collaborator)

> You could create a new llm/intel_hpu directory to hold the corresponding run examples and documentation.

You can refer to:

DrownFish19 (Collaborator) left a review:

LGTM

ZHUI merged commit ce083f0 into PaddlePaddle:develop on Oct 31, 2024
8 of 11 checks passed