
[LLM INFER] Fix some bugs and chatglm_v2 support block_attn #9271

Merged: 6 commits merged into PaddlePaddle:develop on Oct 25, 2024

Conversation

yuanlehome (Collaborator) commented Oct 15, 2024

PR types

New features

PR changes

Models

Description

  • chatglm_v2 now supports block_attn mode, though its accuracy still needs to be aligned
  • Fix a number of previously disabled unit tests
  • Slightly clean up the model-composition code
  • Add the USE_FASTER_TOP_P_SAMPLING environment variable to opt in to the better-performing top_p_sampling operator (see the sketch below)
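
A minimal sketch of how one might opt in, assuming the variable is read from the process environment; the enabling value "1" is an assumption, since this page only names the variable:

    import os

    # Illustrative only: enable the faster top_p_sampling operator before the
    # predictor is constructed. The variable name comes from this PR's
    # description; "1" as the enabling value is an assumption.
    os.environ["USE_FASTER_TOP_P_SAMPLING"] = "1"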


paddle-bot (bot) commented Oct 15, 2024

Thanks for your contribution!

yuanlehome marked this pull request as draft October 15, 2024 07:39

codecov (bot) commented Oct 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 104 lines in your changes missing coverage. Please review.

Project coverage is 52.89%. Comparing base (7551730) to head (d19ed92).
Report is 5 commits behind head on develop.

Files with missing lines                                  Patch %   Lines
...p/experimental/transformers/chatglm_v2/modeling.py       0.00%   84 Missing ⚠️
...erimental/transformers/fused_transformer_layers.py       0.00%   15 Missing ⚠️
...enlp/experimental/transformers/generation_utils.py       0.00%    5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9271      +/-   ##
===========================================
+ Coverage    52.80%   52.89%   +0.08%     
===========================================
  Files          660      660              
  Lines       106869   106929      +60     
===========================================
+ Hits         56434    56561     +127     
+ Misses       50435    50368      -67     


yuanlehome reopened this Oct 24, 2024
yuanlehome marked this pull request as ready for review October 24, 2024 06:55
yuanlehome changed the title from "[LLM INFER] chatglm_v2 support block_attn" to "[LLM INFER] Fix some bugs and chatglm_v2 support block_attn" Oct 24, 2024
    else:
        return 8192  # Maximum sequence length.

    total_max_length: int = field(
        default=4096, metadata={"help": "Super parameter. Maximum sequence length(encoder+decoder)."}
    )
Collaborator:

Has this been confirmed with the colleagues responsible for NPU?

Collaborator (Author):

Confirmed; no problem.

@@ -520,7 +520,7 @@ def _preprocess(self, source):
     alibi_slopes = llm_utils.get_alibi_slopes(self.model_config.n_head)
     inputs["position_ids"] = paddle.to_tensor(alibi_slopes, dtype="float32")
     arange_tensor_encoder = paddle.arange(self.config.total_max_length, dtype=self.config.dtype)
-    alibi = alibi_slopes[None, :, None, None] * arange_tensor_encoder
+    alibi = (alibi_slopes[None, :, None, None] * arange_tensor_encoder).astype(self.config.dtype)

Collaborator:

Hmm, is relying on the config dtype safe here? Users can change that value. How about using the dtype of one of the tensors involved instead?

Collaborator (Author):

This dtype does need to stay consistent with config.dtype.
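
For context, a minimal standalone sketch of the ALiBi bias construction being discussed, including the explicit cast to the configured compute dtype; the function name and slope schedule below are illustrative, not the repo's exact code:

    import math
    import paddle

    def build_alibi_bias(n_head: int, max_len: int, dtype: str = "float16"):
        # ALiBi slope schedule (simplified; assumes n_head is a power of two).
        start = 2.0 ** (-(2.0 ** -(math.log2(n_head) - 3.0)))
        slopes = paddle.to_tensor([start ** (i + 1) for i in range(n_head)], dtype="float32")
        positions = paddle.arange(max_len, dtype="float32")
        # Broadcast to [1, n_head, 1, max_len] and cast explicitly, so the bias
        # matches the attention compute dtype instead of silently staying float32.
        return (slopes[None, :, None, None] * positions).astype(dtype)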


    model = Model.from_pretrained(...)
    ...
    predictor_args.total_max_length = config.seq_length
    if predictor_args.block_attn:
        ...
Collaborator:

Hmm, I'd suggest making block_attn a config attribute and letting ChatGLMv2InferenceModel handle it internally. If we change it here, too many models will end up needing this kind of edit later on.

Collaborator (Author):

Strictly speaking, though, this doesn't belong in each model's Config: if we added it to, say, LlamaConfig, every model's Config would need it too. Let's keep it this way for now; we'll look for a better approach during a later refactor.
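
For illustration, a hedged sketch of the alternative the reviewer proposes, i.e. reading block_attn from the config inside the inference model so the predictor needs no per-model branch; the attribute names and default below are assumptions, not PaddleNLP's actual API:

    # Hypothetical sketch of the reviewer's suggestion; not actual PaddleNLP code.
    class ChatGLMv2InferenceModel:
        def __init__(self, config):
            # Default to False so configs that never set block_attn keep working.
            self.block_attn = getattr(config, "block_attn", False)
            # The model decides its own max length, so the predictor does not
            # need a ChatGLMv2-specific branch like the one under review.
            self.total_max_length = config.seq_length if self.block_attn else config.total_max_length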

qingqing01 merged commit 2e8b220 into PaddlePaddle:develop Oct 25, 2024
9 of 12 checks passed