[LLM INFER] Fix some bugs and chatglm_v2 support block_attn #9271
Conversation
Thanks for your contribution!
Codecov Report

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #9271    +/-   ##
===========================================
+ Coverage    52.80%   52.89%   +0.08%
===========================================
  Files          660      660
  Lines       106869   106929      +60
===========================================
+ Hits         56434    56561     +127
+ Misses       50435    50368      -67

View full report in Codecov by Sentry.
Force-pushed from f3b2d99 to 7551730 (compare)
else:
    return 8192  # Maximum sequence length.

total_max_length: int = field(
    default=4096, metadata={"help": "Super parameter. Maximum sequence length (encoder+decoder)."}
Has this been confirmed with the colleagues working on NPU?
Confirmed, no problem.
@@ -520,7 +520,7 @@ def _preprocess(self, source):
     alibi_slopes = llm_utils.get_alibi_slopes(self.model_config.n_head)
     inputs["position_ids"] = paddle.to_tensor(alibi_slopes, dtype="float32")
     arange_tensor_encoder = paddle.arange(self.config.total_max_length, dtype=self.config.dtype)
-    alibi = alibi_slopes[None, :, None, None] * arange_tensor_encoder
+    alibi = (alibi_slopes[None, :, None, None] * arange_tensor_encoder).astype(self.config.dtype)
Hmm, is relying on config.dtype safe here? Users can change that value. How about using the dtype of one of the tensors inside instead?
This dtype actually does need to stay consistent with config.dtype.
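The dtype exchange above is about the ALiBi bias construction in the diff: the slopes come back as float32, so the product must be cast to the model's compute dtype. A minimal NumPy sketch of that computation (function names, shapes, and the power-of-two slope formula here are illustrative assumptions, not the PaddleNLP implementation):

```python
import numpy as np

def get_alibi_slopes(n_heads: int) -> np.ndarray:
    # Geometric ALiBi slopes 2^(-8i/n) per head; assumes n_heads is a power of two.
    start = 2.0 ** (-8.0 / n_heads)
    return start ** np.arange(1, n_heads + 1)

def build_alibi_bias(n_heads: int, max_len: int, dtype=np.float16) -> np.ndarray:
    slopes = get_alibi_slopes(n_heads)                 # float64 at this point
    positions = np.arange(max_len, dtype=np.float32)   # encoder position ids
    # Broadcast [1, n_heads, 1, 1] * [max_len] -> [1, n_heads, 1, max_len],
    # then cast so the bias matches the model's compute dtype rather than
    # the slopes' wider dtype (the fix made in this PR).
    alibi = slopes[None, :, None, None] * positions
    return alibi.astype(dtype)

bias = build_alibi_bias(8, 16)
```

Without the final `astype`, the broadcasted product silently promotes to the widest input dtype, which is what the patched line guards against.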
model = Model.from_pretrained(
predictor_args.total_max_length = config.seq_length
if predictor_args.block_attn:
Hmm, I'd suggest putting block_attn into the config as an attribute and letting ChatGLMv2InferenceModel control it itself.
If we change it here, too many models will need the same kind of modification later on.
Strictly speaking, though, this doesn't really belong in each model's Config: if we added it to, say, LlamaConfig, every model's Config would need it too. Let's keep it this way for now; when we refactor later we'll look for a better approach.
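The trade-off being debated, a per-model config attribute versus a predictor-side override, can be sketched as follows (class and function names are hypothetical, not the actual PaddleNLP classes):

```python
from dataclasses import dataclass

@dataclass
class InferenceConfig:
    # Hypothetical config mirroring the discussion: block_attn as a
    # model-level attribute so each InferenceModel could decide for itself.
    seq_length: int = 8192        # model's trained maximum sequence length
    total_max_length: int = 4096  # default encoder+decoder budget
    block_attn: bool = False

def resolve_total_max_length(cfg: InferenceConfig) -> int:
    # Predictor-side override, as done in this PR: with block attention
    # enabled, raise total_max_length to the model's seq_length.
    return cfg.seq_length if cfg.block_attn else cfg.total_max_length
```

Keeping the override in the predictor avoids touching every model's Config class, at the cost of model-specific branches accumulating in the predictor, which is the refactor concern raised above.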
PR types
New features
PR changes
Models
Description