[Inference LLM] refine some code in llama wint8/4 #8796

yuanlehome · 2024-07-23T09:52:28Z

PR types

New features

PR changes

Others

Description

refine some code in llama wint8/4

paddle-bot · 2024-07-23T09:52:32Z

Thanks for your contribution!

codecov · 2024-07-23T11:29:19Z

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 55.51%. Comparing base (7c18d9d) to head (98e4683).
Report is 219 commits behind head on develop.

Files with missing lines	Patch %	Lines
...dlenlp/experimental/transformers/llama/modeling.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #8796      +/-   ##
===========================================
+ Coverage    55.03%   55.51%   +0.48%     
===========================================
  Files          627      626       -1     
  Lines        98921    98057     -864     
===========================================
- Hits         54440    54438       -2     
+ Misses       44481    43619     -862
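As a quick sanity check on the Coverage Diff table, the sketch below recomputes coverage as hits / lines and misses as lines minus hits for base and head. It assumes Codecov's default reporting of two decimal places, rounded down. That setting is an assumption about this repository's configuration, and `coverage_pct` is a hypothetical helper, not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> str:
    """Coverage as a percentage string, rounded down to two decimals (assumed Codecov default)."""
    # Work in hundredths of a percent with integer math to avoid float rounding surprises.
    basis_points = hits * 10000 // lines
    return f"{basis_points // 100}.{basis_points % 100:02d}"


# Figures copied from the Coverage Diff table (base = develop, head = this PR).
base = {"lines": 98921, "hits": 54440}
head = {"lines": 98057, "hits": 54438}

for name, stats in (("base", base), ("head", head)):
    misses = stats["lines"] - stats["hits"]
    print(name, coverage_pct(stats["hits"], stats["lines"]), "misses:", misses)
# prints:
# base 55.03 misses: 44481
# head 55.51 misses: 43619
```

The +0.48% change therefore comes almost entirely from the 864 removed lines (862 of which were misses), not from new test coverage: hits only drop by 2.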


yuanlehome · 2024-07-23T12:21:53Z

llm/predict/export_model.py

@@ -104,7 +67,6 @@ def main():

    if tensor_parallel_degree > 1:
        export_args.output_path = os.path.join(export_args.output_path, f"rank_{tensor_parallel_rank}")
-    validate_pdmodel(export_args.output_path, predictor_args.model_prefix, predictor_args.device)


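Running this step requires extra GPU memory and can cause out-of-memory errors in extreme cases. It also serves no real purpose, so I removed it.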

DesmonDay · 2024-07-23T12:58:38Z

paddlenlp/experimental/transformers/llama/modeling.py

@@ -789,11 +788,8 @@ def set_state_dict(self, state_dict):
                qkv_weight_tensor = paddle.to_tensor(concated_qkv_weight)
                qkv_weight_tensor = paddle.transpose(qkv_weight_tensor, perm=[1, 0])
                qkv_quanted_weight_tensor, qkv_weight_scale_tensor = weight_quantize(
-                    qkv_weight_tensor.cuda(), algo=self.quant_type


What is the reason for the changes here and below?

The logic before this change was wrong; wint4 is not monitored by CI.
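For readers unfamiliar with wint8/wint4 (weight-only int8/int4 quantization), here is a minimal pure-Python sketch of what a call like `weight_quantize(qkv_weight_tensor, algo=self.quant_type)` conceptually produces: integer weights plus one float scale per output channel, with activations left in floating point. It assumes symmetric absmax scaling and an `[in_features, out_features]` layout. `weight_only_quantize` and `dequantize` are hypothetical helpers, not Paddle APIs, and details such as int4 packing and kernel-specific weight layouts are not modeled.

```python
from typing import List, Tuple


# Hypothetical helper for illustration; this is NOT Paddle's weight_quantize kernel.
def weight_only_quantize(w: List[List[float]], bits: int) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-output-channel quantization of a [k, n] weight matrix."""
    qmax = (1 << (bits - 1)) - 1  # 127 for int8, 7 for int4
    k, n = len(w), len(w[0])
    scales = []
    for j in range(n):
        # One scale per output column, chosen so the column's largest |value| maps to qmax.
        absmax = max(abs(w[i][j]) for i in range(k))
        scales.append(absmax / qmax if absmax > 0 else 1.0)
    # Round to the nearest integer level and clamp into [-qmax, qmax].
    q = [[max(-qmax, min(qmax, round(w[i][j] / scales[j]))) for j in range(n)] for i in range(k)]
    return q, scales


def dequantize(q: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Recover approximate float weights: w ~= q * scale (per column)."""
    return [[v * scales[j] for j, v in enumerate(row)] for row in q]


# A toy 3x2 weight laid out as [in_features, out_features].
w = [[1.0, -0.5], [-2.0, 0.25], [0.5, 0.1]]
q8, s8 = weight_only_quantize(w, bits=8)
q4, s4 = weight_only_quantize(w, bits=4)
print(q4, s4)
```

The per-element reconstruction error is bounded by half a scale step, so int4's much coarser grid (7 levels per sign instead of 127) is what makes bugs in the wint4 path easy to miss without dedicated CI checks.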

DesmonDay

LGTM

DesmonDay · 2024-07-23T13:02:40Z

LGTM. I suggest adding CI monitoring for wint4 later.

yuanlehome · 2024-07-24T04:37:23Z

LGTM. I suggest adding CI monitoring for wint4 later.

This depends on a Paddle PR being merged; I'll add it once that lands.

delete validate model and fix wint8/4 in llama

81c8848

update

98e4683

yuanlehome changed the title from "[WIP][Inference LLM] llama support w8a8c8 and refine some code" to "[Inference LLM] refine some code in llama wint8/4" on Jul 23, 2024

yuanlehome commented Jul 23, 2024

DesmonDay reviewed Jul 23, 2024

DesmonDay approved these changes Jul 23, 2024

wawltor merged commit 6f56bd4 into PaddlePaddle:develop Jul 24, 2024
9 of 12 checks passed
