[Inference LLM] refine some code in llama wint8/4 #8796
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #8796      +/-   ##
===========================================
+ Coverage    55.03%   55.51%    +0.48%
===========================================
  Files          627      626        -1
  Lines        98921    98057      -864
===========================================
- Hits         54440    54438        -2
+ Misses       44481    43619      -862

☔ View full report in Codecov by Sentry.
@@ -104,7 +67,6 @@ def main():

    if tensor_parallel_degree > 1:
        export_args.output_path = os.path.join(export_args.output_path, f"rank_{tensor_parallel_rank}")
    validate_pdmodel(export_args.output_path, predictor_args.model_prefix, predictor_args.device)
Running this requires extra GPU memory and, in extreme cases, causes an out-of-memory error. Since it serves no real purpose, it has been removed.
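For context, the per-rank export-path logic that the diff keeps can be sketched as a small helper. The function name `rank_output_path` is hypothetical (introduced here for illustration); the `rank_{i}` subdirectory convention and the `tensor_parallel_degree > 1` guard come from the diff above.

```python
import os

def rank_output_path(output_path: str, tensor_parallel_degree: int, tensor_parallel_rank: int) -> str:
    # When the model is sharded across tensor-parallel ranks, each rank
    # exports into its own "rank_{i}" subdirectory so shards do not collide.
    if tensor_parallel_degree > 1:
        return os.path.join(output_path, f"rank_{tensor_parallel_rank}")
    # Single-rank export writes directly into the output path.
    return output_path

print(rank_output_path("export", 2, 0))  # export/rank_0 (on POSIX paths)
print(rank_output_path("export", 1, 0))  # export
```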
@@ -789,11 +788,8 @@ def set_state_dict(self, state_dict):

    qkv_weight_tensor = paddle.to_tensor(concated_qkv_weight)
    qkv_weight_tensor = paddle.transpose(qkv_weight_tensor, perm=[1, 0])
    qkv_quanted_weight_tensor, qkv_weight_scale_tensor = weight_quantize(
        qkv_weight_tensor.cuda(), algo=self.quant_type
What is the reason for these changes (and the ones below)?
The logic before this change was incorrect; the wint4 CI has no monitoring, so it went unnoticed.
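The `weight_quantize` call in the diff is Paddle's weight-only quantization op. As a rough illustration of what per-channel symmetric int8 weight quantization (the scheme wint8 inference typically uses) computes, here is a NumPy sketch; this is an assumption-laden stand-in for illustration, not Paddle's actual kernel or its exact layout.

```python
import numpy as np

def weight_quantize_int8(weight: np.ndarray):
    # Per-output-channel symmetric quantization:
    # each column gets its own scale = max(|w|) / 127.
    scale = np.abs(weight).max(axis=0) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    qweight = np.clip(np.round(weight / scale), -127, 127).astype(np.int8)
    return qweight, scale.astype(np.float32)

def weight_dequantize(qweight: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Recover an approximation of the original fp32 weight.
    return qweight.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4)).astype(np.float32)
qw, s = weight_quantize_int8(w)
err = np.abs(weight_dequantize(qw, s) - w).max()  # bounded by half a quant step
```

The reconstruction error is bounded by half a quantization step per element, which is why weight-only int8 usually preserves accuracy well.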
LGTM
LGTM. I suggest adding CI monitoring for wint4 as a follow-up.
This depends on a pending Paddle PR being merged; I will add the CI monitoring once it lands.
PR types
New features
PR changes
Others
Description
refine some code in llama wint8/4