
inference support llama3(wint8|4/a8w8) #8630

Merged · 2 commits · Jun 27, 2024

Conversation

yuanlehome (Collaborator)

PR types

New features

PR changes

Others

Description

inference support llama3(wint8|4/a8w8)

paddle-bot bot commented Jun 19, 2024

Thanks for your contribution!

codecov bot commented Jun 19, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 55.80%. Comparing base (65e721e) to head (4821cd6).
Report is 241 commits behind head on develop.

Files with missing lines Patch % Lines
...dlenlp/experimental/transformers/llama/modeling.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8630   +/-   ##
========================================
  Coverage    55.80%   55.80%           
========================================
  Files          620      620           
  Lines        96642    96642           
========================================
  Hits         53928    53928           
  Misses       42714    42714           


@@ -1213,8 +1214,8 @@ def create_predictor(
     init_chat_template(tokenizer, predictor_args.model_name_or_path, predictor_args.chat_template)

     # TODO(wj-Mcat): fix llama tokenzier pad_token bug
-    if isinstance(tokenizer, LlamaTokenizer) and not tokenizer.pad_token:
+    if (isinstance(tokenizer, LlamaTokenizer) or isinstance(tokenizer, Llama3Tokenizer)) and not tokenizer.pad_token:
         tokenizer.pad_token = tokenizer.unk_token
Contributor

This can be simplified to `if isinstance(tokenizer, (LlamaTokenizer, Llama3Tokenizer)) and not tokenizer.pad_token:` — `isinstance` accepts a tuple of types.

Collaborator Author

OK.
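The reviewer's suggestion can be sketched as follows. This is a minimal, self-contained illustration: the two tokenizer classes are stand-in stubs, not the real PaddleNLP tokenizers, and `ensure_pad_token` is a hypothetical helper name for the inline logic in `create_predictor`.

```python
# Stand-in stubs for the real PaddleNLP tokenizer classes.
class LlamaTokenizer:
    pad_token = None
    unk_token = "<unk>"

class Llama3Tokenizer:
    pad_token = None
    unk_token = "<unk>"

def ensure_pad_token(tokenizer):
    # isinstance accepts a tuple of types, so this single call is
    # equivalent to:
    #   isinstance(tokenizer, LlamaTokenizer) or isinstance(tokenizer, Llama3Tokenizer)
    if isinstance(tokenizer, (LlamaTokenizer, Llama3Tokenizer)) and not tokenizer.pad_token:
        tokenizer.pad_token = tokenizer.unk_token
    return tokenizer

print(ensure_pad_token(Llama3Tokenizer()).pad_token)  # <unk>
```

The tuple form reads better and avoids repeating the `isinstance` call for every tokenizer class that needs the same workaround.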

@@ -549,7 +549,7 @@ def init_weight_shape(self, config):
         self.qkv_weight_shape = (
             [(self.num_heads + 2 * self.kv_num_heads) * self.head_dim, self.embed_dim]
             if config.trans_qkvw
-            else [(self.num_heads + 2 * self.kv_num_heads) * self.head_dim, self.embed_dim]
+            else [self.embed_dim, (self.num_heads + 2 * self.kv_num_heads) * self.head_dim]
Contributor

Why was this shape changed?

Collaborator Author

Because the previous shape was wrong.
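The fix restores the intended asymmetry between the two branches: with `trans_qkvw` the fused QKV weight is stored transposed as `[out_features, embed_dim]`, otherwise as `[embed_dim, out_features]`. A minimal sketch of the corrected logic, with illustrative GQA config values (not taken from the PR):

```python
def qkv_weight_shape(num_heads, kv_num_heads, head_dim, embed_dim, trans_qkvw):
    # Fused QKV output width: Q has num_heads heads, K and V each have
    # kv_num_heads heads (grouped-query attention).
    out_features = (num_heads + 2 * kv_num_heads) * head_dim
    # trans_qkvw stores the weight transposed; before the fix both
    # branches returned the same (transposed) shape.
    return [out_features, embed_dim] if trans_qkvw else [embed_dim, out_features]

# Illustrative llama3-8B-like setup: 32 query heads, 8 KV heads, head_dim 128.
print(qkv_weight_shape(32, 8, 128, 4096, True))   # [6144, 4096]
print(qkv_weight_shape(32, 8, 128, 4096, False))  # [4096, 6144]
```

Before the fix the `else` branch duplicated the `if` branch, so a non-transposed config silently got a transposed shape.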

@yuanlehome yuanlehome reopened this Jun 25, 2024
@DesmonDay
Contributor

DesmonDay commented Jun 26, 2024

As discussed: the llama3 model currently infers correctly in the dynamic-graph non-fuse scenario, but the fuse scenario still has a multi-process issue, to be investigated later. Also, `src_length` cannot be set for inference after dynamic-to-static conversion, and high-performance inference does not emit eos correctly. @yuanlehome

@DesmonDay DesmonDay left a comment (Contributor)

LGTM

@sijunhe sijunhe merged commit faabf87 into PaddlePaddle:develop Jun 27, 2024
8 of 11 checks passed