
[INFER] llama&qwen2 A8W8 support skip_scale #8987

Closed
wants to merge 17 commits
1 change: 1 addition & 0 deletions llm/predict/predictor.py
@@ -1241,6 +1241,7 @@ def create_predictor(
config.quant_type = predictor_args.quant_type
config.cachekv_int8_type = predictor_args.cachekv_int8_type
config.use_fake_parameter = predictor_args.use_fake_parameter
config.top_k = predictor_args.top_k
config.single_card_ptq = True
if config.quantization_config.quant_type is not None:
predictor_args.quant_type = config.quantization_config.quant_type
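For context, the change above forwards one more runtime argument, top_k, from the predictor arguments onto the inference config, alongside the existing quantization fields. The sketch below illustrates that flow; PredictorArgument and apply_predictor_args here are simplified stand-ins for illustration only, not the actual PaddleNLP classes or functions.

from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class PredictorArgument:
    # Simplified stand-in for the real predictor arguments.
    quant_type: str = "a8w8"
    cachekv_int8_type: str = "none"
    use_fake_parameter: bool = False
    top_k: int = 0  # sampling parameter now carried into the config

def apply_predictor_args(config, args: PredictorArgument):
    # Copy inference-related fields from the launch arguments onto the
    # config object consumed by the fused inference path.
    config.quant_type = args.quant_type
    config.cachekv_int8_type = args.cachekv_int8_type
    config.use_fake_parameter = args.use_fake_parameter
    config.top_k = args.top_k  # the field added by this diff
    return config

config = apply_predictor_args(SimpleNamespace(), PredictorArgument(top_k=1))
assert config.top_k == 1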
1 change: 1 addition & 0 deletions paddlenlp/experimental/transformers/bloom/modeling.py
@@ -293,6 +293,7 @@

@paddle.no_grad()
def set_state_dict(self, state_dict, use_structured_name=True):
self.transformer_block.init_weight()

Codecov / codecov/patch warning: added line paddlenlp/experimental/transformers/bloom/modeling.py#L296 was not covered by tests.
for k, v in state_dict.items():
if k.find("word_embeddings.weight") >= 0:
self.word_embeddings.weight.set_value(paddle.to_tensor(v))
1 change: 1 addition & 0 deletions paddlenlp/experimental/transformers/chatglm/modeling.py
@@ -377,6 +377,7 @@

@paddle.no_grad()
def set_state_dict(self, state_dict, use_structured_name=True):
self.transformer_block.init_weight()

Codecov / codecov/patch warning: added line paddlenlp/experimental/transformers/chatglm/modeling.py#L380 was not covered by tests.
dtype = paddle.get_default_dtype()
config = self.config
embed_dim = config.hidden_size
2 changes: 2 additions & 0 deletions paddlenlp/experimental/transformers/chatglm_v2/modeling.py
@@ -290,6 +290,8 @@

@paddle.no_grad()
def set_state_dict(self, state_dict):
self.transformer_block.init_weight()

Codecov / codecov/patch warning: added line paddlenlp/experimental/transformers/chatglm_v2/modeling.py#L293 was not covered by tests.

# find the real name.
def key(name):
result_list = []
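All three modeling files above receive the same one-line change: set_state_dict now calls self.transformer_block.init_weight() before any checkpoint tensor is copied in. A plausible reading, offered here only as an assumption, is that init_weight lazily creates parameters of the fused transformer block (for example the new A8W8 skip_scale tensors) so that the set_value calls further down have a destination to write into. The sketch below shows that pattern in isolation; FusedBlockStub and its skip_scale field are hypothetical stand-ins, not the real transformer_block implementation.

import paddle

class FusedBlockStub(paddle.nn.Layer):
    # Hypothetical stand-in for the fused transformer block used above.
    def __init__(self, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.skip_scale = None  # created lazily by init_weight()

    def init_weight(self):
        # Create the parameter only once, so calling this at the top of
        # set_state_dict is safe even if the layer is already initialized.
        if self.skip_scale is None:
            self.skip_scale = self.create_parameter(
                shape=[self.hidden_size],
                dtype=paddle.get_default_dtype(),
                default_initializer=paddle.nn.initializer.Constant(1.0),
            )

@paddle.no_grad()
def set_state_dict(block, state_dict):
    # Mirror of the added line: make sure the parameters exist before
    # copying checkpoint values into them.
    block.init_weight()
    if "skip_scale" in state_dict:
        block.skip_scale.set_value(paddle.to_tensor(state_dict["skip_scale"]))

block = FusedBlockStub(hidden_size=4)
set_state_dict(block, {"skip_scale": [0.5, 0.5, 0.5, 0.5]})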