[DCU] fix llama inference bug on DCU #8815

Merged 1 commit on Jul 29, 2024
paddlenlp/experimental/transformers/fused_transformer_layers.py

@@ -550,11 +550,7 @@
     if config.trans_qkvw
     else [self.embed_dim, (self.num_heads + 2 * self.kv_num_heads) * self.head_dim]
 )
-self.linear_weight_shape = (
-    [self.num_heads * self.head_dim, self.embed_dim]
-    if config.trans_qkvw
-    else [self.embed_dim, self.num_heads * self.head_dim]
-)
+self.linear_weight_shape = [self.num_heads * self.head_dim, self.embed_dim]

 self.ffn1_weight_shape = (
     [self.embed_dim, self.dim_feedforward * 2]
     if self.activation.endswith("glu")
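For context, a minimal sketch (not part of the PR) of the weight shapes after this hunk: the out-projection shape no longer follows config.trans_qkvw, while the QKV shape above still does. All sizes below are hypothetical examples.

```python
# Hypothetical sizes for illustration only; not taken from the PR.
num_heads, kv_num_heads, head_dim, embed_dim = 32, 32, 128, 4096
trans_qkvw = False  # the DCU a8w8 path set up in modeling.py below

# The QKV weight shape still depends on trans_qkvw (per the conditional above).
qkv_weight_shape = (
    [(num_heads + 2 * kv_num_heads) * head_dim, embed_dim]
    if trans_qkvw
    else [embed_dim, (num_heads + 2 * kv_num_heads) * head_dim]
)

# After this change the out-projection shape is fixed.
linear_weight_shape = [num_heads * head_dim, embed_dim]

print(qkv_weight_shape, linear_weight_shape)
```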
paddlenlp/experimental/transformers/llama/modeling.py (4 changes: 2 additions & 2 deletions)
@@ -565,7 +565,7 @@
     use_neox_rotary_style=True,
     use_dynamic_cachekv_quant=config.use_cachekv_int8 == "dynamic",
     rank_id=config.tensor_parallel_rank,
-    trans_qkvw=(True if not paddle.is_compiled_with_rocm() else False),
+    trans_qkvw=(False if paddle.is_compiled_with_rocm() and self.quant_type == "a8w8" else True),
 )

 self.set_transformer_block(transformer_config)
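The condition is easier to read pulled out on its own. A minimal sketch, using a hypothetical helper name (resolve_trans_qkvw is not in the PR): QKV-weight transposition is now disabled only on ROCm (DCU) builds running a8w8 quantization; every other configuration keeps the previous behavior.

```python
import paddle


def resolve_trans_qkvw(quant_type: str) -> bool:
    # Hypothetical helper, equivalent to the inline expression above:
    # transpose QKV weights unless this is a ROCm (DCU) build using a8w8.
    return not (paddle.is_compiled_with_rocm() and quant_type == "a8w8")
```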
@@ -752,7 +752,7 @@
 unfused_state_dict["self_attn.v_proj.weight"] = state_dict[
     "llama.layers.{}.self_attn.v_proj.weight".format(idx)
 ]
-if paddle.is_compiled_with_rocm():
+if paddle.is_compiled_with_rocm() and self.quant_type == "a8w8":

 concated_qkv_weight = np.concatenate(
     [
         unfused_state_dict["self_attn.q_proj.weight"],
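A hedged sketch of the branch this guard now selects: on the DCU a8w8 path the unfused q/k/v weights are joined without transposing, matching trans_qkvw=False above. The concatenation axis, the tensor shapes, and the else branch are assumptions for illustration; they are not shown in this diff.

```python
import numpy as np

# Stand-ins for the unfused self_attn.{q,k,v}_proj.weight tensors; shapes are hypothetical.
q = np.zeros((4096, 4096), dtype="float32")
k = np.zeros((4096, 512), dtype="float32")
v = np.zeros((4096, 512), dtype="float32")

rocm_a8w8 = True  # stands in for paddle.is_compiled_with_rocm() and self.quant_type == "a8w8"
if rocm_a8w8:
    # DCU a8w8 path: keep the [in_features, out_features] layout and
    # concatenate along the output axis (assumed axis=-1).
    concated_qkv_weight = np.concatenate([q, k, v], axis=-1)
else:
    # Default path (trans_qkvw=True): the fused kernel expects the
    # transposed [out_features, in_features] layout (illustrative only).
    concated_qkv_weight = np.concatenate([q, k, v], axis=-1).transpose(1, 0)
```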