
[LLM] Support gpt3 fine grained dybatch v1 #7080

Merged
merged 10 commits into from
Sep 20, 2023

Conversation

yuanlehome
Collaborator

PR types

Others

PR changes

Others

Description

Support gpt3 fine grained dybatch v1.

@paddle-bot

paddle-bot bot commented Sep 19, 2023

Thanks for your contribution!

@codecov

codecov bot commented Sep 19, 2023

Codecov Report

Merging #7080 (d847287) into develop (da02add) will decrease coverage by 0.13%.
Report is 12 commits behind head on develop.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           develop    #7080      +/-   ##
===========================================
- Coverage    59.91%   59.78%   -0.13%     
===========================================
  Files          556      558       +2     
  Lines        82037    82217     +180     
===========================================
+ Hits         49149    49152       +3     
- Misses       32888    33065     +177     
Files Changed | Coverage Δ
paddlenlp/experimental/transformers/__init__.py | 0.00% <0.00%> (ø)
...erimental/transformers/fused_transformer_layers.py | 0.00% <0.00%> (ø)
...enlp/experimental/transformers/generation_utils.py | 0.00% <0.00%> (ø)
...addlenlp/experimental/transformers/gpt/__init__.py | 0.00% <0.00%> (ø)
...addlenlp/experimental/transformers/gpt/modeling.py | 0.00% <0.00%> (ø)

... and 5 files with indirect coverage changes

def set_state_dict(self, state_dict):
    dtype = paddle.get_default_dtype()

    for k, v in state_dict.items():
Contributor

There are quite a lot of `if` branches here. Could this be changed to match the llama implementation?

Collaborator Author

I suggest keeping it this way for now. GPT models come from quite varied sources and the parameter names are inconsistent, so this `if`-based approach already accommodates as many naming conventions as possible.
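For context, the alternative the reviewer alludes to would collapse the per-name `if` branches into a prefix alias table. The sketch below is purely hypothetical: the prefixes `transformer.h.`, `model.layers.`, and the target `gpt.decoder.layers.` are illustrative stand-ins, not the actual checkpoint keys used in this PR.

```python
# Hypothetical sketch: normalize state-dict key prefixes via an alias table
# instead of many per-name `if` branches. All prefixes are illustrative.
ALIASES = {
    "transformer.h.": "gpt.decoder.layers.",  # one possible source naming
    "model.layers.": "gpt.decoder.layers.",   # another possible source naming
}

def normalize_key(key: str) -> str:
    """Rewrite a checkpoint key to the canonical prefix if an alias matches."""
    for old, new in ALIASES.items():
        if key.startswith(old):
            return new + key[len(old):]
    return key  # unknown keys pass through unchanged
```

The trade-off the author raises still applies: a flat alias table only works when the naming variants differ by prefix; genuinely irregular names still need special-case branches.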

@wj-Mcat wj-Mcat (Contributor) left a comment

The code quality is very good. Besides the two comments below, one small suggestion: add unit tests. Once #7056 is merged, how about writing a test_predictor unit test?

    cls, pretrained_model_name_or_path, from_hf_hub: bool = False, subfolder: str | None = None, *args, **kwargs
):
    # TODO: Support safetensors loading.
    kwargs["use_safetensors"] = False
Contributor

Suggested change:
-    kwargs["use_safetensors"] = False
+    kwargs["use_safetensors"] = kwargs.get("use_safetensors", False)

I suggest using this version, because single-shard safetensors checkpoints can be loaded by the inference model.
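The difference between the two lines is worth spelling out: the original unconditionally overwrites a caller-supplied `use_safetensors`, while `kwargs.get` only fills in a default when the caller did not decide. A minimal standalone illustration (the `load_*` function names are hypothetical, not PaddleNLP APIs):

```python
def load_hard_override(**kwargs):
    # Original behavior: the caller's explicit choice is silently discarded.
    kwargs["use_safetensors"] = False
    return kwargs["use_safetensors"]

def load_with_default(**kwargs):
    # Suggested behavior: default to False only when the caller passed nothing.
    kwargs["use_safetensors"] = kwargs.get("use_safetensors", False)
    return kwargs["use_safetors" if False else "use_safetensors"]
```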

    position_ids = tgt_pos
    attention_mask = (tgt_generation_mask - 1) * 1e4
else:
    attention_mask = (attention_mask - 1) * 1e4
Contributor

I suggest converting the attention_mask values here with paddle.finfo(attention_mask.dtype).min.

The value ranges of bf16 and fp16 differ, so use this to obtain the minimum representable value for the given dtype.

The tgt_attention_mask above also needs the same adjustment.
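To illustrate why the hard-coded `1e4` is fragile: the additive mask should push masked positions to the dtype's minimum, and that minimum differs by dtype (fp16 bottoms out at -65504, far below -1e4). The sketch below uses NumPy's `finfo`, which `paddle.finfo` mirrors; bf16 is omitted because NumPy has no native bfloat16 type.

```python
import numpy as np

def to_additive_mask(mask, dtype):
    # mask: 1.0 = attend, 0.0 = masked. Produce an additive bias that is 0
    # where attention is allowed and the dtype's minimum where it is masked,
    # in place of the hard-coded (mask - 1) * 1e4.
    mask = np.asarray(mask, dtype=dtype)
    return (1.0 - mask) * np.finfo(dtype).min
```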

@yuanlehome (Collaborator Author)

> The code quality is very good. Besides the two comments below, one small suggestion: add unit tests. Once #7056 is merged, how about writing a test_predictor unit test?

Sure. I'd like to add the unit tests and address the issues raised in the comments together in the next PR. Shall we merge this version first?

@wj-Mcat wj-Mcat merged commit af28006 into PaddlePaddle:develop Sep 20, 2023
9 checks passed
4 participants