
[Auto Parallel] Support dynamic semi-auto training in Llama2 model #7851

Merged
2 commits merged into PaddlePaddle:develop from dygraph_semi_auto_llama2 on Jan 18, 2024

Conversation

haohongxiang (Contributor) commented on Jan 16, 2024

PR types

Bug fixes

PR changes

Others

Description

[Auto Parallel] Support dynamic semi-auto training in Llama2 model

paddle-bot bot commented on Jan 16, 2024

Thanks for your contribution!

codecov bot commented on Jan 16, 2024

Codecov Report

Attention: 423 lines in your changes are missing coverage. Please review.

Comparison is base (04142e3) 56.96% compared to head (327d788) 56.67%.
Report is 5 commits behind head on develop.

Files                                               Patch %   Lines
paddlenlp/transformers/llama/modeling_3D_auto.py    16.20%    419 Missing ⚠️
paddlenlp/trainer/utils/reshard/common.py            0.00%      4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7851      +/-   ##
===========================================
- Coverage    56.96%   56.67%   -0.29%     
===========================================
  Files          587      588       +1     
  Lines        88647    89243     +596     
===========================================
+ Hits         50494    50580      +86     
- Misses       38153    38663     +510     

☔ View full report in Codecov by Sentry.

haohongxiang force-pushed the dygraph_semi_auto_llama2 branch 15 times, most recently from cf7d444 to b9acdf0, on January 18, 2024 05:06
haohongxiang changed the title from "[don't merge] Dygraph semi auto llama2" to "[Auto Parallel] Support dynamic semi-auto training in Llama2 model" on Jan 18, 2024

// Review context (fused_ln custom op): the saved inverse variance is
// allocated with the input shape minus the normalized (last) axis.
auto variance_shape = x_shape;
variance_shape.pop_back();
auto invvar = paddle::empty(variance_shape, paddle::DataType::FLOAT32, place);
Collaborator:

What is the reason for these two changes?

Contributor (Author):

The fused_ln change is needed because the original infer shape for variance was wrong; in pure dynamic-graph mode it simply never raised an error. Once dynamic semi-auto parallel applies the sharding inference rules it crashes, so it has to be fixed. The layer_norm operator in the framework was fixed in the same way; see PR-58776 for details.
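To make the shape issue concrete, here is a minimal Python sketch (my own illustration, not the fused_ln kernel or this PR's code) of the contract the fix restores: for LayerNorm over the last axis, the saved variance / inverse variance has the input shape with that last axis dropped.

import paddle

# Sketch only: normalizing over the last axis leaves one statistic per row,
# so the statistics tensor has shape x.shape[:-1].
x = paddle.randn([2, 8, 64], dtype="float32")
variance = x.var(axis=-1, unbiased=False)   # shape [2, 8]
invvar = paddle.rsqrt(variance + 1e-5)      # inverse std, same shape as variance
assert list(invvar.shape) == list(x.shape)[:-1]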

return outputs


class LlamaDecoderLayerAuto(nn.Layer):
Collaborator:

Do these model class names conflict with the ones in modeling_auto?

Contributor (Author):

These class names are only used internally within this file and are not added to the __all__ list, so they are not exposed to users or to the training side. This is also just an intermediate state: once the unified dynamic-to-static semi-auto execution code lands, only modeling_3D_auto.py will be kept and the original static-only modeling_auto.py will be removed.
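A minimal sketch of the __all__ behavior described above (the exported name and file contents below are hypothetical, only to illustrate why the star import cannot leak the internal classes):

# modeling_3D_auto.py (hypothetical sketch, not the real file)
__all__ = ["LlamaForCausalLM3DAuto"]       # hypothetical exported name

class LlamaDecoderLayerAuto:               # defined here but kept out of __all__
    pass

class LlamaForCausalLM3DAuto:
    pass

# __init__.py
# from .modeling_3D_auto import *
# A star import only re-exports the names listed in __all__, so
# LlamaDecoderLayerAuto stays file-local and cannot clash with the class
# of the same name in modeling_auto.py at the package level.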

@@ -14,6 +14,7 @@
 
 from .configuration import *
 from .modeling import *
+from .modeling_3D_auto import *
Collaborator:

Will some of these names conflict?

Contributor (Author):

Please see the reply above.

wawltor (Collaborator) left a comment:
LGTM

ZHUI (Collaborator) left a comment:

LGTM

wawltor merged commit 16d3c49 into PaddlePaddle:develop on Jan 18, 2024
8 of 10 checks passed