
[Auto Parallel] Support dynamic semi-auto training in Llama2 model #7851

Merged
2 commits merged into PaddlePaddle:develop from dygraph_semi_auto_llama2 on Jan 18, 2024

Conversation

haohongxiang (Contributor) commented on Jan 16, 2024

PR types

Bug fixes

PR changes

Others

Description

[Auto Parallel] Support dynamic semi-auto training in Llama2 model

paddle-bot bot commented on Jan 16, 2024

Thanks for your contribution!

codecov bot commented on Jan 16, 2024

Codecov Report

Attention: 423 lines in your changes are missing coverage. Please review.

Comparison is base (04142e3) 56.96% compared to head (327d788) 56.67%.
Report is 5 commits behind head on develop.

Files                                               Patch %   Lines
paddlenlp/transformers/llama/modeling_3D_auto.py    16.20%    419 Missing ⚠️
paddlenlp/trainer/utils/reshard/common.py            0.00%      4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7851      +/-   ##
===========================================
- Coverage    56.96%   56.67%   -0.29%     
===========================================
  Files          587      588       +1     
  Lines        88647    89243     +596     
===========================================
+ Hits         50494    50580      +86     
- Misses       38153    38663     +510     

☔ View full report in Codecov by Sentry.

haohongxiang force-pushed the dygraph_semi_auto_llama2 branch 15 times, most recently from cf7d444 to b9acdf0, on January 18, 2024 05:06
haohongxiang changed the title from "[don't merge] Dygraph semi auto llama2" to "[Auto Parallel] Support dynamic semi-auto training in Llama2 model" on Jan 18, 2024

// Review context (fused_ln custom op): the saved inverse variance is
// allocated with the input shape minus the normalized (last) axis.
auto variance_shape = x_shape;
variance_shape.pop_back();
auto invvar = paddle::empty(variance_shape, paddle::DataType::FLOAT32, place);
Collaborator:

What is the reason for these two changes?

Contributor (Author):

The fused_ln change is needed because the original infer shape for variance was wrong; in pure dynamic-graph mode it simply never raised an error. Once dynamic semi-auto parallel applies the sharding inference rules it crashes, so it has to be fixed. The layer_norm operator in the framework was fixed in the same way; see PR-58776 for details.
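To make the shape issue concrete, here is a minimal Python sketch (my own illustration, not the fused_ln kernel or this PR's code) of the contract the fix restores: for LayerNorm over the last axis, the saved variance / inverse variance has the input shape with that last axis dropped.

import paddle

# Sketch only: normalizing over the last axis leaves one statistic per row,
# so the statistics tensor has shape x.shape[:-1].
x = paddle.randn([2, 8, 64], dtype="float32")
variance = x.var(axis=-1, unbiased=False)   # shape [2, 8]
invvar = paddle.rsqrt(variance + 1e-5)      # inverse std, same shape as variance
assert list(invvar.shape) == list(x.shape)[:-1]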

return outputs


class LlamaDecoderLayerAuto(nn.Layer):
Collaborator:

Do these model class names conflict with the ones in modeling_auto?

Contributor (Author):

These class names are only used internally within this file and are not added to the __all__ list, so they are not exposed to users or to the training side. This is also just an intermediate state: once the unified dynamic-to-static semi-auto execution code lands, only modeling_3D_auto.py will be kept and the original static-only modeling_auto.py will be removed.
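A minimal sketch of the __all__ behavior described above (the exported name and file contents below are hypothetical, only to illustrate why the star import cannot leak the internal classes):

# modeling_3D_auto.py (hypothetical sketch, not the real file)
__all__ = ["LlamaForCausalLM3DAuto"]       # hypothetical exported name

class LlamaDecoderLayerAuto:               # defined here but kept out of __all__
    pass

class LlamaForCausalLM3DAuto:
    pass

# __init__.py
# from .modeling_3D_auto import *
# A star import only re-exports the names listed in __all__, so
# LlamaDecoderLayerAuto stays file-local and cannot clash with the class
# of the same name in modeling_auto.py at the package level.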

@@ -14,6 +14,7 @@
 
 from .configuration import *
 from .modeling import *
+from .modeling_3D_auto import *
Collaborator:

Will some of these names conflict?

Contributor (Author):

Please see the reply above.

wawltor (Collaborator) left a comment:
LGTM

ZHUI (Collaborator) left a comment:

LGTM

wawltor merged commit 16d3c49 into PaddlePaddle:develop on Jan 18, 2024
8 of 10 checks passed