dp main_grad #7293
Conversation
Thanks for your contribution!
Codecov Report

Attention: patch coverage decreased.

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #7293      +/-   ##
===========================================
- Coverage    59.34%   58.00%   -1.35%
===========================================
  Files          567      579      +12
  Lines        83355    86266    +2911
===========================================
+ Hits         49466    50037     +571
- Misses       33889    36229    +2340

☔ View full report in Codecov by Sentry.
# Multi-gpu training
if self.args.world_size > 1 and not self.args.use_hybrid_parallel:
    model = paddle.DataParallel(model)
    # Distributed training (should be after fp16 initialization)

if self.args.amp_master_grad:
    mix_precision_utils.MixPrecisionLayer(model, dtype=self.amp_dtype)
Here the model has already been wrapped once by DP. Please confirm whether wrapping it again with MixPrecisionLayer causes any problem.
Wrapping it once more like this is fine here.
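The exchange above is about whether a model already wrapped by DataParallel can safely be wrapped a second time. A minimal sketch of the nested-wrapper pattern, using hypothetical stand-in classes (NOT the real `paddle.DataParallel` or `mix_precision_utils.MixPrecisionLayer` APIs), shows why double wrapping works when each wrapper simply delegates to the layer it wraps:

```python
# Hypothetical stand-ins illustrating nested model wrappers; these are NOT
# the real paddle.DataParallel / MixPrecisionLayer implementations.

class BaseModel:
    def forward(self, x):
        return x * 2

class DataParallelWrapper:
    """Stand-in for a data-parallel wrapper: delegates forward()."""
    def __init__(self, model):
        self._layers = model

    def forward(self, x):
        return self._layers.forward(x)

class MasterGradWrapper:
    """Stand-in for a mixed-precision wrapper. Because it only delegates
    forward(), it can wrap an already-wrapped model without issue."""
    def __init__(self, model):
        self._inner = model

    def forward(self, x):
        return self._inner.forward(x)

model = DataParallelWrapper(BaseModel())  # first wrap (data parallel)
model = MasterGradWrapper(model)          # second wrap (master grad)
print(model.forward(3))                   # delegation reaches the base model: 6
```

Each wrapper holds the previous one and forwards calls through, so the order of wrapping only determines which layer sees the call first.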
paddlenlp/trainer/training_args.py (outdated)

@@ -810,9 +810,10 @@ def __post_init__(self):
        self.pipeline_parallel_degree <= 1
        and self.tensor_parallel_degree <= 1
        and (not self.sharding or ShardingOption.FULL_SHARD in self.sharding)
        and self.use_hybrid_parallel
    ):
        raise ValueError(
Is it only stage3 that is unsupported? If so, this check could be simplified.
LGTM
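The condition discussed in the diff above can be mirrored as a small pure-Python predicate, which makes the reviewer's simplification suggestion easier to reason about. This is a sketch only: the function name and the `"stage3"` string stand in for PaddleNLP's actual `ShardingOption.FULL_SHARD` handling.

```python
# Pure-Python sketch mirroring the validation condition from the diff.
# The helper name and stage strings are illustrative, not PaddleNLP's API.

def rejects_amp_master_grad(pipeline_parallel_degree, tensor_parallel_degree,
                            sharding, use_hybrid_parallel):
    FULL_SHARD = "stage3"  # stand-in for ShardingOption.FULL_SHARD
    # Rejected when there is no pipeline or tensor parallelism, sharding is
    # absent or includes FULL_SHARD (stage3), and hybrid parallel is enabled.
    return (
        pipeline_parallel_degree <= 1
        and tensor_parallel_degree <= 1
        and (not sharding or FULL_SHARD in sharding)
        and use_hybrid_parallel
    )

print(rejects_amp_master_grad(1, 1, ["stage3"], True))   # True
print(rejects_amp_master_grad(1, 1, ["stage2"], True))   # False
print(rejects_amp_master_grad(2, 1, ["stage3"], True))   # False
```

Written this way, the reviewer's point is visible: if only stage3 is problematic, the `not sharding` branch and the degree checks might be collapsible into a single membership test.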
PR types: Others

PR changes: Others

Description: dp main_grad