fix bug for fp16 + delay_scale_loss_scale + sharding_stage1_overlap #8314
Conversation
Thanks for your contribution!
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #8314      +/-   ##
===========================================
- Coverage    55.33%   55.33%   -0.01%
===========================================
  Files          614      614
  Lines        95341    95342       +1
===========================================
  Hits         52753    52753
- Misses       42588    42589       +1

☔ View full report in Codecov by Sentry.
@@ -1013,6 +1013,7 @@ def _inner_training_loop(
    self.timers and self.timers("optimizer-step").start()

    if self.args.gradient_accumulation_steps > 1 and self._enable_delay_scale_loss():
        paddle.device.synchronize()
Review comment: Won't this affect performance?
Reply: No, it won't. This part must not be overlapped: a synchronization is required here, otherwise the gradient could be scaled before its communication has finished.
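For reference, a minimal sketch of the guard this change enforces, assuming Paddle's dygraph API. The helper name `sync_before_delayed_grad_scale` and its arguments are illustrative, not PaddleNLP's actual code:

```python
import paddle


def sync_before_delayed_grad_scale(args, trainer):
    # Hypothetical helper mirroring the guard in the diff above: when
    # gradient accumulation is combined with delayed loss scaling, the
    # accumulated grads are divided by the accumulation count just before
    # the optimizer step. With sharding stage1 overlap, gradient
    # communication runs asynchronously, so a device-level synchronize is
    # needed first; otherwise the scale could read a gradient whose
    # reduce has not yet completed.
    if args.gradient_accumulation_steps > 1 and trainer._enable_delay_scale_loss():
        paddle.device.synchronize()
```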
PR types
Others
PR changes
Others
Description
Fix a bug in the combination of fp16 + delay_scale_loss + sharding_stage1_overlap: when gradient accumulation is used with delayed loss scaling, a device synchronization is added before the gradients are scaled, so that scaling cannot run while the overlapped gradient communication is still in flight.
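To illustrate where the synchronize sits, here is a self-contained sketch of a gradient accumulation loop with delayed loss scaling. The semantics are assumed from the feature names (this is not PaddleNLP's trainer); `model`, `optimizer`, and `micro_batches` are placeholders:

```python
import paddle


def accumulate_and_step(model, optimizer, micro_batches, accumulation_steps):
    """Gradient accumulation with delayed loss scaling (illustrative only)."""
    for i, (x, y) in enumerate(micro_batches):
        loss = paddle.nn.functional.mse_loss(model(x), y)
        loss.backward()  # grads accumulate; loss is not pre-divided per micro-step

        if (i + 1) % accumulation_steps == 0:
            # With overlapped sharding, grad communication may still be in
            # flight here, so wait for the device before touching grads.
            paddle.device.synchronize()
            with paddle.no_grad():
                for p in model.parameters():
                    if p.grad is not None:
                        # Delayed scaling: divide the accumulated grads once.
                        p.grad.scale_(1.0 / accumulation_steps)
            optimizer.step()
            optimizer.clear_grad()
```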