fix bug for fp16 + delay_scale_loss_scale + sharding_stage1_overlap #8314
Conversation
Thanks for your contribution!
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #8314      +/-   ##
===========================================
- Coverage    55.33%   55.33%   -0.01%
===========================================
  Files          614      614
  Lines        95341    95342       +1
===========================================
  Hits         52753    52753
- Misses       42588    42589       +1

☔ View full report in Codecov by Sentry.
@@ -1013,6 +1013,7 @@ def _inner_training_loop(
    self.timers and self.timers("optimizer-step").start()

    if self.args.gradient_accumulation_steps > 1 and self._enable_delay_scale_loss():
        paddle.device.synchronize()
Review comment: Won't this affect performance?
Reply: No, it won't. This part must not be overlapped: a synchronization is required here, otherwise the gradient could be scaled before its communication has finished.
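For reference, a minimal sketch of the guard this change enforces, assuming Paddle's dygraph API. The helper name `sync_before_delayed_grad_scale` and its arguments are illustrative, not PaddleNLP's actual code:

```python
import paddle


def sync_before_delayed_grad_scale(args, trainer):
    # Hypothetical helper mirroring the guard in the diff above: when
    # gradient accumulation is combined with delayed loss scaling, the
    # accumulated grads are divided by the accumulation count just before
    # the optimizer step. With sharding stage1 overlap, gradient
    # communication runs asynchronously, so a device-level synchronize is
    # needed first; otherwise the scale could read a gradient whose
    # reduce has not yet completed.
    if args.gradient_accumulation_steps > 1 and trainer._enable_delay_scale_loss():
        paddle.device.synchronize()
```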
PR types
Others
PR changes
Others
Description
Fix a bug in the combination of fp16 + delay_scale_loss + sharding_stage1_overlap: when gradient accumulation is used with delayed loss scaling, a device synchronization is added before the gradients are scaled, so that scaling cannot run while the overlapped gradient communication is still in flight.
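To illustrate where the synchronize sits, here is a self-contained sketch of a gradient accumulation loop with delayed loss scaling. The semantics are assumed from the feature names (this is not PaddleNLP's trainer); `model`, `optimizer`, and `micro_batches` are placeholders:

```python
import paddle


def accumulate_and_step(model, optimizer, micro_batches, accumulation_steps):
    """Gradient accumulation with delayed loss scaling (illustrative only)."""
    for i, (x, y) in enumerate(micro_batches):
        loss = paddle.nn.functional.mse_loss(model(x), y)
        loss.backward()  # grads accumulate; loss is not pre-divided per micro-step

        if (i + 1) % accumulation_steps == 0:
            # With overlapped sharding, grad communication may still be in
            # flight here, so wait for the device before touching grads.
            paddle.device.synchronize()
            with paddle.no_grad():
                for p in model.parameters():
                    if p.grad is not None:
                        # Delayed scaling: divide the accumulated grads once.
                        p.grad.scale_(1.0 / accumulation_steps)
            optimizer.step()
            optimizer.clear_grad()
```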