Add the DistributedFusedLamb optimizer #39148
Conversation
Thanks for your contribution!
Force-pushed: 27d44d9 → 112c98e
Force-pushed: 112c98e → f422178
Force-pushed: f422178 → 3cc786a
Sorry to inform you that b9a7f57's CIs passed more than 7 days ago. To prevent PR conflicts, please re-run all CIs manually.
Force-pushed: 9e85146 → d5f3392
LGTM.
LGTM for the dtype registrar.
LGTM
self._optimizer._set_scale(self._loss_scaling)
optimize_ops = self._optimizer.apply_gradients(params_grads)
found_inf = self._optimizer._found_inf
self._add_dynamic_loss_scaling(params_grads, found_inf)
Not important, but it seems params_grads is not used in _add_dynamic_loss_scaling in this case.
Yes, but it is not that important.
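For context, here is a minimal sketch of how a dynamic loss scale is typically updated from a `found_inf` flag. The function and parameter names below are illustrative assumptions, not Paddle's actual `_add_dynamic_loss_scaling` implementation (which also wires the update into the program):

```python
# Illustrative sketch only (assumed names and defaults, not Paddle's API):
# how a dynamic loss scale is usually adjusted from a found_inf flag.
def update_loss_scaling(loss_scaling, good_steps, found_inf,
                        incr_every_n_steps=1000,
                        incr_ratio=2.0, decr_ratio=0.5):
    if found_inf:
        # Overflow detected: shrink the scale and reset the good-step counter.
        loss_scaling *= decr_ratio
        good_steps = 0
    else:
        # No overflow: grow the scale after enough consecutive good steps.
        good_steps += 1
        if good_steps >= incr_every_n_steps:
            loss_scaling *= incr_ratio
            good_steps = 0
    return loss_scaling, good_steps
```

Since the decision depends only on `found_inf`, which the fused optimizer already computes, `params_grads` is indeed not strictly needed in this path.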
LGTM
LGTM
LGTM for:
set_tests_properties(test_distributed_fused_lamb_op_with_clip PROPERTIES TIMEOUT 120)
set_tests_properties(test_distributed_fused_lamb_op_without_clip PROPERTIES TIMEOUT 120)
PR types
New features
PR changes
OPs
Describe
Add the hybrid-parallel DistributedFusedLamb optimizer. It uses:
- `ncclAllReduce` to get the global gradient square L2-norm.
- `ncclAllReduce` to get the global square L2-norm value.
- `ncclAllGather` to get the whole updated parameter.
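For illustration, here is a minimal sketch of the communication pattern described above, written with `paddle.distributed` collectives as stand-ins for the raw `ncclAllReduce`/`ncclAllGather` calls. The function name, the plain placeholder update, and the reading of the second all-reduce as the parameter-side norm used by the LAMB trust ratio are assumptions; the PR itself implements this inside fused CUDA kernels.

```python
# Sketch only: assumes an initialized distributed environment
# (paddle.distributed.init_parallel_env()) and that each rank holds one shard
# of the flattened parameters/gradients. Not the fused kernels from this PR.
import paddle
import paddle.distributed as dist

def sharded_lamb_step_sketch(param_shard, grad_shard, lr=1e-3):
    # 1. All-reduce the local squared gradient norm so every rank sees the
    #    global gradient square L2-norm.
    grad_sq = paddle.sum(paddle.square(grad_shard))
    dist.all_reduce(grad_sq)                  # SUM is the default reduce op
    global_grad_norm = paddle.sqrt(grad_sq)

    # 2. Same pattern for the (assumed) parameter-side square L2-norm used by
    #    the LAMB trust ratio.
    param_sq = paddle.sum(paddle.square(param_shard))
    dist.all_reduce(param_sq)
    global_param_norm = paddle.sqrt(param_sq)

    # 3. Each rank updates only its own shard. A plain scaled step stands in
    #    here for the real LAMB/Adam update.
    trust_ratio = global_param_norm / (global_grad_norm + 1e-6)
    new_shard = param_shard - lr * trust_ratio * grad_shard

    # 4. All-gather the updated shards to reassemble the whole parameter.
    shards = []
    dist.all_gather(shards, new_shard)
    return paddle.concat(shards)
```

A real run would launch this under paddle.distributed.launch with init_parallel_env() called first; the new unit tests above (with and without gradient clipping) exercise the fused implementation rather than this sketch.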