feat(python, core): support process group in `with_bagua`, support hierarchical communication in bytegrad algorithm #300
Needs to be merged after #298.
Solved by passing
@@ -42,7 +42,7 @@ def ensure_bagua_tensor(
        assert (
            self.bagua_tensor_name == name
        ), "assigning a different name to existing bagua tensor is forbidden"
        return self
Why not return `self` if it is already a Bagua tensor?
In line with #271, we still need to skip the early `return self` here.
Each `with_bagua` call generates a new module name, which leads to a mismatch between `module._bagua_backend` and `bucket._bagua_backend`.
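The backend mismatch can be reproduced with a toy model. This is a minimal sketch, not Bagua's code: `Backend`, `FakeTensor`, the `skip_if_registered` flag, and this simplified `with_bagua` are hypothetical stand-ins, assuming only that each wrap binds the parameters to a freshly created backend. With the early return, the tensors keep pointing at the backend from the first wrap.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical counter used to give every wrap a new module name.
_module_ids = itertools.count()


@dataclass
class Backend:
    """Hypothetical stand-in for the communication backend of one wrapped module."""
    module_name: str


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a parameter tensor that can be registered with Bagua."""
    bagua_tensor_name: Optional[str] = None
    bagua_backend: Optional[Backend] = None

    def ensure_bagua_tensor(self, name: str, backend: Backend, skip_if_registered: bool) -> "FakeTensor":
        if self.bagua_tensor_name is not None:
            assert self.bagua_tensor_name == name, (
                "assigning a different name to existing bagua tensor is forbidden"
            )
            if skip_if_registered:
                # Early return: the tensor keeps the backend bound by the FIRST wrap.
                return self
        self.bagua_tensor_name = name
        # Re-bind to the backend created by the current wrap.
        self.bagua_backend = backend
        return self


def with_bagua(params: List[FakeTensor], skip_if_registered: bool) -> Backend:
    """Hypothetical wrapper: every call creates a new module name, hence a new backend."""
    backend = Backend(module_name=f"module_{next(_module_ids)}")
    for i, p in enumerate(params):
        p.ensure_bagua_tensor(f"param_{i}", backend, skip_if_registered)
    return backend


def backends_match(skip_if_registered: bool) -> bool:
    """Wrap the same parameters twice; check they follow the latest module's backend."""
    params = [FakeTensor(), FakeTensor()]
    with_bagua(params, skip_if_registered)
    latest = with_bagua(params, skip_if_registered)
    return all(p.bagua_backend is latest for p in params)


print(backends_match(skip_if_registered=True))   # False: stale backend, mismatch
print(backends_match(skip_if_registered=False))  # True: tensors follow the new module
```

Skipping the early return costs one re-assignment per call but keeps every tensor bound to the backend its current bucket uses.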
see comments
5bc98c9 to 2aac918
BREAKING CHANGE:
- `AlgorithmImpl` must now be given a process group in its `__init__` method.
- `ByteGradAlgorithm` can accept a parameter that enables hierarchical communication.
- `decentralized_synchronous_op_copy_back_peer_weight` has been removed from `BaguaBucket`; call `copy_back_peer_weight` on the decentralized synchronous op instead.
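The first two breaking changes (an algorithm object that receives its process group at construction, plus an optional hierarchical mode) can be illustrated with a pure-Python simulation. This is a hedged sketch of generic two-level gradient averaging, not Bagua's implementation: it leaves out any compression ByteGrad applies and all real communication primitives. `ToyProcessGroup`, `ToyGradAveraging`, `ranks_per_node`, and the `hierarchical` flag name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class ToyProcessGroup:
    """Hypothetical process group: global ranks, laid out ranks_per_node per machine."""
    ranks: List[int]
    ranks_per_node: int


def flat_allreduce(grads: List[Vector]) -> List[Vector]:
    """Average across all ranks in one global step."""
    n = len(grads)
    avg = [sum(vals) / n for vals in zip(*grads)]
    return [list(avg) for _ in grads]


def hierarchical_allreduce(grads: List[Vector], ranks_per_node: int) -> List[Vector]:
    """Two-level average: intra-node reduce, inter-node allreduce, intra-node broadcast."""
    assert len(grads) % ranks_per_node == 0, "world size must be a multiple of ranks_per_node"
    nodes = [grads[i:i + ranks_per_node] for i in range(0, len(grads), ranks_per_node)]
    # Step 1: each node leader sums its local ranks' gradients (fast intra-node link).
    node_sums = [[sum(vals) for vals in zip(*node)] for node in nodes]
    # Step 2: only the leaders exchange data across nodes (slow inter-node link).
    world_size = len(grads)
    global_avg = [sum(vals) / world_size for vals in zip(*node_sums)]
    # Step 3: each leader broadcasts the averaged result back to its local ranks.
    return [list(global_avg) for _ in grads]


class ToyGradAveraging:
    """Hypothetical algorithm object: the process group is passed to __init__."""

    def __init__(self, process_group: ToyProcessGroup, hierarchical: bool = False):
        self.process_group = process_group
        self.hierarchical = hierarchical

    def reduce(self, grads: List[Vector]) -> List[Vector]:
        assert len(grads) == len(self.process_group.ranks), "one gradient per rank"
        if self.hierarchical:
            return hierarchical_allreduce(grads, self.process_group.ranks_per_node)
        return flat_allreduce(grads)


group = ToyProcessGroup(ranks=[0, 1, 2, 3], ranks_per_node=2)
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(ToyGradAveraging(group, hierarchical=True).reduce(grads))
print(ToyGradAveraging(group, hierarchical=False).reduce(grads))
```

Both modes give the same average; the hierarchical one only changes which ranks talk over the inter-node network, which is where the savings come from on multi-GPU machines.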