-
-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sync_all_reduce
to consider update->compute->update case
#2803
Fix sync_all_reduce
to consider update->compute->update case
#2803
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sadra-barikbin let's write a test that fails on master and is fixed with this PR. In the test we have to check Metric attributes of type: tensor and scalar
…compute-update-compute-case
…compute-update-compute-case
…compute-update-compute-case
…compute-update-compute-case
…compute-update-compute-case
…compute-update-compute-case
…compute-update-compute-case
…compute-update-compute-case
@sadra-barikbin the code is definitely broken, no point to update the branch every time |
…compute-update-compute-case
@sadra-barikbin let's also remove |
Shall I remove |
Or you keep it for BC and just set |
Yes, we should keep it for BC and also for reseting cached value. Later we can rename it with deprecation cycles etc. |
From now on, compute after compute does the whole thing again
…to-consider-compute-update-compute-case' into Fix-sync_all_reduce-decorator-to-consider-compute-update-compute-case
…compute-update-compute-case
Co-authored-by: vfdev <vfdev.5@gmail.com>
…compute-update-compute-case
@sadra-barikbin please resolve the conflict for this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @sadra-barikbin !
Problem Description:
We get wrong result in the situation below, given that we are in a distributed config and
SomeMetric
has decorated itscompute
method withsync_all_reduce
:This is the case because
sync_all_reduce
changes the state attributes to the accumulated values in all ranks, making them ready to callcompute
and callingupdate
again doesn't make sense.Solution (By @vfdev-5 ):
In
sync_all_reduce
, store the state attributes in temporary variables. Then do the collective ops on the state attributes. After that, call the decorated function. Finally, restore the state attributes and return the result of the decorated function.