Hi, thanks for your great work
I have one question about the weight updating protocol.
The gradients of the local network (including all auxiliary tasks) are applied in the process function in trainer.py. However, I notice that the sync function (which copies the weights of the global network to the local one) is run BEFORE apply_gradient is run (lines 354 and 409 respectively).
Following the code behind these two functions, you are copying the shared weights to the local network (so the global variables and local variables are identical at that point), then calculating gradients on the local variables, then applying those gradients to the global variables. That does not sound logical, does it?
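For reference, here is a minimal PyTorch-style sketch of the pattern I am describing (not the repo's actual TensorFlow code; the network shapes, names, and dummy data are made up):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the shared (global) and worker (local) networks.
global_net = nn.Linear(4, 2)
local_net = nn.Linear(4, 2)
optimizer = torch.optim.SGD(global_net.parameters(), lr=0.01)

# 1. sync: copy the global (shared) weights into the local network,
#    so both sets of variables are identical before the rollout.
local_net.load_state_dict(global_net.state_dict())

# 2. compute gradients on the *local* network from a (dummy) rollout.
x, y = torch.randn(8, 4), torch.randn(8, 2)
loss = nn.functional.mse_loss(local_net(x), y)
loss.backward()

# 3. apply_gradient: apply the locally computed gradients to the
#    *global* parameters via the shared optimizer.
optimizer.zero_grad()
for g_param, l_param in zip(global_net.parameters(), local_net.parameters()):
    g_param.grad = l_param.grad.clone()
optimizer.step()
```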
Please correct me if I am wrong.