Supporting arbitrary directed acyclic networks #92
Backpropagation can work on arbitrary directed acyclic networks. Does the current implementation support a blob being used by two different layers? I see that each layer initializes bottom_diff to zero and then accumulates its gradient into it; this would overwrite the gradient contributed by a second layer acting on the same blob. Or am I missing something? If not, is simply changing = to += a good way of solving the problem?
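To make the concern concrete, here is a minimal numpy sketch (the blob and gradient names are illustrative, not Caffe's) of what goes wrong when two layers consume the same blob and each writes its gradient with plain assignment:

```python
import numpy as np

x = np.random.randn(4)          # a blob consumed by two layers
bottom_diff = np.zeros_like(x)  # its gradient buffer

# Suppose layers A and B each computed a local gradient w.r.t. x.
gA = np.ones_like(x)
gB = 2 * np.ones_like(x)

# With plain assignment, whichever layer runs backward last wins:
bottom_diff[:] = gA   # layer A's backward writes its gradient
bottom_diff[:] = gB   # layer B's backward overwrites it; gA is lost

# The correct gradient is the sum of both contributions:
assert not np.allclose(bottom_diff, gA + gB)
```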
It does not yet. Decaf does it by detecting branches and inserting SplitLayers into the internal instantiation of the network (see https://github.com/UCB-ICSI-Vision-Group/decaf-release/blob/master/decaf/base.py#L168), but Caffe has not implemented such a mechanism yet. Note that simply changing = to += brings additional complexity (e.g., whether and who initializes the gradients to zero, and requiring all existing and future layers to enforce +=), but it is definitely doable, either this way or by inserting utility layers.
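For reference, here is a rough numpy-style sketch of the split-layer idea (an illustration of the technique, not Decaf's or Caffe's actual code): the forward pass copies the bottom blob once per consumer, and the backward pass sums the top diffs, so every downstream layer can keep overwriting its own diff buffer with =.

```python
import numpy as np

class SplitLayer:
    """Fan one bottom blob out to `num_tops` top blobs (sketch).

    Forward copies the data; backward sums the top diffs, so the
    layers consuming the tops may keep plain `=` diff semantics.
    """
    def __init__(self, num_tops):
        self.num_tops = num_tops

    def forward(self, bottom):
        # Each consumer gets its own copy of the bottom blob.
        return [bottom.copy() for _ in range(self.num_tops)]

    def backward(self, top_diffs):
        # Gradient w.r.t. the bottom is the sum over all consumers.
        return np.sum(top_diffs, axis=0)

split = SplitLayer(num_tops=2)
top_a, top_b = split.forward(np.random.randn(4))
grad = split.backward([np.ones(4), 2 * np.ones(4)])  # -> array of 3s
```

With the split in place, each consumer gets its own top copy, and the split's backward recovers the summed gradient for the shared bottom.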
The split layer method would require more memory.
I am a little bit torn, since enforcing += will require us to always do two passes over the gradients (zeroing them first, then accumulating).

Yangqing
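The two passes mentioned above would look roughly like this (a hypothetical sketch; `layers` and `diffs` are illustrative stand-ins, not Caffe's data structures): under an enforced += convention, every gradient buffer must first be zeroed, since no layer may assume it is the sole writer and can simply overwrite the buffer.

```python
import numpy as np

def backward_with_accumulation(layers, diffs):
    """Backward pass under an enforced += convention (hypothetical).

    `layers` is a list of (backward_fn, bottom_name) pairs and
    `diffs` maps blob names to preallocated gradient buffers.
    """
    # Pass 1: zero every gradient buffer, so no stale data remains.
    for buf in diffs.values():
        buf.fill(0.0)

    # Pass 2: each layer accumulates its contribution with +=,
    # so blobs consumed by several layers sum their gradients.
    for backward_fn, bottom_name in layers:
        diffs[bottom_name] += backward_fn()
```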
Since most networks won't have splits, a second pass through the gradients seems wasteful. I'll use a split layer, manually defined for simplicity.
Split layers are in.