
Supporting arbitrary directed acyclic networks #92

Closed
aravindhm opened this issue Feb 10, 2014 · 5 comments

Comments

@aravindhm

Backpropagation can work on arbitrary directed acyclic networks. Does the current implementation support a blob being used by two different layers? I see that each layer initializes bottom_diff to zero and then accumulates its gradient into it; this would overwrite the gradient contributed by a second layer acting on the same blob. Or am I missing something? If not, is simply changing = to += a good way of solving the problem?
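
To make the concern concrete, here is a minimal standalone sketch (plain C++, not actual Caffe code; bottom_diff here is just an illustrative buffer, not Caffe's Blob) of how '=' lets the second consumer of a blob clobber the first one's gradient, while a single zero-fill followed by '+=' keeps both contributions:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  const int n = 4;
  std::vector<float> bottom_diff(n, 0.0f);        // gradient buffer shared by two consumers
  std::vector<float> grad_from_layer_a(n, 1.0f);  // contribution of consumer A
  std::vector<float> grad_from_layer_b(n, 2.0f);  // contribution of consumer B

  // '=' semantics: the second consumer overwrites the first one's gradient.
  for (int i = 0; i < n; ++i) bottom_diff[i] = grad_from_layer_a[i];
  for (int i = 0; i < n; ++i) bottom_diff[i] = grad_from_layer_b[i];
  std::printf("with '=':  bottom_diff[0] = %.1f (A's gradient is lost)\n", bottom_diff[0]);

  // '+=' semantics: zero once, then every consumer adds its share.
  std::fill(bottom_diff.begin(), bottom_diff.end(), 0.0f);
  for (int i = 0; i < n; ++i) bottom_diff[i] += grad_from_layer_a[i];
  for (int i = 0; i < n; ++i) bottom_diff[i] += grad_from_layer_b[i];
  std::printf("with '+=': bottom_diff[0] = %.1f (sum of both consumers)\n", bottom_diff[0]);
  return 0;
}
```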

@Yangqing
Member

It does not yet. Decaf does it by detecting branches and inserting SplitLayers:

https://github.com/UCB-ICSI-Vision-Group/decaf-release/blob/master/decaf/base.py#L168

into the internal instantiation of the network, but Caffe has not implemented such a mechanism yet.

Note that simply changing = to += might have additional complexities (e.g. whether gradients are initialized to zero and who does it, and requiring all existing and future layers to enforce +=), but it is definitely doable, either this way or by inserting utility layers.
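
Roughly, such a split layer copies its single bottom blob to each branch on the forward pass and sums the branch gradients on the backward pass, so every other layer can keep plain '=' semantics. A standalone sketch of that idea (stand-in types only, not the Decaf or Caffe implementation):

```cpp
#include <cstdio>
#include <vector>

typedef std::vector<float> Blob;  // stand-in for a real blob type

// Forward: each branch gets its own copy of the bottom data.
void SplitForward(const Blob& bottom, std::vector<Blob>* tops) {
  for (size_t k = 0; k < tops->size(); ++k) (*tops)[k] = bottom;
}

// Backward: the bottom gradient is the element-wise sum of the branch gradients.
// Assumes top_diffs is non-empty and all diffs have the same size.
void SplitBackward(const std::vector<Blob>& top_diffs, Blob* bottom_diff) {
  bottom_diff->assign(top_diffs[0].size(), 0.0f);
  for (size_t k = 0; k < top_diffs.size(); ++k)
    for (size_t i = 0; i < bottom_diff->size(); ++i)
      (*bottom_diff)[i] += top_diffs[k][i];
}

int main() {
  Blob bottom(3, 1.0f);
  std::vector<Blob> tops(2);
  SplitForward(bottom, &tops);             // both branches see the same data

  std::vector<Blob> top_diffs(2, Blob(3, 0.5f));
  Blob bottom_diff;
  SplitBackward(top_diffs, &bottom_diff);  // each element sums to 1.0
  std::printf("bottom_diff[0] = %.1f\n", bottom_diff[0]);
  return 0;
}
```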

@aravindhm
Author

The split layer method would require more memory.
The cpu_diff() / gpu_diff() buffers can be initialized to zero by the network (net.cpp) in the Net::Backward() function before the 'for' loop over layers begins.
Do you feel it's worth enforcing += instead of a split-layer mechanism?
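
A standalone sketch of that arrangement (stand-in types, not Caffe's Net, Blob, or Layer classes; the zero-fill stands in for initializing cpu_diff()/gpu_diff() in Net::Backward() before the layer loop):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

typedef std::vector<float> Blob;

struct FakeLayer {                      // hypothetical layer that accumulates with '+='
  Blob* bottom_diff;
  float grad;
  void Backward() {
    for (size_t i = 0; i < bottom_diff->size(); ++i)
      (*bottom_diff)[i] += grad;        // accumulate, never overwrite
  }
};

int main() {
  Blob shared_diff(3, 123.0f);          // stale values from a previous iteration
  std::vector<Blob*> all_diffs(1, &shared_diff);
  FakeLayer a = { &shared_diff, 1.0f };
  FakeLayer b = { &shared_diff, 2.0f };
  std::vector<FakeLayer*> layers;
  layers.push_back(&a);
  layers.push_back(&b);

  // What the network would do: one zeroing pass over all gradient buffers
  // before the reverse 'for' loop over layers begins.
  for (size_t i = 0; i < all_diffs.size(); ++i)
    std::fill(all_diffs[i]->begin(), all_diffs[i]->end(), 0.0f);
  for (size_t i = layers.size(); i-- > 0; )
    layers[i]->Backward();

  std::printf("shared_diff[0] = %.1f\n", shared_diff[0]);  // 3.0: both layers summed
  return 0;
}
```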

@Yangqing
Member

I am a little bit torn, since enforcing += will require us to always do two passes over the gradients, something that is not that kosher for most use cases. But indeed the split layer would require more memory.


@aravindhm
Author

Since most networks won't have splits, a second pass through the gradients seems wasteful. I'll use a split layer, manually defined for simplicity.

@shelhamer
Member

Split layers are in dev as of #129 and on their way to master.
