
Supporting arbitrary directed acyclic networks #92

Closed
aravindhm opened this issue Feb 10, 2014 · 5 comments

Comments

@aravindhm

Backpropagation can work on arbitrary directed acyclic networks. Does the current implementation support a blob being used by two different layers? I see that each layer initializes bottom_diff to zero and then accumulates its gradient into it; this would overwrite the gradient contributed by a second layer acting on the same blob. Or am I missing something? If not, is simply changing = to += a good way of solving the problem?
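
To make the concern concrete, here is a minimal standalone sketch (plain C++, not actual Caffe code; bottom_diff here is just an illustrative buffer, not Caffe's Blob) of how '=' lets the second consumer of a blob clobber the first one's gradient, while a single zero-fill followed by '+=' keeps both contributions:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  const int n = 4;
  std::vector<float> bottom_diff(n, 0.0f);        // gradient buffer shared by two consumers
  std::vector<float> grad_from_layer_a(n, 1.0f);  // contribution of consumer A
  std::vector<float> grad_from_layer_b(n, 2.0f);  // contribution of consumer B

  // '=' semantics: the second consumer overwrites the first one's gradient.
  for (int i = 0; i < n; ++i) bottom_diff[i] = grad_from_layer_a[i];
  for (int i = 0; i < n; ++i) bottom_diff[i] = grad_from_layer_b[i];
  std::printf("with '=':  bottom_diff[0] = %.1f (A's gradient is lost)\n", bottom_diff[0]);

  // '+=' semantics: zero once, then every consumer adds its share.
  std::fill(bottom_diff.begin(), bottom_diff.end(), 0.0f);
  for (int i = 0; i < n; ++i) bottom_diff[i] += grad_from_layer_a[i];
  for (int i = 0; i < n; ++i) bottom_diff[i] += grad_from_layer_b[i];
  std::printf("with '+=': bottom_diff[0] = %.1f (sum of both consumers)\n", bottom_diff[0]);
  return 0;
}
```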

@Yangqing
Member

It does not yet. Decaf does it by detecting branches and inserting SplitLayers:

https://github.com/UCB-ICSI-Vision-Group/decaf-release/blob/master/decaf/base.py#L168

into the internal instantiation of the network, but Caffe has not implemented such a mechanism yet.

Note that simply changing = to += might have additional complexities (e.g. whether gradients are initialized to zero and who does it, and requiring all existing and future layers to enforce +=), but it is definitely doable, either this way or by inserting utility layers.
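
Roughly, such a split layer copies its single bottom blob to each branch on the forward pass and sums the branch gradients on the backward pass, so every other layer can keep plain '=' semantics. A standalone sketch of that idea (stand-in types only, not the Decaf or Caffe implementation):

```cpp
#include <cstdio>
#include <vector>

typedef std::vector<float> Blob;  // stand-in for a real blob type

// Forward: each branch gets its own copy of the bottom data.
void SplitForward(const Blob& bottom, std::vector<Blob>* tops) {
  for (size_t k = 0; k < tops->size(); ++k) (*tops)[k] = bottom;
}

// Backward: the bottom gradient is the element-wise sum of the branch gradients.
// Assumes top_diffs is non-empty and all diffs have the same size.
void SplitBackward(const std::vector<Blob>& top_diffs, Blob* bottom_diff) {
  bottom_diff->assign(top_diffs[0].size(), 0.0f);
  for (size_t k = 0; k < top_diffs.size(); ++k)
    for (size_t i = 0; i < bottom_diff->size(); ++i)
      (*bottom_diff)[i] += top_diffs[k][i];
}

int main() {
  Blob bottom(3, 1.0f);
  std::vector<Blob> tops(2);
  SplitForward(bottom, &tops);             // both branches see the same data

  std::vector<Blob> top_diffs(2, Blob(3, 0.5f));
  Blob bottom_diff;
  SplitBackward(top_diffs, &bottom_diff);  // each element sums to 1.0
  std::printf("bottom_diff[0] = %.1f\n", bottom_diff[0]);
  return 0;
}
```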

@aravindhm
Author

The split layer method would require more memory.
The cpu_diff() / gpu_diff() buffers can be initialized to zero by the network (net.cpp) in the Net::Backward() function before the 'for' loop over layers begins.
Do you feel it's worth enforcing += instead of a split-layer mechanism?
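
A standalone sketch of that arrangement (stand-in types, not Caffe's Net, Blob, or Layer classes; the zero-fill stands in for initializing cpu_diff()/gpu_diff() in Net::Backward() before the layer loop):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

typedef std::vector<float> Blob;

struct FakeLayer {                      // hypothetical layer that accumulates with '+='
  Blob* bottom_diff;
  float grad;
  void Backward() {
    for (size_t i = 0; i < bottom_diff->size(); ++i)
      (*bottom_diff)[i] += grad;        // accumulate, never overwrite
  }
};

int main() {
  Blob shared_diff(3, 123.0f);          // stale values from a previous iteration
  std::vector<Blob*> all_diffs(1, &shared_diff);
  FakeLayer a = { &shared_diff, 1.0f };
  FakeLayer b = { &shared_diff, 2.0f };
  std::vector<FakeLayer*> layers;
  layers.push_back(&a);
  layers.push_back(&b);

  // What the network would do: one zeroing pass over all gradient buffers
  // before the reverse 'for' loop over layers begins.
  for (size_t i = 0; i < all_diffs.size(); ++i)
    std::fill(all_diffs[i]->begin(), all_diffs[i]->end(), 0.0f);
  for (size_t i = layers.size(); i-- > 0; )
    layers[i]->Backward();

  std::printf("shared_diff[0] = %.1f\n", shared_diff[0]);  // 3.0: both layers summed
  return 0;
}
```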

@Yangqing
Member

I am a little bit torn, since enforcing += will require us to always do two passes over the gradients, something that is not that kosher for most use cases. But indeed the split layer would require more memory.


@aravindhm
Author

Since most networks won't have splits, a second pass through the gradients seems wasteful. I'll use a split layer, manually defined for simplicity.

@shelhamer
Member

Split layers are in dev as of #129 and on their way to master.
