Weight sharing #500

Conversation

jeffdonahue
Contributor

Built on top of #497 -- I pushed that to a new branch here (fix-backward-interface) and opened this PR against it; once #497 is finished and merged I will re-target this against dev.

This adds the ability to share parameters between layers, which has a number of applications, the canonical one perhaps being recurrent neural network (RNN) training.

To share weights between two or more layers with parameters (currently just InnerProductLayers and ConvolutionLayers), specify the same blob_name for all of these layers. (You can also name the biases with a second blob_name, as with the blobs_lr and weight_decay parameters.) A very simple example of this is in src/caffe/test/test_net.cpp, in the unit test named InitDiffDataSharedWeightsNet:

layers: {
  name: 'innerproduct1'
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 10
    bias_term: false
    weight_filler {
      type: 'gaussian'
      std: 10
    }
  }
  blob_name: 'sharedweights'
  bottom: 'data'
  top: 'innerproduct1'
}
layers: {
  name: 'innerproduct2'
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 10
    bias_term: false
    weight_filler {
      type: 'gaussian'
      std: 10
    }
  }
  blob_name: 'sharedweights'
  bottom: 'data'
  top: 'innerproduct2'
}

This means layers innerproduct1 and innerproduct2 share the same set of weights, as they both specify blob_name: 'sharedweights'. In this case they also take the same bottom blob (data), so their outputs, the top blobs innerproduct1 and innerproduct2, should be identical -- not something you'd ever actually want to do; I do it there purely for testing purposes.

Note that in this case we specify only one blob name because we've set bias_term: false; without bias_term: false we'd need to specify two blob_names, with the second one left empty unless we actually want to share the biases as well. (In my implementation, specifying the empty string as a blob_name is equivalent to not specifying one at all.)

blob_name: 'sharedweights'
blob_name: ''
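
For concreteness, a full layer definition with the bias term enabled (the default) might look like the following. This is a hypothetical layer, not one of the PR's test cases -- the layer name and bias_filler are made up -- and it shares only its weights while keeping its own bias:

layers: {
  name: 'innerproduct3'
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 10
    weight_filler {
      type: 'gaussian'
      std: 10
    }
    bias_filler {
      type: 'constant'
      value: 0
    }
  }
  blob_name: 'sharedweights'  # first param blob (weights): shared
  blob_name: ''               # second param blob (bias): not shared
  bottom: 'data'
  top: 'innerproduct3'
}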

The entire implementation is in Net::Init, Net::AppendParam, and Net::Update. Init figures out which layer actually "owns" each shared param (the first one to list its blob_name), and Update adds the non-owning layers' computed diffs into the diff of the owner blob, then performs updates only on owned blobs. Memory-wise, all shared blobs point to the same memory location for the parameter's data but keep separately allocated diff blobs, since learning rate, weight decay, etc. are still applied by the Solver (which is blissfully unaware that parameters can be shared).
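
To make the ownership/accumulation logic concrete, here is a minimal, self-contained C++ sketch of the update rule described above. This is not the PR's actual code: the Param struct, its owner field, and SharedUpdate are made-up names, and the real Net::Update operates on Caffe Blobs whose data memory is shared between owner and non-owners.

#include <vector>

// Hypothetical stand-in for a parameter blob: its data, its diff, and the
// index of its owning param (-1 if this param owns its own data).
struct Param {
  std::vector<float> data, diff;
  int owner;
};

void SharedUpdate(std::vector<Param>& params) {
  // 1. Fold the diffs computed by non-owning layers into the owner's diff.
  //    (The Solver has already scaled these diffs by learning rate, weight
  //    decay, etc., since it is unaware of sharing.)
  for (const Param& p : params) {
    if (p.owner < 0) continue;
    Param& owner_blob = params[p.owner];
    for (size_t k = 0; k < p.diff.size(); ++k) owner_blob.diff[k] += p.diff[k];
  }
  // 2. Apply the update only to owned params. In the PR, non-owners point at
  //    the same data memory as the owner, so updating the owner effectively
  //    updates every layer sharing the blob.
  for (Param& p : params) {
    if (p.owner >= 0) continue;
    for (size_t k = 0; k < p.data.size(); ++k) p.data[k] -= p.diff[k];
  }
}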

Open to hearing feedback on the interface, implementation, etc. I'm not sure I'm happy with blob_name as the name of the field; param_name or something similar would be less ambiguous, but it would be inconsistent with the other per-parameter field blobs_lr (and to be fully consistent with that it should be blobs_name, though I strongly prefer the singular here).

@shelhamer
Member

This adds the ability to share parameters between layers, which has a number of applications

...including siamese networks [1] as asked in #316.

[1] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. CVPR 2005.

@shelhamer
Member

@jeffdonahue this is excellent! Thanks for taking a second pass on this extension -- the implementation this time around is elegant and uncomplicated to digest.

I'm not sure I'm happy with blob_name as the name of the field

How about "param" for the field name? This fits the same scheme as "top" and "bottom" since these all name blobs.

Two other naming suggestions are:

  • net_param_id -> param_id (though param_id is a loop index, so the current name is fine)
  • param_net_indices -> layer_param_indices, since the pair is (layer_id, param_id)

shelhamer mentioned this pull request Jun 26, 2014
@shelhamer
Member

(The following is purely about workflow policy -- disregard if uninterested.)

@jeffdonahue in the future, let's open PRs for branches with dependencies against dev in all cases and rebase them as the dependencies are merged. Since github doesn't allow re-heading open PRs, I have to replace this PR with #546, and I think the disconnect is unfortunate.

While the github interface may show spurious commits on the branches with dependencies until they are rebased, the diff obtained with git diff / git difftool and friends will still be correct.

Sound good?

@shelhamer
Member

Replaced by #546 for merge.

shelhamer closed this Jun 26, 2014
shelhamer mentioned this pull request Aug 7, 2014
jeffdonahue deleted the weight-sharing-clean branch March 14, 2015