Weight sharing #500

Conversation

jeffdonahue
Contributor

Built on top of #497 -- I pushed that to a new branch here (fix-backward-interface) and opened this PR against it; once #497 is finished and merged I will re-target this against dev.

This adds the ability to share parameters between layers, which has a number of applications, the canonical one perhaps being recurrent neural network (RNN) training.

To share weights between two or more layers with parameters (currently just InnerProductLayers and ConvolutionLayers), specify the same blob_name for all of these layers. (You can also name the biases with a second blob_name, as with the blobs_lr and weight_decay parameters.) A very simple example of this is in src/caffe/test/test_net.cpp, in the unit test named InitDiffDataSharedWeightsNet:

layers: {
  name: 'innerproduct1'
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 10
    bias_term: false
    weight_filler {
      type: 'gaussian'
      std: 10
    }
  }
  blob_name: 'sharedweights'
  bottom: 'data'
  top: 'innerproduct1'
}
layers: {
  name: 'innerproduct2'
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 10
    bias_term: false
    weight_filler {
      type: 'gaussian'
      std: 10
    }
  }
  blob_name: 'sharedweights'
  bottom: 'data'
  top: 'innerproduct2'
}

This means layers innerproduct1 and innerproduct2 share the same set of weights, as they both specify blob_name: 'sharedweights'. In this case they also take the same bottom blob (data), so their outputs, the top blobs innerproduct1 and innerproduct2, should be identical -- not something you'd ever actually want to do; I do it there purely for testing purposes.

Note that in this case we specify only one blob name because we've set bias_term: false; without bias_term: false we'd need to specify two blob_names, with the second one left empty unless we actually want to share the biases as well. (In my implementation, specifying the empty string as a blob_name is equivalent to not specifying one at all.)

blob_name: 'sharedweights'
blob_name: ''
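
For concreteness, a full layer definition with the bias term enabled (the default) might look like the following. This is a hypothetical layer, not one of the PR's test cases -- the layer name and bias_filler are made up -- and it shares only its weights while keeping its own bias:

layers: {
  name: 'innerproduct3'
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 10
    weight_filler {
      type: 'gaussian'
      std: 10
    }
    bias_filler {
      type: 'constant'
      value: 0
    }
  }
  blob_name: 'sharedweights'  # first param blob (weights): shared
  blob_name: ''               # second param blob (bias): not shared
  bottom: 'data'
  top: 'innerproduct3'
}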

The entire implementation is in Net::Init, Net::AppendParam, and Net::Update. Init figures out which layer actually "owns" each shared param (the first one to list its blob_name), and Update adds the non-owning layers' computed diffs into the diff of the owner blob, then performs updates only on owned blobs. Memory-wise, all shared blobs point to the same memory location for the parameter's data but keep separately allocated diff blobs, since learning rate, weight decay, etc. are still applied by the Solver (which is blissfully unaware that parameters can be shared).
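
To make the ownership/accumulation logic concrete, here is a minimal, self-contained C++ sketch of the update rule described above. This is not the PR's actual code: the Param struct, its owner field, and SharedUpdate are made-up names, and the real Net::Update operates on Caffe Blobs whose data memory is shared between owner and non-owners.

#include <vector>

// Hypothetical stand-in for a parameter blob: its data, its diff, and the
// index of its owning param (-1 if this param owns its own data).
struct Param {
  std::vector<float> data, diff;
  int owner;
};

void SharedUpdate(std::vector<Param>& params) {
  // 1. Fold the diffs computed by non-owning layers into the owner's diff.
  //    (The Solver has already scaled these diffs by learning rate, weight
  //    decay, etc., since it is unaware of sharing.)
  for (const Param& p : params) {
    if (p.owner < 0) continue;
    Param& owner_blob = params[p.owner];
    for (size_t k = 0; k < p.diff.size(); ++k) owner_blob.diff[k] += p.diff[k];
  }
  // 2. Apply the update only to owned params. In the PR, non-owners point at
  //    the same data memory as the owner, so updating the owner effectively
  //    updates every layer sharing the blob.
  for (Param& p : params) {
    if (p.owner >= 0) continue;
    for (size_t k = 0; k < p.data.size(); ++k) p.data[k] -= p.diff[k];
  }
}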

Open to hearing feedback on the interface, implementation, etc. I'm not sure I'm happy with blob_name as the name of the field; param_name or something similar would be less ambiguous, but it would be inconsistent with the other per-parameter field blobs_lr (and to be fully consistent with that it should be blobs_name, though I strongly prefer the singular here).

@shelhamer
Member

This adds the ability to share parameters between layers, which has a number of applications

...including siamese networks [1] as asked in #316.

[1] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. CVPR 2005.

@shelhamer
Member

@jeffdonahue this is excellent! Thanks for taking a second pass on this extension -- the implementation this time around is elegant and uncomplicated to digest.

I'm not sure I'm happy with blob_name as the name of the field

How about "param" for the field name? This fits the same scheme as "top" and "bottom" since these all name blobs.

Two other naming suggestions are:

  • net_param_id -> param_id (though param_id is a loop index, so the current name is fine)
  • param_net_indices -> layer_param_indices, since the pair is (layer_id, param_id)

shelhamer mentioned this pull request Jun 26, 2014
@shelhamer
Member

(The following is purely about workflow policy -- disregard if uninterested.)

@jeffdonahue in the future, let's open PRs for branches with dependencies against dev in all cases and rebase them as the dependencies are merged. Since github doesn't allow re-heading open PRs, I have to replace this PR with #546, and I think the disconnect is unfortunate.

While the github interface may show spurious commits on the branches with dependencies until they are rebased, the diff obtained with git diff / git difftool and friends will still be correct.

Sound good?

@shelhamer
Member

Replaced by #546 for merge.

shelhamer closed this Jun 26, 2014
shelhamer mentioned this pull request Aug 7, 2014
jeffdonahue deleted the weight-sharing-clean branch March 14, 2015