Built on top of #497 -- I pushed that to a new branch here (`fix-backward-interface`) and PRed this against that, and once #497 is finished and merged I will pull against dev.

This adds the ability to share parameters between layers, which has a number of applications, the canonical one perhaps being recurrent neural network (RNN) training.
To share weights between two or more layers with parameters (currently just `InnerProductLayer`s and `ConvolutionLayer`s), specify the same `blob_name` for all of these layers. (You can also name the biases with a second `blob_name`, as in the `blobs_lr` and `weight_decay` parameters.) You can see a very simple example of this in `src/caffe/test/test_net.cpp`: see the unit test named `InitDiffDataSharedWeightsNet`:
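The test's net definition isn't reproduced here; the following is only a hedged sketch of what a definition along these lines might look like, built from the names discussed below (`sharedweights`, `innerproduct1`/`innerproduct2`, `bias_term: false`) -- the layer type syntax, `num_output`, and exact field placement are illustrative guesses, not copied from the test:

```
layers {
  name: "innerproduct1"
  type: "innerproduct"
  bottom: "data"
  top: "innerproduct1"
  blob_name: "sharedweights"
  bias_term: false
  num_output: 10
}
layers {
  name: "innerproduct2"
  type: "innerproduct"
  bottom: "data"
  top: "innerproduct2"
  blob_name: "sharedweights"
  bias_term: false
  num_output: 10
}
```

Both layers name their weight blob `sharedweights`, so the net ends up with a single shared weight blob between them.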
This means layers `innerproduct1` and `innerproduct2` are sharing the same set of weights, as they've both specified `blob_name: 'sharedweights'`. And in this case they also take the same bottom blob (`data`), so their outputs, the top blobs `innerproduct1` and `innerproduct2`, should be identical (so this is not actually something you'd ever want to do; I do it there just for testing purposes).

Note that in this case we specify only one blob name because we've set `bias_term: false`; if we didn't have `bias_term: false`, we'd need to specify two `blob_name`s, but the second one should probably be empty unless we actually want to share biases. (Specifying the empty string as a `blob_name` is equivalent to not specifying a `blob_name`
in my implementation.)

The entire implementation is in `Net::Init`, `Net::AppendParam`, and `Net::Update`. `Init` figures out which layer will actually "own" the shared param (the first one to list its `blob_name`), and `Update` adds the non-owned layers' computed diffs into the diff of the owner blob, then performs updates only on owned blobs. Memory-wise, all shared blobs actually point to the same memory location for the parameter's data, but they still have separately allocated diff blobs, as the logic to handle learning rate, weight decay, etc. is still handled by the Solver (which is blissfully unaware that parameters can be shared).

Open to hearing feedback on the interface, implementation, etc. I'm not sure I'm happy with `blob_name` as the name of the field; I think it would be less ambiguous to use `param_name` or something, but that would be inconsistent with the other per-parameter field `blobs_lr` (and actually, to be consistent with that it should be `blobs_name`, but I strongly prefer the singular here...).