Tied weights with transpose flag for InnerProduct layer #3612
Conversation
caffe_gpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasNoTrans : CblasTrans,
    M_, N_, K_, (Dtype)1.,
    bottom_data, weight, (Dtype)0., top_data);
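To make the flag concrete, here is the same call annotated with made-up dimensions (a sketch, not the committed code; M_ is the batch size, K_ the input dimension, N_ the num_output, with example values 32, 784, and 64):

// transpose_ == false (default): weight blob is N_ x K_ = 64 x 784,
//   so top (32 x 64) = bottom (32 x 784) * W^T (784 x 64)  -> CblasTrans.
// transpose_ == true: weight blob is K_ x N_ = 784 x 64,
//   so top (32 x 64) = bottom (32 x 784) * W (784 x 64)    -> CblasNoTrans.
// Either way only the BLAS flag changes; the weight memory is never copied.
caffe_gpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasNoTrans : CblasTrans,
    M_, N_, K_, (Dtype)1.,
    bottom_data, weight, (Dtype)0., top_data);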
remove added indent
Thanks @kashefy! This looks pretty good to me.
This shouldn't be needed -- instead the weight param should be set to the correct shape by swapping its two dimensions when transpose is set. Besides that, please see the style nitpicks and squash your history to a single commit. Re testing: it would be good to have a few unit tests.
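A minimal sketch of that shape swap at setup time (hedged; modeled on how the weight blob is created in InnerProductLayer::LayerSetUp):

// Weight blob is N_ x K_ in the default layout and K_ x N_ when transpose_
// is set, so a tied decoder can share the encoder's blob without copying.
vector<int> weight_shape(2);
if (transpose_) {
  weight_shape[0] = K_;
  weight_shape[1] = N_;
} else {
  weight_shape[0] = N_;
  weight_shape[1] = K_;
}
this->blobs_[0].reset(new Blob<Dtype>(weight_shape));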
@jeffdonahue, thanks for the feedback. Will fix styling (the Travis build failed because of it) and add the unit tests.
Great, thanks. Also just noticed you didn't change backward -- pretty sure that will need different gemm arguments as well.
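For reference, a sketch of how the backward gemm calls would need to be conditioned on the flag (hedged, CPU path shown; the shapes follow from the forward pass above):

// Gradient w.r.t. weights:
//   default layout:  dW (N_ x K_) = top_diff^T (N_ x M_) * bottom (M_ x K_)
//   transposed:      dW (K_ x N_) = bottom^T (K_ x M_) * top_diff (M_ x N_)
if (transpose_) {
  caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, K_, N_, M_,
      (Dtype)1., bottom_data, top_diff,
      (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
} else {
  caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, N_, K_, M_,
      (Dtype)1., top_diff, bottom_data,
      (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
}
// Gradient w.r.t. bottom: top_diff (M_ x N_) times the weights, transposed
// or not so that the result is M_ x K_ in both layouts.
caffe_cpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasTrans : CblasNoTrans,
    M_, K_, N_,
    (Dtype)1., top_diff, this->blobs_[0]->cpu_data(),
    (Dtype)0., bottom[0]->mutable_cpu_diff());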
quick update:
@@ -148,4 +265,127 @@ TYPED_TEST(InnerProductLayerTest, TestGradient) {
  }
}

TYPED_TEST(InnerProductLayerTest, TestGradientTransposeFalse) {
Shouldn't this test be TestGradientTransposeTrue (with the corresponding change from set_transpose(false) to set_transpose(true))? This test is effectively a duplicate of the existing test above (TestGradient), I'd think.
And if this test is changed to be done with transpose on, is there still anything additionally tested by TestBackwardTranspose? I would think the combination of TestForward with the gradient check would cover all functionality.
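For instance, a gradient check with transpose on could look like the following (a sketch modeled on the existing TestGradient; the test name, num_output value, and filler settings are illustrative):

TYPED_TEST(InnerProductLayerTest, TestGradientTranspose) {
  typedef typename TypeParam::Dtype Dtype;
  LayerParameter layer_param;
  InnerProductParameter* inner_product_param =
      layer_param.mutable_inner_product_param();
  inner_product_param->set_num_output(11);   // illustrative value
  inner_product_param->mutable_weight_filler()->set_type("gaussian");
  inner_product_param->mutable_bias_filler()->set_type("gaussian");
  inner_product_param->set_transpose(true);  // exercise the new code path
  InnerProductLayer<Dtype> layer(layer_param);
  // Numerically compares Backward against finite differences of Forward,
  // covering both the weight and bottom gradients in the transposed layout.
  GradientChecker<Dtype> checker(1e-2, 1e-3);
  checker.CheckGradientExhaustive(&layer, this->blob_bottom_vec_,
      this->blob_top_vec_);
}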
@kashefy the tests and code look good but see comments/nitpicks above. Once you've addressed these, please squash your history to a single commit (or, if you prefer, two commits -- one for the style fixes of existing code, and another for your added feature and tests), and I can merge this. Thanks!
@jeffdonahue thanks for the feedback. Will go over the redundant test. The PR as it stands only adds a transpose feature to the InnerProduct layer; tying weights in an autoencoder doesn't work yet. If you think the transpose feature is useful on its own, I can do the tying part in another PR.
I'm not sure I understand why shared weights between an "encoder" and "decoder" layer wouldn't work in the current form. Both the shape and memory layout of the weight matrix would be the same between a normal IP layer (the encoder) and a transposed one (the decoder), so sharing the blob should just work. I could certainly be missing something though.
Transposing works for tied weights in an autoencoder as well. All good to go.
Force-pushed from eee4372 to 1954b3b.
Failures are due to import errors when running the Python nose tests. Possible solution in #3638.
Force-pushed from 1954b3b to 2e4f4fd.
Travis job is passing now; can't really explain why, but glad the import errors are gone. All good to go.
Hello @jeffdonahue, I think this is ready. The transpose worked for shared weights after all, as-is.
@kashefy thanks, looks like this is almost there! But could you add a simple gradient check with transpose on?
Commit message: "…toencoder. Arguments to the matrix multiplication function are conditioned on this parameter; no actual transposing takes place. Test IP gradient computation with transpose on."
Force-pushed from 2e4f4fd to 8f847fa.
Hello @jeffdonahue, I've added the gradient check with transpose on.
Hello @jeffdonahue, do you think the current tests are sufficient? Anything else you think should go into this PR? Thanks.
@kashefy thanks for adding the gradient check; I suppose it can't hurt much to have the backward test as it's presumably very quick (relative to the full gradient check). LGTM -- thanks for this work.
I wanted to train an autoencoder where the decoder uses the transpose of the encoder's weight matrix. This was first discussed in #670 and followed up in #1211 (comment), but it seemed this wasn't resolved. I found @jeffdonahue's suggestion in this comment to just add a transpose flag to the InnerProduct layer quite reasonable.
This PR adds a transpose flag to the InnerProduct layer as well as to its params protobuf message.
When set to true (as for the decoder), the forward pass instructs the matrix multiplication routine NOT to transpose the weight matrix; when left false, the weight matrix is transposed as usual, which is what you want in the typical case and for the encoder.
Tying the weights between encoder and decoder requires:
- sharing the weight blob between the two layers (by giving both weight params the same name), and
- setting transpose: true on the decoder, whose num_output must match the encoder's input dimension.

A sample trainval.prototxt to demonstrate usage is sketched below.
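Something along these lines (a hedged prototxt sketch; layer and param names and all dimensions are made up for illustration -- here the encoder maps 784 -> 64 and the decoder maps 64 -> 784 using one shared 64 x 784 weight blob):

layer {
  name: "encode"
  type: "InnerProduct"
  bottom: "data"
  top: "code"
  param { name: "tied_weights" }  # weights shared by name
  param { name: "encode_bias" }   # biases remain separate
  inner_product_param {
    num_output: 64
    # transpose defaults to false: weight blob is num_output x input = 64 x 784
  }
}
layer {
  name: "decode"
  type: "InnerProduct"
  bottom: "code"
  top: "reconstruction"
  param { name: "tied_weights" }  # same name -> same underlying blob
  param { name: "decode_bias" }
  inner_product_param {
    num_output: 784  # must equal the encoder's input dimension
    transpose: true  # weight blob read as input x num_output = 64 x 784
  }
}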
I haven't written unit tests around this yet. Open to suggestions as to what makes sense to test for here.
Thanks for reviewing and looking forward to the feedback.