Tied weights with transpose flag for InnerProduct layer #3612
Conversation
caffe_gpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasNoTrans : CblasTrans,
    M_, N_, K_, (Dtype)1.,
    bottom_data, weight, (Dtype)0., top_data);
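To make the flag concrete, here is the same call annotated with made-up dimensions (a sketch, not the committed code; M_ is the batch size, K_ the input dimension, N_ the num_output, with example values 32, 784, and 64):

// transpose_ == false (default): weight blob is N_ x K_ = 64 x 784,
//   so top (32 x 64) = bottom (32 x 784) * W^T (784 x 64)  -> CblasTrans.
// transpose_ == true: weight blob is K_ x N_ = 784 x 64,
//   so top (32 x 64) = bottom (32 x 784) * W (784 x 64)    -> CblasNoTrans.
// Either way only the BLAS flag changes; the weight memory is never copied.
caffe_gpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasNoTrans : CblasTrans,
    M_, N_, K_, (Dtype)1.,
    bottom_data, weight, (Dtype)0., top_data);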
remove added indent
Thanks @kashefy! This looks pretty good to me.
This shouldn't be needed -- instead the weight param should be set to the correct shape by swapping its two dimensions when transpose is set. Besides that, please see the style nitpicks and squash your history to a single commit. Re testing: it would be good to have a few unit tests.
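A minimal sketch of that shape swap at setup time (hedged; modeled on how the weight blob is created in InnerProductLayer::LayerSetUp):

// Weight blob is N_ x K_ in the default layout and K_ x N_ when transpose_
// is set, so a tied decoder can share the encoder's blob without copying.
vector<int> weight_shape(2);
if (transpose_) {
  weight_shape[0] = K_;
  weight_shape[1] = N_;
} else {
  weight_shape[0] = N_;
  weight_shape[1] = K_;
}
this->blobs_[0].reset(new Blob<Dtype>(weight_shape));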
@jeffdonahue, thanks for the feedback. Will fix styling (the Travis build failed because of it) and add the unit tests.
Great, thanks. Also just noticed you didn't change backward -- pretty sure that will need different gemm arguments as well.
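For reference, a sketch of how the backward gemm calls would need to be conditioned on the flag (hedged, CPU path shown; the shapes follow from the forward pass above):

// Gradient w.r.t. weights:
//   default layout:  dW (N_ x K_) = top_diff^T (N_ x M_) * bottom (M_ x K_)
//   transposed:      dW (K_ x N_) = bottom^T (K_ x M_) * top_diff (M_ x N_)
if (transpose_) {
  caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, K_, N_, M_,
      (Dtype)1., bottom_data, top_diff,
      (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
} else {
  caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, N_, K_, M_,
      (Dtype)1., top_diff, bottom_data,
      (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
}
// Gradient w.r.t. bottom: top_diff (M_ x N_) times the weights, transposed
// or not so that the result is M_ x K_ in both layouts.
caffe_cpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasTrans : CblasNoTrans,
    M_, K_, N_,
    (Dtype)1., top_diff, this->blobs_[0]->cpu_data(),
    (Dtype)0., bottom[0]->mutable_cpu_diff());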
quick update:
@@ -148,4 +265,127 @@ TYPED_TEST(InnerProductLayerTest, TestGradient) {
  }
}

TYPED_TEST(InnerProductLayerTest, TestGradientTransposeFalse) {
Shouldn't this test be TestGradientTransposeTrue (with the corresponding change from set_transpose(false) to set_transpose(true))? This test is effectively a duplicate of the existing test above (TestGradient), I'd think.
And if this test is changed to be done with transpose on, is there still anything additionally tested by TestBackwardTranspose? I would think the combination of TestForward with the gradient check would cover all functionality.
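For instance, a gradient check with transpose on could look like the following (a sketch modeled on the existing TestGradient; the test name, num_output value, and filler settings are illustrative):

TYPED_TEST(InnerProductLayerTest, TestGradientTranspose) {
  typedef typename TypeParam::Dtype Dtype;
  LayerParameter layer_param;
  InnerProductParameter* inner_product_param =
      layer_param.mutable_inner_product_param();
  inner_product_param->set_num_output(11);   // illustrative value
  inner_product_param->mutable_weight_filler()->set_type("gaussian");
  inner_product_param->mutable_bias_filler()->set_type("gaussian");
  inner_product_param->set_transpose(true);  // exercise the new code path
  InnerProductLayer<Dtype> layer(layer_param);
  // Numerically compares Backward against finite differences of Forward,
  // covering both the weight and bottom gradients in the transposed layout.
  GradientChecker<Dtype> checker(1e-2, 1e-3);
  checker.CheckGradientExhaustive(&layer, this->blob_bottom_vec_,
      this->blob_top_vec_);
}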
@kashefy the tests and code look good but see comments/nitpicks above. Once you've addressed these, please squash your history to a single commit (or, if you prefer, two commits -- one for the style fixes of existing code, and another for your added feature and tests), and I can merge this. Thanks!
@jeffdonahue thanks for the feedback. Will go over the redundant test. The PR as it stands only adds a transpose feature to the InnerProduct layer; tying weights in an autoencoder doesn't work yet. If you think the transpose feature is useful on its own, I can do the tying part in another PR.
I'm not sure I understand why shared weights between an "encoder" and "decoder" layer wouldn't work in the current form. Both the shape and memory layout of the weight matrix would be the same between a normal IP layer (the encoder) and a transposed one (the decoder), so sharing the blob should just work. I could certainly be missing something though.
Transposing works for tied weights in an autoencoder as well. All good to go.
Force-pushed from eee4372 to 1954b3b.
Failures are due to import errors when running the Python nose tests. Possible solution in #3638.
Force-pushed from 1954b3b to 2e4f4fd.
Travis job is passing now; can't really explain why, but glad the import errors are gone. All good to go.
Hello @jeffdonahue, I think this is ready. The transpose worked for shared weights after all, as-is.
@kashefy thanks, looks like this is almost there! But could you add a simple gradient check with transpose on?
Commit message: "…toencoder. Arguments to the matrix multiplication function are conditioned on this parameter; no actual transposing takes place. Test IP gradient computation with transpose on."
Force-pushed from 2e4f4fd to 8f847fa.
Hello @jeffdonahue, I've added the gradient check with transpose on.
Hello @jeffdonahue, do you think the current tests are sufficient? Anything else you think should go into this PR? Thanks.
@kashefy thanks for adding the gradient check; I suppose it can't hurt much to have the backward test as it's presumably very quick (relative to the full gradient check). LGTM -- thanks for this work.
I wanted to train an autoencoder where the decoder uses the transpose of the encoder's weight matrix. This was first discussed in #670 and followed up in #1211 (comment), but it seemed this wasn't resolved. I found @jeffdonahue's suggestion in this comment to just add a transpose flag to the InnerProduct layer quite reasonable.
This PR adds a transpose flag to the InnerProduct layer as well as to its params protobuf message.
When set to true (as for the decoder), the forward pass instructs the matrix multiplication routine NOT to transpose the weight matrix; when left false, the weight matrix is transposed as usual, which is what you want in the typical case and for the encoder.
Tying the weights between encoder and decoder requires:
- sharing the weight blob between the two layers (by giving both weight params the same name), and
- setting transpose: true on the decoder, whose num_output must match the encoder's input dimension.

A sample trainval.prototxt to demonstrate usage is sketched below.
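Something along these lines (a hedged prototxt sketch; layer and param names and all dimensions are made up for illustration -- here the encoder maps 784 -> 64 and the decoder maps 64 -> 784 using one shared 64 x 784 weight blob):

layer {
  name: "encode"
  type: "InnerProduct"
  bottom: "data"
  top: "code"
  param { name: "tied_weights" }  # weights shared by name
  param { name: "encode_bias" }   # biases remain separate
  inner_product_param {
    num_output: 64
    # transpose defaults to false: weight blob is num_output x input = 64 x 784
  }
}
layer {
  name: "decode"
  type: "InnerProduct"
  bottom: "code"
  top: "reconstruction"
  param { name: "tied_weights" }  # same name -> same underlying blob
  param { name: "decode_bias" }
  inner_product_param {
    num_output: 784  # must equal the encoder's input dimension
    transpose: true  # weight blob read as input x num_output = 64 x 784
  }
}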
I haven't written unit tests around this yet. Open to suggestions as to what makes sense to test for here.
Thanks for reviewing and looking forward to the feedback.