
I think MVNLayer with across_channels = true is broken #1938

Closed

jyegerlehner opened this issue Feb 22, 2015 · 8 comments

Comments

@jyegerlehner (Contributor)

I think MVNLayer is broken when across_channels: true is specified in MVNParameter, even though the unit test is passing.

In MVNLayerTest::TestForwardAcrossChannels, we create the layer parameter with:

layer_param.ParseFromString("mvn_param{across_channels: true}");

ParseFromString returns false, and after the call layer_param.mvn_param().across_channels() is still false, even though we specified true in the prototxt. So the test runs with the default value of across_channels, which is false.

If instead in MVNLayerTest::TestForwardAcrossChannels we call:
CHECK(google::protobuf::TextFormat::ParseFromString("mvn_param{across_channels: true}", &layer_param));

it succeeds, and layer_param.mvn_param().across_channels() is true. And then you will see that MVNLayerTest::TestForwardAcrossChannels fails, because the Forward_* functions produce NaN values.

I have no idea why layer_param.ParseFromString fails whereas google::protobuf::TextFormat::ParseFromString succeeds. I don't have the protobuf source installed and would rather debug Caffe than protobuf. If anyone knows, please educate me.

At first it seems surprising that TestForwardAcrossChannels passes even though across_channels is false, when you'd think it would need to be true for the test to pass. I think it's because the values in the bottom blob are drawn from a normal distribution by GaussianFiller, and the only difference between across_channels = true and false is whether the mean and variance are computed from 20 values or from 60. Either way you get roughly zero mean and variance 1.0 in the top blob, which is all the test checks for, regardless of whether the statistics are computed per channel or across channels (see the sketch below).
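A minimal, Caffe-independent sketch of that statistical point (the group sizes mirror the test's 20-values-per-channel layout; the function names and seed are made up for illustration): normalize 60 Gaussian samples either per group of 20 or over all 60, and the overall mean and variance land near 0 and 1 either way, so a test that only checks those statistics can't tell the two modes apart.

```cpp
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Normalize each group of `group_size` values to zero mean, unit variance.
static void normalize_per_group(std::vector<double>& v, size_t group_size) {
  for (size_t start = 0; start < v.size(); start += group_size) {
    double mean = 0.0, var = 0.0;
    for (size_t i = start; i < start + group_size; ++i) mean += v[i];
    mean /= group_size;
    for (size_t i = start; i < start + group_size; ++i)
      var += (v[i] - mean) * (v[i] - mean);
    var /= group_size;
    for (size_t i = start; i < start + group_size; ++i)
      v[i] = (v[i] - mean) / std::sqrt(var + 1e-9);
  }
}

int main() {
  std::mt19937 rng(0);
  std::normal_distribution<double> gauss(0.0, 1.0);
  std::vector<double> data(60);
  for (double& x : data) x = gauss(rng);

  std::vector<double> per_channel = data;  // like across_channels: false
  std::vector<double> across = data;       // like across_channels: true
  normalize_per_group(per_channel, 20);    // three "channels" of 20 values
  normalize_per_group(across, 60);         // one group of all 60 values

  // Overall statistics of both results are ~(0, 1), so a check on the
  // output's mean and variance passes in either mode.
  for (const auto* v : {&per_channel, &across}) {
    double mean = 0.0, var = 0.0;
    for (double x : *v) mean += x;
    mean /= v->size();
    for (double x : *v) var += (x - mean) * (x - mean);
    var /= v->size();
    std::printf("mean=%.4f var=%.4f\n", mean, var);
  }
  return 0;
}
```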

I'm working on this in connection with #1895. I will say that trying to understand the vectorized approach, using BLAS calls to perform these ops, is making my head hurt :/.

@jyegerlehner (Contributor, Author)

I think #1895 now fixes this.
Edit: #1895 superseded by #1979.

@seanbell

I have no idea why layer_param.ParseFromString fails whereas google::protobuf::TextFormat::ParseFromString succeeds.

This is because layer_param.ParseFromString parses serialized binary data (stored as a string), and google::protobuf::TextFormat::ParseFromString parses text-format data. In this case, layer_param.ParseFromString was the wrong function to use. The same problem was raised in #1355.
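To spell out the distinction for anyone who lands here later, a minimal sketch using Caffe's LayerParameter as in the test (assuming the generated header caffe/proto/caffe.pb.h and the protobuf text-format header are available in the build):

```cpp
#include <string>

#include "google/protobuf/text_format.h"
#include "caffe/proto/caffe.pb.h"

void parse_examples() {
  caffe::LayerParameter layer_param;

  // Wrong for prototxt-style text: Message::ParseFromString expects the
  // binary wire format, so this returns false and leaves the defaults.
  bool ok_binary =
      layer_param.ParseFromString("mvn_param{across_channels: true}");
  // ok_binary == false; layer_param.mvn_param().across_channels() == false

  // Right for text: TextFormat::ParseFromString parses the human-readable
  // text format and fills in the message on success.
  bool ok_text = google::protobuf::TextFormat::ParseFromString(
      "mvn_param{across_channels: true}", &layer_param);
  // ok_text == true; layer_param.mvn_param().across_channels() == true

  (void)ok_binary;
  (void)ok_text;
}
```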

@jyegerlehner (Contributor, Author)

Thanks @seanbell. As I read your comment I realized I left a bunch of MVNLayer tests still using the wrong ParseFromString. Will go fix.

@jyegerlehner (Contributor, Author)

It turns out one of the tests (TestGradientMeanOnly in test_mvn_layer.cpp) isn't passing (on the master branch). To see this, just change this line:

layer_param.ParseFromString("mvn_param{normalize_variance: false}");

to this

CHECK(google::protobuf::TextFormat::ParseFromString("mvn_param{normalize_variance: false}", &layer_param));

and add the include

#include "google/protobuf/text_format.h"

With that change, the test fails. MVNLayer::Backward just copies the top diff to the bottom diff for the mean-only case. The numerical gradient checker disagrees that this is correct: when it adds and subtracts stepsize to an input in order to approximate the derivative as delta-output over delta-input, the mean also changes, so it ends up with somewhat different gradients than the layer, which assumes the mean is a "constant" in its gradient calculation. So to get the test to pass, it seems like we'd need a way to leave mean_ unchanged across the gradient checker's two forward calls. Hmmm. I tried fiddling with stepsize and threshold to no avail. If the right way forward is obvious to anyone, please share. (A small worked example of the discrepancy follows.)
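To make the disagreement concrete, a small worked example of my own (not from the thread): with mean-only MVN over a single normalization group of N values, nudging one input also moves the mean, so the derivative the checker measures is not 1.

```latex
y_i = x_i - \frac{1}{N}\sum_{j=1}^{N} x_j
\qquad\Longrightarrow\qquad
\frac{\partial y_i}{\partial x_j} = \delta_{ij} - \frac{1}{N}
```

For N = 2 the checker would measure \partial y_1 / \partial x_1 = 1 - 1/2 = 1/2, while the copy-the-diff Backward effectively claims it is 1; the per-element gap is 1/N regardless of stepsize.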

@jyegerlehner (Contributor, Author)

OK, #1979 has a resolution for the issues above.

As mentioned in the previous note, fixing the ParseFromString issue revealed that MVNLayerTest::TestGradientMeanOnly was failing. My "fix" was to increase the tolerance on the gradient checker in that case, which is a hack. I think the failing test actually reveals that Backward is not exactly correct. It just copies top diff to bottom diff:

caffe_copy(temp_.count(), top_diff, bottom_diff);

That is premised on the mean being a constant and thus not contributing to the derivative, which isn't exactly true: as an input changes, it does have a small effect on the mean. However, for most real cases, where the blobs are larger, that contribution is small and the copy is a good enough approximation. I think. Maybe.

I could try to work out the exact correct derivative (go learn some matrix calculus), but: 1) I'm not sure the (small) accuracy improvement is worth the extra runtime overhead, and 2) all signs so far are that the PR is going to be ignored anyway, so I'm not particularly motivated to invest the time. If I were to do it, I think one of you bright young people would need to logic-check my thinking above. My foggy old brain is a quarter century removed from its linear algebra schooling.

@seanbell

I think the failing test actually reveals that Backward is not exactly correct. It just copies top diff to bottom diff.

I agree that the gradient is wrong for this case. Someone should check my derivation, but I believe that the missing part of the gradient is the subtraction of the mean. If we let the top be y, bottom x, and loss L, then:
y_i = x_i - \frac{1}{N}\sum_{j} x_j
\quad\Longrightarrow\quad
\frac{\partial L}{\partial x_i}
  = \sum_{j} \frac{\partial L}{\partial y_j}\left(\delta_{ij} - \frac{1}{N}\right)
  = \frac{\partial L}{\partial y_i} - \frac{1}{N}\sum_{j} \frac{\partial L}{\partial y_j}
Thus, for the case of mean subtraction, Forward and Backward look the same (and you can reuse the code from the forward case, changing "data" to "diff").
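For concreteness, a minimal sketch of what that amounts to in the mean-only case, treating a single normalization group and using plain loops rather than Caffe's BLAS calls (the function name is made up; this is not the PR's code):

```cpp
#include <cstddef>
#include <vector>

// Mean-only MVN backward per the derivation above:
//   bottom_diff[i] = top_diff[i] - mean(top_diff)
// i.e. the same operation Forward applies to the data, applied to the diff.
std::vector<float> mvn_mean_only_backward(const std::vector<float>& top_diff) {
  const std::size_t n = top_diff.size();
  float mean = 0.0f;
  for (float d : top_diff) mean += d;
  mean /= static_cast<float>(n);

  std::vector<float> bottom_diff(n);
  for (std::size_t i = 0; i < n; ++i) bottom_diff[i] = top_diff[i] - mean;
  return bottom_diff;
}
```

With variance normalization turned off, this is literally the forward mean subtraction applied to the diff, which is the code reuse described above.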

@jyegerlehner (Contributor, Author)

Thank you @seanbell. That looks correct to me (after I realized the delta-sub-i-j is meant to be a Kronecker delta).

The gradient checker thinks it works too: implemented it in #1979, reversed the hack that was allowing larger errors, and the tests are passing.

@jyegerlehner (Contributor, Author)

Closing this, since there is a PR that fixes this issue.

jyegerlehner added a commit to jyegerlehner/caffe that referenced this issue Aug 24, 2015
…lculation.

Gradient calculation per analysis of seanbell, found here:
BVLC#1938
jyegerlehner added a commit to jyegerlehner/caffe that referenced this issue Aug 26, 2015
Fix the MVNLayer tests so they actually test what they claim.

MVNLayer fixes: sum_multiplier_ sized correctly; backward gradient calculation.

Gradient calculation per analysis of seanbell, found here:
BVLC#1938

Fixes according to review comments.