I think MVNLayer with across_channels = true is broken #1938
Comments
This is because ParseFromString expects the binary wire format of a serialized message, not the text format. google::protobuf::TextFormat::ParseFromString is the call that parses prototxt-style text.
Thanks @seanbell. As I read your comment I realized I left a bunch of MVNLayer tests still using the wrong ParseFromString. Will go fix.
It turns out one of the tests (MVNLayerTest::TestGradientMeanOnly) is affected too. If I change

    layer_param.ParseFromString("mvn_param{normalize_variance: false}");

to this

    CHECK(google::protobuf::TextFormat::ParseFromString("mvn_param{normalize_variance: false}", &layer_param));

and add the include

    #include "google/protobuf/text_format.h"

I see the test fail. MVNLayer::Backward just copies the top diff to the bottom diff in the mean-only case. The numerical gradient checker disagrees that this is correct: when it adds and subtracts stepsize to an input in order to approximate the derivative as delta output over delta input, the mean also changes, so it ends up with somewhat different gradients than the layer, which assumes the mean is a constant in its gradient calculation. So to get the test to pass, it seems we'd need a way to leave mean_ unchanged across the gradient checker's two forward calls. Hmmm. I tried fiddling with stepsize and threshold to no avail. If the right way forward is obvious to anyone, please share.
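For reference, the disagreement can be made precise. In the mean-only case the layer computes y_i = x_i - mu, where mu is the mean over all N inputs, so perturbing any single input also moves the mean, which the plain copy ignores. A short derivation (my own sketch, not from the PR; L is the loss and delta_ij the Kronecker delta):

    y_i = x_i - \mu, \qquad \mu = \frac{1}{N} \sum_{k=1}^{N} x_k
    \frac{\partial y_i}{\partial x_j} = \delta_{ij} - \frac{1}{N}
    \frac{\partial L}{\partial x_j}
        = \sum_{i=1}^{N} \frac{\partial L}{\partial y_i}
          \left( \delta_{ij} - \frac{1}{N} \right)
        = \frac{\partial L}{\partial y_j}
          - \frac{1}{N} \sum_{i=1}^{N} \frac{\partial L}{\partial y_i}

That is, the exact bottom diff is the top diff minus its own mean. The correction term shrinks as 1/N, which is why the plain copy looks nearly right on large blobs but trips the numerical checker on the small blobs used in the tests.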
OK, #1979 has a resolution for the issues above. As mentioned in the previous note, fixing the ParseFromString issue revealed that MVNLayerTest::TestGradientMeanOnly was failing. My "fix" was to increase the tolerance on the gradient checker in that case, which is a hack. I think the failing test actually reveals that Backward is not exactly correct. It just copies the top diff to the bottom diff:

    caffe_copy(temp_.count(), top_diff, bottom_diff);

which is premised on the mean being a constant and thus not contributing to the derivative. That isn't exactly true: as an input changes, it has a small effect on the mean. For most real cases, where the blobs are larger, that contribution is small and the copy is a good enough approximation. I think. Maybe. I could work out the exact correct derivative (go learn some matrix calculus), but: 1) I'm not sure the (small) accuracy improvement is worth the extra runtime overhead, and 2) all signs so far are that the PR is going to be ignored anyway, so I'm not particularly motivated to invest the time. If I were to do it, one of you bright young people would need to logic-check my thinking above. My foggy old brain is a quarter century removed from its linear algebra schooling.
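For concreteness, here is a hypothetical scalar sketch of what an exact mean-only backward could look like, following the derivation above. The function name is invented for illustration, and plain loops stand in for the BLAS calls Caffe actually uses:

    #include <cstddef>

    // Sketch: exact backward for mean-only MVN, where y_i = x_i - mean(x).
    // Per the Jacobian above, bottom_diff_j = top_diff_j - mean(top_diff),
    // rather than a plain copy of top_diff.
    void mvn_mean_only_backward(const float* top_diff, float* bottom_diff,
                                std::size_t n) {
      // Mean of the incoming gradient.
      float sum = 0.0f;
      for (std::size_t i = 0; i < n; ++i) sum += top_diff[i];
      const float mean_diff = sum / static_cast<float>(n);
      // Subtract it from each element instead of copying verbatim.
      for (std::size_t i = 0; i < n; ++i) {
        bottom_diff[i] = top_diff[i] - mean_diff;
      }
    }

The correction term is just the mean of the incoming gradient, so the extra cost over the plain copy is one reduction and one subtraction per element.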
Closing this, since there is a PR that fixes this issue. |
Fix the MVNLayer tests so they actually test what they claim. MVNLayer fixes: sum_multiplier_ sized correctly; backward gradient calculation. Gradient calculation per analysis of seanbell, found here: BVLC/caffe#1938. Fixes according to review comments.
I think MVNLayer is broken when across_channels: true is specified in MVNParameter, even though the unit test is passing.

In MVNLayerTest::TestForwardAcrossChannels, we create the layer parameter with:

    layer_param.ParseFromString("mvn_param{across_channels: true}");

ParseFromString is returning false, and layer_param.mvn_param().across_channels() is false after the call, even though we specified true in the prototxt. So the test is running with the default value of across_channels, which is false.

If instead in MVNLayerTest::TestForwardAcrossChannels we call:

    CHECK(google::protobuf::TextFormat::ParseFromString("mvn_param{across_channels: true}", &layer_param));

it succeeds, and layer_param.mvn_param().across_channels() is true. And then you will see that MVNLayerTest::TestForwardAcrossChannels fails, as Forward_* produces NaN values.

I have no idea why layer_param.ParseFromString fails whereas google::protobuf::TextFormat::ParseFromString succeeds. I don't have the protobuf source installed and would rather debug Caffe than Protocol Buffers. If anyone knows, please educate me.

It at first seems surprising that TestForwardAcrossChannels succeeds even though across_channels is false, when you'd think it would need to be true in order to pass. I think it's because the values in the bottom blob are drawn from a normal distribution using GaussianFiller, and the only difference between across_channels=true and false is whether the mean and variance are computed over 20 values or 60. So you still get mean zero and variance 1.0 in the top blob, which is what the test checks for, regardless of whether you compute them per channel or across channels.

I'm working on this in connection with #1895. I will say that trying to understand the vectorization approach using BLAS calls to perform these ops is making my head hurt : /.
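For anyone hitting this later: the cause is that Message::ParseFromString parses the protobuf binary wire format, so handing it prototxt text fails and leaves every field at its default, while google::protobuf::TextFormat::ParseFromString parses the human-readable text format. A minimal self-contained sketch, assuming Caffe's generated caffe.pb.h is on the include path:

    #include <cassert>
    #include <string>

    #include "google/protobuf/text_format.h"
    #include "caffe/proto/caffe.pb.h"  // generated LayerParameter definition

    int main() {
      caffe::LayerParameter layer_param;

      // Binary-format parse of text input: returns false and the field stays
      // at its default -- the silent failure described in this issue.
      bool ok = layer_param.ParseFromString("mvn_param{across_channels: true}");
      assert(!ok);
      assert(!layer_param.mvn_param().across_channels());

      // Text-format parse: succeeds, and the field is actually set.
      ok = google::protobuf::TextFormat::ParseFromString(
          "mvn_param { across_channels: true }", &layer_param);
      assert(ok);
      assert(layer_param.mvn_param().across_channels());
      return 0;
    }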