Cannot explicitly "name" BatchNorm parameters for sharing (Siamese network) #5171

Closed

shaibagon opened this issue Jan 11, 2017 · 4 comments

shaibagon (Member) commented Jan 11, 2017

TL;DR

It seems that commit a8ec123c00723df0d0ad897e1eea32a29201c81b broke something here: one can no longer pass param entries to a "BatchNorm" layer (they are silently ignored).

I suppose the auto-upgrade should not strip param entirely from "BatchNorm" layers, but rather make sure that lr_mult and decay_mult are zero.
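For example, I would expect the upgraded definition to look roughly like the sketch below (this is only my guess at the intended behaviour: the blob names from my prototxt are kept, while the solver multipliers are forced to zero):

layer {
  name: "bn_a"
  type: "BatchNorm"
  bottom: "conv1_a"
  top: "bn_a"
  # names kept so the statistics blobs can still be shared across layers,
  # but lr_mult/decay_mult forced to zero so the solver never updates them
  param { name: "bn_m" lr_mult: 0 decay_mult: 0 }
  param { name: "bn_v" lr_mult: 0 decay_mult: 0 }
  param { name: "bn_b" lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    use_global_stats: false
  }
}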

Issue summary

I am trying to share the internal parameters of "BatchNorm" in a Siamese network: that is, to have two "BatchNorm" layers share the same internal parameter blobs.

I explicitly name the parameters using param entries of the form param { name: "bn_m" lr_mult: 0 } (see the prototxt below).

However, when the net is built, the whole param section is missing: caffe completely ignores the parameter names, and no sharing actually happens.
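For reference, this by-name sharing works as expected for other layer types that still accept param; a minimal sketch with two "Convolution" layers (the data/layer/param names here are illustrative, not from my actual net):

layer {
  name: "conv1_a"
  type: "Convolution"
  bottom: "data_a"
  top: "conv1_a"
  # identical param names tell caffe to use the same weight and bias blobs
  param { name: "conv_shared_w" }
  param { name: "conv_shared_b" }
  convolution_param { num_output: 10 kernel_size: 3 }
}
layer {
  name: "conv1_b"
  type: "Convolution"
  bottom: "data_b"
  top: "conv1_b"
  param { name: "conv_shared_w" }
  param { name: "conv_shared_b" }
  convolution_param { num_output: 10 kernel_size: 3 }
}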

Steps to reproduce

Here is the relevant part of the prototxt:

layer {
  name: "bn_a"
  type: "BatchNorm"
  bottom: "conv1_a"
  top: "bn_a"
  param {
    name: "bn_m"
    lr_mult: 0
  }
  param {
    name: "bn_v"
    lr_mult: 0
  }
  param {
    name: "bn_b"
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "bn_b"
  type: "BatchNorm"
  bottom: "conv1_b"
  top: "bn_b"
  param {
    name: "bn_m"
    lr_mult: 0
  }
  param {
    name: "bn_v"
    lr_mult: 0
  }
  param {
    name: "bn_b"
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}

The relevant part of the log:

layer {
  name: "bn_a"
  type: "BatchNorm"
  bottom: "conv1_a"
  top: "bn_a"
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "bn_b"
  type: "BatchNorm"
  bottom: "conv1_b"
  top: "bn_b"
  batch_norm_param {
    use_global_stats: false
  }
}

Consequently, when building the layers the log reports:

I0111 22:28:11.185611 27369 layer_factory.hpp:77] Creating layer bn_b
I0111 22:28:11.185622 27369 net.cpp:100] Creating Layer bn_b
I0111 22:28:11.185631 27369 net.cpp:434] bn_b <- conv1_b
I0111 22:28:11.185642 27369 net.cpp:408] bn_b -> bn_b
I0111 22:28:11.185780 27369 net.cpp:150] Setting up bn_b
I0111 22:28:11.185803 27369 net.cpp:157] Top shape: 10 10 222 222 (4928400)
I0111 22:28:11.185813 27369 net.cpp:165] Memory required for data: 90896640

There are no

Sharing parameters 'bn_m' owned by layer 'bn_a', param index 0

log entries, as I would have expected.

It seems that "BatchNorm" ignores param entries completely!

Am I missing something?
How can I share the internal parameters of BatchNorm?

Your system configuration

Operating system: Ubuntu 14.04
Compiler: gcc
CUDA version (if applicable):
CUDNN version (if applicable):
BLAS: MKL
Python or MATLAB version (for pycaffe and matcaffe respectively): Python 2.7
Caffe: updated to commit 365ac88

shaibagon referenced this issue Jan 12, 2017
batch norm statistics are not learnable parameters subject to solver
updates, so they must be shielded from the solver. `BatchNorm` layer now
masks its statistics for itself by zeroing parameter learning rates
instead of relying on the layer definition.

n.b. declaring `param`s for batch norm layers is no longer allowed.
shaibagon referenced this issue Jan 12, 2017
automatically strip old batch norm layer definitions including `param`
messages. the batch norm layer used to require manually masking its
state from the solver by setting `param { lr_mult: 0 }` messages for
each of its statistics. this is now handled automatically by the layer.
D-X-Y commented Jan 12, 2017

I have the same question as #5120.

shaibagon (Member, Author)

@D-X-Y I made an attempt at fixing this issue in PR #5184. I hope it will be approved.

shelhamer (Member)

Thanks for raising this issue. I was so focused on protecting the batch statistics from accidental gradients that I made sharing impossible across layers that could sensibly share statistics, as in Siamese nets. I'll double-check #5184 and come up with an auto-upgrade rule that handles both lr and name.

shelhamer (Member)

Fixed by #5184.
