Within-channel LRN layer #273
Conversation
Agreed that exact replication's in the spirit of having reference architectures and examples. I don't quite understand why you introduced the layers instead of doing blob operations, but perhaps the helper layers could be otherwise useful, and like you said they do no harm. The direct, efficient way can be Caffe dev practice for the future. (Note I haven't fully reviewed this; someone else should take a look and merge.)
My main motivation was to avoid rewriting code to sum over regions (for which the implementation, to me at least, looks pretty hairy). This is handled by an (average) PoolingLayer instead.
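To make the PoolingLayer trick concrete: the sum of values over an N x N window is just N^2 times that window's average, so an off-the-shelf average pooling operation can do the region bookkeeping. Below is a minimal standalone sketch of that identity, assuming zero-padded borders and using hypothetical names rather than anything from this PR:

```cpp
// Toy illustration (not Caffe code): the sum over an N x N window equals
// N*N times the window average, so average pooling can supply the region
// sums that within-channel LRN needs.
#include <cassert>
#include <cmath>
#include <vector>

// Average over the N x N window centered at (r, c) of an H x W channel,
// treating out-of-bounds pixels as zero (an assumption for this sketch).
float window_avg(const std::vector<float>& x, int H, int W,
                 int r, int c, int N) {
  float s = 0.f;
  const int half = N / 2;
  for (int i = r - half; i <= r + half; ++i)
    for (int j = c - half; j <= c + half; ++j)
      if (i >= 0 && i < H && j >= 0 && j < W) s += x[i * W + j];
  return s / (N * N);
}

int main() {
  const int H = 5, W = 5, N = 3;
  std::vector<float> x(H * W);
  for (int i = 0; i < H * W; ++i) x[i] = 0.25f * i;

  // The direct ("hairy") region sum over the window centered at (2, 2)...
  float direct = 0.f;
  for (int i = 1; i <= 3; ++i)
    for (int j = 1; j <= 3; ++j) direct += x[i * W + j];

  // ...is recovered from the pooled average by multiplying back by the
  // window area, which is what lets an average PoolingLayer stand in.
  float via_pool = N * N * window_avg(x, H, W, 2, 2, N);
  assert(std::fabs(direct - via_pool) < 1e-4f);
  return 0;
}
```

In the composed layer, the window-area factor would presumably just be folded into the alpha scaling of the subsequent power stage.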
The acronym for local contrast normalization is perhaps LCN.
vector<Blob<Dtype>*> power_top_vec_;
shared_ptr<EltwiseProductLayer<Dtype> > product_layer_;
Blob<Dtype> product_data_input_;
vector<Blob<Dtype>*> product_bottom_vec_;
It seems that both PowerLayer and EltwiseProductLayer are suitable to be refactored in #244.
This is distinct from LCN. This is a within-channel scoped response normalization.
Yup, my bad; it is indeed not contrast normalization (e.g. https://code.google.com/p/cuda-convnet/wiki/LayerParams#Local_contrast_normalization_layer).
Let me know if someone wants to merge this*. If/when that's going to happen, I'll first change the added field IDs in the proto definition.
*If not, feel free to close the PR -- not going to be offended if people don't care about having a within-channel LRN in Caffe.
I'm all for including this to polish the replication, but I don't see myself reviewing this soon. How about you set the field IDs, take a last glance at the diff, and go ahead and merge?
Thanks for giving the go-ahead, Evan - done.
Within-channel LRN layer
(Not sure how I accidentally submitted this without any description.)
This PR implements within-channel local ~~contrast~~ response normalization across a square neighborhood of each input channel, a la cuda-convnet's rnorm layer [1]. This layer is used in many of the cuda-convnet CIFAR example architectures, including our current cifar_full example that was based on the layers-18pct example in cuda-convnet, so I've updated that example to use this new layer type here. It doesn't make much of a difference -- running the full training gets to (exactly) 82% accuracy, as opposed to the 81.65% the old normalization across channels was getting. It is also unfortunately slightly slower, taking 6 minutes and 57 seconds for 5000 iterations (compared to 6 minutes 43 seconds for the cross-channel normalization), but I think this might make sense as we're summing over N^2 input pixels for each output, instead of N. I think it's nice to be able to reproduce these network architectures exactly though, even if it doesn't make much of a difference which type you use in practice.

Because I'm not smart enough to write something along the lines of the code for the current cross-channel LRN layer [2], I basically implemented this under the hood as a sequence of 5 other layer types, including 2 new ones: the EltwiseProductLayer, which computes outputs z = x .* y (excuse the MATLAB notation) on >= 2 input blobs, and PowerLayer (open to suggestions on a better name..), a neuron which computes z = (alpha + beta * x) ^ gamma for fixed values of those parameters. This implementation has a small memory penalty as it uses a few internal blobs for storing the intermediate results of each layer's computation. If somebody wanted to rewrite this later without using any "helper layers" to make it more memory efficient, they could do that.

[1] https://code.google.com/p/cuda-convnet/wiki/LayerParams#Local_response_normalization_layer_(same_map)
[2] https://github.com/BVLC/caffe/blob/master/src/caffe/layers/lrn_layer.cpp