
Leaky ReLU #740

Merged: 2 commits into BVLC:dev on Jul 22, 2014
Conversation

@qipeng (Contributor) commented Jul 19, 2014

Implemented the Leaky ReLU unit described in this paper

Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. "Rectifier nonlinearities improve neural network acoustic models." ICML Workshop on Deep Learning for Audio, Speech, and Language Processing. 2013.

which shares similar sparse activation properties with the ReLU but was shown to be easier to optimize.
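
For reference, the leaky rectifier replaces the hard zero for negative inputs with a small linear slope: f(x) = max(0, x) + negative_slope * min(0, x). A minimal CPU sketch of the forward and backward passes (illustrative only, not necessarily the exact code in this PR; the function names are made up):

#include <algorithm>

// Leaky ReLU forward, CPU sketch: y = max(0, x) + slope * min(0, x).
// slope = 0 gives the ordinary ReLU; slope = -1 gives the absolute value |x|.
template <typename Dtype>
void leaky_relu_forward(const int n, const Dtype* x, Dtype slope, Dtype* y) {
  for (int i = 0; i < n; ++i) {
    y[i] = std::max(x[i], Dtype(0)) + slope * std::min(x[i], Dtype(0));
  }
}

// Backward: dL/dx = dL/dy * (1 if x > 0, slope otherwise).
template <typename Dtype>
void leaky_relu_backward(const int n, const Dtype* x, const Dtype* top_diff,
                         Dtype slope, Dtype* bottom_diff) {
  for (int i = 0; i < n; ++i) {
    bottom_diff[i] = top_diff[i] * ((x[i] > 0) + slope * (x[i] <= 0));
  }
}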

@qipeng mentioned this pull request on Jul 19, 2014
typedef typename TypeParam::Dtype Dtype;
LayerParameter layer_param;
ReLULayer<Dtype> layer(layer_param);
layer_param.ParseFromString("relu_param{negative_slope:0.01}");
I think the value of the relu_param isn't actually being used here, because you set it after the layer is constructed (on the previous line, 84).
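
For clarity, the fix is just a matter of ordering: the parameter has to be parsed into layer_param before the layer is constructed. A sketch of the corrected ordering (the exact call in the final test may differ; TextFormat::ParseFromString is used here because the string is human-readable protobuf text):

#include <google/protobuf/text_format.h>

typedef typename TypeParam::Dtype Dtype;
LayerParameter layer_param;
// Parse the text-format parameter first, so the layer constructed below
// actually sees negative_slope = 0.01.
CHECK(google::protobuf::TextFormat::ParseFromString(
    "relu_param { negative_slope: 0.01 }", &layer_param));
ReLULayer<Dtype> layer(layer_param);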

@jeffdonahue (Contributor)

Looks good, thanks @qipeng! See my nitpicky comments though.

I just realized this can also be used as an absolute value neuron, with negative_slope = -1.

@qipeng (Contributor, Author) commented Jul 20, 2014

@jeffdonahue Many thanks for the helpful comments!

I've made changes accordingly and fixed the bug in the unit test code.

@jeffdonahue (Contributor)

Since I was the one who made @qipeng go back and merge this into the ReLU layer, I wanted to do some benchmarking before merging, to make sure the cost to architectures using the existing ReLU layer wasn't too high.

I ran net_speed_benchmark for 50 iterations on the GPU with imagenet_train.prototxt, and wrote a quick Python script to parse out just the ReLU timings and add them up:

[/home/jdonahue/dev 4181]$ ./compare_benchmark.py
relu1_backward:: 260.350000 (old ReLU) 260.097000 (new LReLU)
relu1_forward:: 208.607000 (old ReLU) 210.056000 (new LReLU)
relu2_backward:: 167.445000 (old ReLU) 167.241000 (new LReLU)
relu2_forward:: 134.078000 (old ReLU) 135.129000 (new LReLU)
relu3_backward:: 58.336600 (old ReLU) 58.307600 (new LReLU)
relu3_forward:: 46.687700 (old ReLU) 47.014100 (new LReLU)
relu4_backward:: 58.402200 (old ReLU) 58.353100 (new LReLU)
relu4_forward:: 46.765100 (old ReLU) 47.048400 (new LReLU)
relu5_backward:: 38.929900 (old ReLU) 38.881000 (new LReLU)
relu5_forward:: 31.230400 (old ReLU) 31.426800 (new LReLU)
relu6_backward:: 3.834750 (old ReLU) 3.866850 (new LReLU)
relu6_forward:: 3.055140 (old ReLU) 3.089760 (new LReLU)
relu7_backward:: 3.839100 (old ReLU) 3.877890 (new LReLU)
relu7_forward:: 3.053630 (old ReLU) 3.092320 (new LReLU)
Old time: 1064.61452
New time: 1067.48082

(All times above are in milliseconds.) So by those numbers this incurs a performance hit of about 0.27% for the ReLU layer itself. The total benchmark run time for the full ImageNet architecture was around 76530 ms, so the hit for the full ImageNet architecture is about 0.004%. This is a pretty small cost -- do we care? (If so, I guess we'd have to re-split this into a separate LReLU layer... sorry for the inconvenience @qipeng; I can redo the splitting for you if you'd like and that turns out to be our decision.)

@qipeng (Contributor, Author) commented Jul 21, 2014

@jeffdonahue Thanks for the tests and insightful comments!

Just to check: are you using the default setting for the new LReLU (which sets negative_slope to zero) so that it behaves like a ReLU? Also, were these experiments compiled with -O2?

If the answer to both is yes, then -O2 is weaker than I previously thought (it didn't get rid of the multiplication and addition by zero).

Actually, I'm not too familiar with how Caffe works: does it do any kind of just-in-time (JIT) compilation? That is, does compilation happen before or after it knows negative_slope is zero? If it's not JIT, then this performance should be totally acceptable, because we are, after all, doing an additional multiplication (and one addition). If it is JIT, then -O2 is weaker than I thought, and if others in the community feel this is too much of an issue I'll split them again. (But seriously, 0.27% means only about 4 extra minutes for every day of training...)

@jeffdonahue (Contributor)

There is no JIT compilation, and it's all a single thread, so as you say it's not surprising that the extra multiplication and addition incur some cost. The extra time per day is a good way to look at it, but I think the percentage I'd actually look at is the total cost of ImageNet training, which is an even more negligible 0.004%, or ~3 seconds per day. I agree this is quite negligible and will merge this once it's rebased, unless another Caffe dev says otherwise.

@qipeng, please rebase this once more and comment when done. If it passes Travis after the rebase I'll merge immediately so you won't have to worry about it anymore (and if anything else gets merged first and causes more conflicts, I'll redo the rebase myself). Thanks again!

@qipeng (Contributor, Author) commented Jul 21, 2014

Hi @jeffdonahue, I've just finished rebasing and Travis CI seems to be passing. Let me know if any last-minute changes are needed! :)

@jeffdonahue (Contributor)

Hi @qipeng, it seems your history for this PR includes some of dev's history (probably due to recent history rewriting). Can you remove these commits from your PR? The way I do this is an interactive rebase: git rebase -i dev (assuming you have a dev branch tracking BVLC's dev branch and it's up to date), which should pull up your text editor with all the commits. You want to delete the lines of commits that aren't yours (being careful not to delete any of your own commits), then save and quit. Finally, force push (git push -f) to this branch in your fork.

@qipeng (Contributor, Author) commented Jul 22, 2014

Hi @jeffdonahue, my mistake: I had merged the branch from a rebase. It should be fixed now. :)

@jeffdonahue (Contributor)

Great, thanks again. Merging now.

jeffdonahue added a commit that referenced this pull request on Jul 22, 2014
@jeffdonahue merged commit 51146b4 into BVLC:dev on Jul 22, 2014
@shelhamer mentioned this pull request on Aug 7, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request on Sep 30, 2014
RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request on Nov 4, 2014
@Yangqing (Member) commented Nov 8, 2014

(For the record: the bug was fixed in #1417.)
