
Leaky ReLU #740

Merged: 2 commits into BVLC:dev on Jul 22, 2014
Conversation

@qipeng (Contributor) commented Jul 19, 2014

Implemented the Leaky ReLU unit described in this paper

Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. "Rectifier nonlinearities improve neural network acoustic models." ICML Workshop on Deep Learning for Audio, Speech, and Language Processing. 2013.

which shares similar sparse activation properties with the ReLU but was shown to be easier to optimize.
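
For reference, the leaky rectifier replaces the hard zero for negative inputs with a small linear slope: f(x) = max(0, x) + negative_slope * min(0, x). A minimal CPU sketch of the forward and backward passes (illustrative only, not necessarily the exact code in this PR; the function names are made up):

#include <algorithm>

// Leaky ReLU forward, CPU sketch: y = max(0, x) + slope * min(0, x).
// slope = 0 gives the ordinary ReLU; slope = -1 gives the absolute value |x|.
template <typename Dtype>
void leaky_relu_forward(const int n, const Dtype* x, Dtype slope, Dtype* y) {
  for (int i = 0; i < n; ++i) {
    y[i] = std::max(x[i], Dtype(0)) + slope * std::min(x[i], Dtype(0));
  }
}

// Backward: dL/dx = dL/dy * (1 if x > 0, slope otherwise).
template <typename Dtype>
void leaky_relu_backward(const int n, const Dtype* x, const Dtype* top_diff,
                         Dtype slope, Dtype* bottom_diff) {
  for (int i = 0; i < n; ++i) {
    bottom_diff[i] = top_diff[i] * ((x[i] > 0) + slope * (x[i] <= 0));
  }
}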

@qipeng mentioned this pull request on Jul 19, 2014
typedef typename TypeParam::Dtype Dtype;
LayerParameter layer_param;
ReLULayer<Dtype> layer(layer_param);
layer_param.ParseFromString("relu_param{negative_slope:0.01}");
I think the value of the relu_param isn't actually being used here, because you set it after the layer is constructed (on the previous line, 84).
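
For clarity, the fix is just a matter of ordering: the parameter has to be parsed into layer_param before the layer is constructed. A sketch of the corrected ordering (the exact call in the final test may differ; TextFormat::ParseFromString is used here because the string is human-readable protobuf text):

#include <google/protobuf/text_format.h>

typedef typename TypeParam::Dtype Dtype;
LayerParameter layer_param;
// Parse the text-format parameter first, so the layer constructed below
// actually sees negative_slope = 0.01.
CHECK(google::protobuf::TextFormat::ParseFromString(
    "relu_param { negative_slope: 0.01 }", &layer_param));
ReLULayer<Dtype> layer(layer_param);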

@jeffdonahue (Contributor)

Looks good, thanks @qipeng! See my nitpicky comments though.

I just realized this can also be used as an absolute value neuron, with negative_slope = -1.

@qipeng (Contributor, Author) commented Jul 20, 2014

@jeffdonahue Many thanks for the helpful comments!

I've made changes accordingly and fixed the bug in the unit test code.

@jeffdonahue (Contributor)

Since I was the one who made @qipeng go back and merge this into the ReLU layer, I wanted to do some benchmarking before merging, to make sure the cost to architectures using the existing ReLU layer wasn't too high.

I ran net_speed_benchmark for 50 iterations on the GPU with imagenet_train.prototxt, and wrote a quick Python script to parse out just the ReLU timings and add them up:

[/home/jdonahue/dev 4181]$ ./compare_benchmark.py
relu1_backward:: 260.350000 (old ReLU) 260.097000 (new LReLU)
relu1_forward:: 208.607000 (old ReLU) 210.056000 (new LReLU)
relu2_backward:: 167.445000 (old ReLU) 167.241000 (new LReLU)
relu2_forward:: 134.078000 (old ReLU) 135.129000 (new LReLU)
relu3_backward:: 58.336600 (old ReLU) 58.307600 (new LReLU)
relu3_forward:: 46.687700 (old ReLU) 47.014100 (new LReLU)
relu4_backward:: 58.402200 (old ReLU) 58.353100 (new LReLU)
relu4_forward:: 46.765100 (old ReLU) 47.048400 (new LReLU)
relu5_backward:: 38.929900 (old ReLU) 38.881000 (new LReLU)
relu5_forward:: 31.230400 (old ReLU) 31.426800 (new LReLU)
relu6_backward:: 3.834750 (old ReLU) 3.866850 (new LReLU)
relu6_forward:: 3.055140 (old ReLU) 3.089760 (new LReLU)
relu7_backward:: 3.839100 (old ReLU) 3.877890 (new LReLU)
relu7_forward:: 3.053630 (old ReLU) 3.092320 (new LReLU)
Old time: 1064.61452
New time: 1067.48082

(All times above are in milliseconds.) So by those numbers this incurs a performance hit of about 0.27% for the ReLU layer itself. The total benchmark run time for the full ImageNet architecture was around 76530 ms, so the hit for the full ImageNet architecture is about 0.004%. This is a pretty small cost -- do we care? (If so, I guess we'd have to re-split this into a separate LReLU layer... sorry for the inconvenience @qipeng; I can redo the splitting for you if you'd like and that turns out to be our decision.)

@qipeng (Contributor, Author) commented Jul 21, 2014

@jeffdonahue Thanks for the tests and insightful comments!

Just to check: are you using the default setting for the new LReLU (which sets negative_slope to zero) so that it behaves like a ReLU? Also, were these experiments compiled with -O2?

If the answer to both is yes, then -O2 is weaker than I previously thought (it didn't get rid of the multiplication and addition by zero).

Actually, I'm not too familiar with how Caffe works: does it do any kind of just-in-time (JIT) compilation? That is, does compilation happen before or after it knows negative_slope is zero? If it's not JIT, then this performance should be totally acceptable, because we are, after all, doing an additional multiplication (and one addition). If it is JIT, then -O2 is weaker than I thought, and if others in the community feel this is too much of an issue I'll split them again. (But seriously, 0.27% means only about 4 extra minutes for every day of training...)

@jeffdonahue (Contributor)

There is no JIT compilation, and it's all a single thread, so as you say it's not surprising that the extra multiplication and addition incur some cost. The extra time per day is a good way to look at it, but I think the percentage I'd actually look at is the total cost of ImageNet training, which is an even more negligible 0.004%, or ~3 seconds per day. I agree this is quite negligible and will merge this once it's rebased, unless another Caffe dev says otherwise.

@qipeng, please rebase this once more and comment when done. If it passes Travis after the rebase I'll merge immediately so you won't have to worry about it anymore (and if anything else gets merged first and causes more conflicts, I'll redo the rebase myself). Thanks again!

@qipeng (Contributor, Author) commented Jul 21, 2014

Hi @jeffdonahue, I've just finished rebasing and Travis CI seems to be passing. Let me know if any last-minute changes are needed! :)

@jeffdonahue (Contributor)

Hi @qipeng, it seems your history for this PR includes some of dev's history (probably due to recent history rewriting). Can you remove these commits from your PR? The way I do this is an interactive rebase: git rebase -i dev (assuming you have a dev branch tracking BVLC's dev branch and it's up to date), which should pull up your text editor with all the commits. You want to delete the lines of commits that aren't yours (being careful not to delete any of your own commits), then save and quit. Finally, force push (git push -f) to this branch in your fork.

@qipeng (Contributor, Author) commented Jul 22, 2014

Hi @jeffdonahue, my mistake: I had merged the branch from a rebase. It should be fixed now. :)

@jeffdonahue (Contributor)

Great, thanks again. Merging now.

jeffdonahue added a commit that referenced this pull request on Jul 22, 2014
@jeffdonahue merged commit 51146b4 into BVLC:dev on Jul 22, 2014
@shelhamer mentioned this pull request on Aug 7, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request on Sep 30, 2014
RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request on Nov 4, 2014
@Yangqing (Member) commented Nov 8, 2014

(For the record: the bug was fixed in #1417.)
