
Exponential Linear Units #3388

Merged 1 commit into BVLC:master on Jan 22, 2016

Conversation

mohomran
Contributor

Implementation of the Exponential Linear Units proposed in:

Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2015). Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). http://arxiv.org/abs/1511.07289

I made one minor modification to the formula from the paper: f(x) = x if x > 0, rather than if x >= 0, with the corresponding change to the gradient (restated after the list below). I did this for two reasons:

  1. This way, when alpha = 0, an ELU reduces exactly to a ReLU as implemented in Caffe; in particular, f'(0) = 0 instead of 1 as specified in the paper.
  2. With the original formula, the loss would also diverge during MNIST training when alpha = 0. I would be happy to receive additional verification and to revise this change if necessary.
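For reference, the modified definition and its gradient can be restated as follows (my restatement of the above, not quoted from the paper or the PR):

$$
f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha \left( e^{x} - 1 \right) & \text{if } x \le 0 \end{cases}
\qquad
f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ \alpha \, e^{x} = f(x) + \alpha & \text{if } x \le 0 \end{cases}
$$

At x = 0 both branches give f(0) = 0, but the gradient from the exponential branch is alpha rather than 1, which is why alpha = 0 makes the layer behave exactly like Caffe's ReLU.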

mohomran changed the title from "ELU layer with basic tests" to "Exponential Linear Units" on Nov 26, 2015
@beniz

beniz commented Nov 26, 2015

Great job! I was actually coming to check on ELU and found this :) Will report on performance when I can.

@f0k

f0k commented Dec 1, 2015

> I made one minor modification to the formula from the paper: f(x) = x, if x > 0 rather than if x >= 0, with the corresponding change to the gradient.

It seems this is actually what they did for the paper as well:
untom/binet@2c8a6bd
@untom, you might want to change the formula in the paper accordingly!

@untom

untom commented Dec 1, 2015

Thanks for the heads-up :)

Note that mathematically, as long as alpha == 1, this doesn't make a difference: since exp(0) == 1, both the transfer function and the gradient output the same thing regardless of > vs >=. Also, given the shape of the ELU, it's pretty hard for an activation to hit 0 precisely anyhow. But you're right, we used > 0 during our own experiments, both in the binet code and in our own Caffe fork. If we make another paper revision, we will definitely include that change.
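Spelling out the boundary case (my addition, not part of the original comment): at x = 0 the exponential branch gives

$$ f(0) = \alpha \left( e^{0} - 1 \right) = 0, \qquad f'(0) = \alpha \, e^{0} = \alpha, $$

so the two branches always agree in value at 0, and agree in gradient there exactly when alpha = 1.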

@shelhamer
Member

Thanks for this @mohomran! That was quick.

I'm sorry that this was caught by the switch to layer headers in #3315, but could you update the PR to reflect the new arrangement? See the new ReLU header for an example.
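For reference, under the per-layer header arrangement introduced in #3315, the declaration moves into its own header modelled on relu_layer.hpp. A minimal sketch of what that file might look like (path, comments, and exact declarations are illustrative, not necessarily the merged code):

    // include/caffe/layers/elu_layer.hpp -- illustrative sketch modelled on relu_layer.hpp
    #ifndef CAFFE_ELU_LAYER_HPP_
    #define CAFFE_ELU_LAYER_HPP_

    #include <vector>

    #include "caffe/blob.hpp"
    #include "caffe/layer.hpp"
    #include "caffe/proto/caffe.pb.h"
    #include "caffe/layers/neuron_layer.hpp"

    namespace caffe {

    // ELU non-linearity: y = x for x > 0, y = alpha * (exp(x) - 1) otherwise.
    template <typename Dtype>
    class ELULayer : public NeuronLayer<Dtype> {
     public:
      explicit ELULayer(const LayerParameter& param)
          : NeuronLayer<Dtype>(param) {}
      virtual inline const char* type() const { return "ELU"; }

     protected:
      virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top);
      virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top);
      virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
          const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
      virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
          const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
    };

    }  // namespace caffe

    #endif  // CAFFE_ELU_LAYER_HPP_

The change in #3315 is purely organisational: each layer declaration gets its own header under include/caffe/layers/ instead of living in a shared monolithic header.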

@mohomran
Contributor Author

mohomran commented Dec 3, 2015

@beniz: Thanks. :) So far, I've only tested it on MNIST and CIFAR-10 ("quick"), but neither network is deep enough to show significant gains according to the paper. The updated CIFAR-10 network did seem to converge a bit faster, though.

@f0k, @untom: Thanks, good to know! As mentioned, I encountered problems when alpha was set to 0, which prompted the change.

@shelhamer: Rebased and ready to go. :)

  ELUForward<Dtype><<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
      count, bottom_data, top_data, alpha);
  CUDA_POST_KERNEL_CHECK;
  // << " count: " << count << " bottom_data: "
Drop commented code.
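This refers to the leftover debug line at the end of the hunk above. For context, the ELUForward kernel launched there follows Caffe's usual element-wise pattern; a sketch under the modified > 0 formula (illustrative, not necessarily the merged code character for character):

    template <typename Dtype>
    __global__ void ELUForward(const int n, const Dtype* in, Dtype* out,
        Dtype alpha) {
      // One thread per element: identity for positive inputs,
      // alpha * (exp(x) - 1) for non-positive inputs.
      CUDA_KERNEL_LOOP(index, n) {
        out[index] = in[index] > 0 ? in[index]
            : alpha * (exp(in[index]) - 1);
      }
    }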

@shelhamer
Member

@jeffdonahue, when Leaky ReLU was added it was incorporated into ReLU in #740. Do you have an opinion on a separate ELU layer?

@jeffdonahue
Contributor

I'd be fine with incorporating it into ReLU if there's near-zero performance impact, but this feels to me more like it should be a separate layer than leaky ReLU did (which seemed like a more natural generalization, since it's still piecewise linear).

@beniz

beniz commented Dec 4, 2015

@mohomran so I've tested on GoogLeNet, and even with BN activated, just for the sake of it. It appears to work fine, though the memory requirement grows significantly, which translates into smaller batches. The typical memory error (or so I guess) happens on the CUDA_POST_KERNEL_CHECK in elu_layer.cu. FTR, I had cuDNN activated, though of course ELU is not using it. I have some GPU time to kill over the next few days if more experiments or reports would help.
EDIT: the memory bump is likely due to ELU not using cuDNN, whereas ReLU does.

vchuravy added a commit to oist/mxnet that referenced this pull request Dec 7, 2015
Following the discussion in [1] and the original implementation in [2]: in the original implementation, > 0 was used rather than >= 0 as reported in the paper.

[1] BVLC/caffe#3388
[2]
untom/binet@2c8a6bd
@vchuravy

vchuravy commented Dec 7, 2015

@untom It does make a difference for the gradient, for any a != 0

@untom

untom commented Dec 16, 2015

Is there anything I can do to help move this PR forward?

shelhamer added a commit that referenced this pull request Jan 22, 2016
shelhamer merged commit a7ac8bc into BVLC:master on Jan 22, 2016
@shelhamer
Member

Thanks for the non-linearity @mohomran and thanks for checking in regarding the paper details @untom!
