
Gluon's PReLU is very slow and a fix to it #10972

Closed
rocketbear opened this issue May 16, 2018 · 3 comments · Fixed by #11012

Comments

@rocketbear

I have experienced significantly slower training when using the PReLU activation instead of ReLU in a model composed with the Gluon API. Measured in samples processed per second on GPU, PReLU runs at roughly 1/5 the speed of ReLU.
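
For reference, the comparison I ran was roughly along these lines (a minimal sketch rather than the exact benchmark; the MLP, batch size, and the mx.gpu(0) context are placeholder assumptions, otherwise standard MXNet 1.x Gluon APIs):

import time
import mxnet as mx
from mxnet import nd, autograd, gluon
from mxnet.gluon import nn

ctx = mx.gpu(0)  # assumes a GPU is available; use mx.cpu() otherwise

def make_net(act):
    # Small hypothetical MLP; only the activation block differs between runs.
    net = nn.HybridSequential()
    with net.name_scope():
        net.add(nn.Dense(1024), act(), nn.Dense(10))
    net.initialize(ctx=ctx)
    net.hybridize()
    return net

def samples_per_sec(net, batch_size=256, iters=100):
    x = nd.random.uniform(shape=(batch_size, 512), ctx=ctx)
    y = nd.random.uniform(shape=(batch_size, 10), ctx=ctx)
    loss_fn = gluon.loss.L2Loss()
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
    for i in range(iters + 1):
        if i == 1:
            nd.waitall()            # first iteration is warm-up only
            start = time.time()
        with autograd.record():
            loss = loss_fn(net(x), y)
        loss.backward()
        trainer.step(batch_size)
    nd.waitall()                    # wait for asynchronous execution to finish
    return batch_size * iters / (time.time() - start)

print('ReLU  samples/sec:', samples_per_sec(make_net(lambda: nn.Activation('relu'))))
print('PReLU samples/sec:', samples_per_sec(make_net(nn.PReLU)))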

I finally managed to bring Gluon's PReLU back to normal speed with the following modification.
This is the original __init__ of PReLU:

def __init__(self, alpha_initializer=initializer.Constant(0.25), **kwargs):
    super(PReLU, self).__init__(**kwargs)
    with self.name_scope():
        self.alpha = self.params.get('alpha', shape=(1,), init=alpha_initializer)

And this is the modified __init__:

def __init__(self, in_channels=1, alpha_initializer=initializer.Constant(0.25), **kwargs):
    super(PReLU, self).__init__(**kwargs)
    with self.name_scope():
        self.alpha = self.params.get('alpha', shape=(in_channels,), init=alpha_initializer)

The key is to pass the expected number of channels to the PReLU block, so that it does not share the negative slope across channels. The downside of this solution is that you have to pass in the number of channels every time.
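
For illustration, here is a self-contained sketch of the per-channel variant; ChannelPReLU is just a hypothetical local name for a block carrying the modified __init__ above, and its forward pass reuses the same LeakyReLU 'prelu' mode as the stock Gluon block:

import mxnet as mx
from mxnet import initializer, nd
from mxnet.gluon import nn

class ChannelPReLU(nn.HybridBlock):
    # Hypothetical local block with the modified __init__ from above:
    # one learnable negative slope per channel instead of a single shared one.
    def __init__(self, in_channels, alpha_initializer=initializer.Constant(0.25), **kwargs):
        super(ChannelPReLU, self).__init__(**kwargs)
        with self.name_scope():
            self.alpha = self.params.get('alpha', shape=(in_channels,),
                                         init=alpha_initializer)

    def hybrid_forward(self, F, x, alpha):
        # Same forward as gluon.nn.PReLU; with gamma of shape (in_channels,)
        # the 'prelu' mode applies one slope per channel.
        return F.LeakyReLU(x, gamma=alpha, act_type='prelu')

net = nn.HybridSequential()
with net.name_scope():
    net.add(nn.Conv2D(channels=64, kernel_size=3, padding=1))
    net.add(ChannelPReLU(in_channels=64))   # must match the conv's channel count
net.initialize(ctx=mx.cpu())
net.hybridize()

print(net(nd.random.uniform(shape=(1, 3, 32, 32))).shape)   # (1, 64, 32, 32)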

I don't know why the two settings (shared vs. per-channel alpha) have such drastically different performance. The MXNet contributors should investigate this issue in more depth.

@szha
Member

szha commented May 16, 2018

These two implementations are different in terms of the number of parameters. The performance hit likely comes from the broadcast operation.

@chinakook
Contributor

Yes, I investigated the leakyrelu-inl.h source code. There is indeed a broadcast operation when the shape of the 'alpha' parameter is (1,).

@chinakook
Contributor

I think there is no need to broadcast when multiplying a scalar and a matrix. Maybe some other operation is more suitable for this kind of multiplication.
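
To illustrate the idea, here is a rough NDArray-level sketch (not the actual leakyrelu-inl.h kernel; the tensor sizes and the mx.gpu(0) context are arbitrary assumptions) comparing a shared slope applied through an explicit broadcast with the same slope folded in as a plain scalar:

import time
import mxnet as mx
from mxnet import nd

ctx = mx.gpu(0)  # assumes a GPU; use mx.cpu() otherwise
x = nd.random.uniform(-1, 1, shape=(64, 256, 32, 32), ctx=ctx)
alpha = nd.full((1, 1, 1, 1), 0.25, ctx=ctx)    # shared slope stored as a tensor

def bench(fn, repeat=100):
    fn(); nd.waitall()                          # warm-up and sync
    start = time.time()
    for _ in range(repeat):
        fn()
    nd.waitall()
    return (time.time() - start) / repeat * 1e3  # ms per call

# PReLU written as max(0, x) + alpha * min(0, x), with the slope applied
# through an explicit broadcast (what a tensor-valued shared alpha requires).
t_bcast = bench(lambda: nd.relu(x) + nd.broadcast_mul(alpha, nd.minimum(x, 0)))

# The same math with the shared slope treated as a plain Python scalar.
t_scalar = bench(lambda: nd.relu(x) + 0.25 * nd.minimum(x, 0))

print('broadcast: %.3f ms  scalar: %.3f ms' % (t_bcast, t_scalar))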
