What param init defaults should we use #94

albertz · 2022-01-18T22:00:03Z

Some options:

Follow whatever RETURNN does currently
Follow TensorFlow / Keras
Follow PyTorch. However, the current defaults are not optimal, and there is ongoing discussion about whether to change them, how to change them, and what the new defaults should be. See "Update weight initialisations to current best practices" (pytorch/pytorch#18182) and "Update weight init to general use cases" (pytorch/pytorch#41638).
Figure out the best current practice. Although this is maybe not so clear.

I collected some common options here: https://github.com/rwth-i6/returnn/wiki/Parameter-initialization

albertz · 2022-02-01T16:54:44Z

Other posts I have read so far claim that Kaiming He (scale=2., mode="fan_in", distribution="normal") is the current best practice.
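To make the (scale, mode, distribution) notation concrete, here is a minimal sketch of the Kaiming He normal variant in pure Python. It assumes the Keras-style variance scaling convention (std = sqrt(scale / fan)) and a kernel layout of (..., in, out). The function names `compute_fans`, `he_normal_std` and `he_normal` are hypothetical and are not part of RETURNN or any other library.

```python
import math
import random


def compute_fans(shape):
    """Return (fan_in, fan_out), assuming a Keras-style (..., in, out) layout."""
    if len(shape) < 2:
        raise ValueError("fans are only defined here for >= 2 dimensions")
    # Product of all leading (e.g. spatial) dims; 1 for a plain matrix.
    receptive_field = math.prod(shape[:-2])
    return shape[-2] * receptive_field, shape[-1] * receptive_field


def he_normal_std(shape, scale=2.0):
    """Standard deviation for scale=2., mode="fan_in", distribution="normal"."""
    fan_in, _ = compute_fans(shape)
    return math.sqrt(scale / fan_in)


def he_normal(shape, rng=None):
    """Sample the values as a flat list (untruncated normal, for simplicity)."""
    rng = rng or random.Random()
    std = he_normal_std(shape)
    return [rng.gauss(0.0, std) for _ in range(math.prod(shape))]
```

Note that some frameworks use a truncated normal here and correct the standard deviation accordingly; this sketch deliberately skips that detail.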

However, I have not really seen that being used much in public setups. TF/Keras and other Google code often use Glorot uniform. This is also the default for the RETURNN (TensorFlow) LinearLayer.

albertz · 2022-02-02T13:26:56Z

From some Twitter discussion (via @lucasb-eyer):

Kaiming for CNNs with ReLUs, Xavier as a starting point for everything else.

I guess Xavier Glorot (scale=1.0, mode="fan_avg", distribution="uniform") is a good default choice for any tensor with >= 2 dimensions, and just 0. for any tensor with <= 1 dimension.
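The proposed default could be sketched roughly as follows, in pure Python. This again assumes the Keras-style variance scaling convention and a (..., in, out) layout. `glorot_uniform_limit` and `default_init` are hypothetical names used only for illustration, not the API that was eventually implemented.

```python
import math
import random


def glorot_uniform_limit(shape, scale=1.0):
    """Bound for scale=1.0, mode="fan_avg", distribution="uniform"."""
    receptive_field = math.prod(shape[:-2])  # 1 for a plain matrix
    fan_in = shape[-2] * receptive_field
    fan_out = shape[-1] * receptive_field
    fan_avg = (fan_in + fan_out) / 2.0
    # Uniform on [-limit, limit] has variance limit**2 / 3 = scale / fan_avg.
    return math.sqrt(3.0 * scale / fan_avg)


def default_init(shape, rng=None):
    """Glorot uniform for >= 2 dims, zeros for <= 1 dim; returns a flat list."""
    n = math.prod(shape)
    if len(shape) <= 1:
        return [0.0] * n
    rng = rng or random.Random()
    limit = glorot_uniform_limit(shape)
    return [rng.uniform(-limit, limit) for _ in range(n)]
```

For a (100, 200) weight matrix this gives a bound of sqrt(6 / 300), while a bias of shape (200,) is simply all zeros.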

albertz · 2022-02-02T13:28:40Z

Should our LSTM overwrite this?

Fix #94

albertz added this to the first-release milestone Jan 18, 2022

This was referenced Jan 18, 2022

Missing pieces for first release #32 (Open)

How to define the API for parameter initialization, regularization (L2, weight dropout, etc), maybe updater opts per-param #59 (Closed)

This was referenced Feb 4, 2022

Intermediate usage before first release #98 (Closed)

Param init #103 (Merged)

albertz closed this as completed in #103 Feb 7, 2022

albertz added a commit that referenced this issue Feb 7, 2022

Param init (#103)

ccb7622

Fix #94
