
How to handle parameter initialization #92

Closed
albertz opened this issue Jan 17, 2022 · 8 comments

albertz commented Jan 17, 2022

Apart from the API for defining parameter initialization (#59), some questions also remain on the technical side. Let's assume the generic case and some definition like this:

param = tf.Parameter(...)
param.initial = some_init_func()

some_init_func could be some arbitrary computation. This is another computation graph, which would effectively create other RETURNN layers.
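
For illustration, a minimal sketch (plain TF, not a settled returnn-common API) of what such an arbitrary init computation could look like; the shape and the He-style scaling are just made-up example values:

import tensorflow as tf

def some_init_func(shape=(512, 512), seed=(1, 2)):
    # Example of an "arbitrary computation" init: hand-rolled He-style scaling
    # on top of a stateless normal. This builds its own small computation graph,
    # rather than using a predefined TF initializer.
    fan_in = shape[0]
    scale = tf.sqrt(2.0 / fan_in)
    return scale * tf.random.stateless_normal(shape, seed=seed)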

On the RETURNN side, parameter initialization is either static or uses some predefined TF initializers. RETURNN does not allow the param init to depend on other layers.

How do we solve this?


albertz commented Jan 17, 2022

So, one solution is to add such functionality on the RETURNN side. I'm not exactly sure how complicated this is.
Edit: Actually, I think this is not really so complicated. We might need a new option like init_by_layer for the VariableLayer, and then that's almost it.
Edit: This is done now, rwth-i6/returnn#911.


albertz commented Jan 17, 2022

Another solution is to first calculate such initial tensors and then put them directly into the config. But that would blow up the config a lot for big models, and it would require different configs for different random seeds.
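
As a rough sketch of what that would mean in practice (numpy just for illustration; the parameter shape is made up), the concrete values would have to be serialized into the config:

import numpy

rng = numpy.random.RandomState(42)  # fixed seed; a different seed would mean a different config
init_value = rng.normal(scale=0.1, size=(512, 512)).astype("float32")

# Written out as a literal in the config, this single parameter alone is
# 512 * 512 = 262144 float values, and a big model has many such parameters,
# hence the config blow-up.
print(init_value.size)  # 262144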


albertz commented Jan 17, 2022

Yet another solution would be to simply not allow custom init for now, and only allow what RETURNN currently allows. But this is probably quite unintuitive from the viewpoint of returnn-common.
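
Concretely, that would mean restricting param init to roughly what a RETURNN VariableLayer accepts today, i.e. a constant or the name of a predefined TF initializer (net dict sketch; the exact option spelling is from memory and should be checked):

network = {
    "my_param": {
        "class": "variable",
        "shape": (512, 512),
        "init": "glorot_uniform",  # predefined TF initializer by name, or a constant value
    },
}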


albertz commented Jan 17, 2022

When it depends on some other computation, should this also allow previous models?

Also, params loaded from a checkpoint could in principle be implemented using this. Should we unify this?

albertz added this to the first-release milestone Jan 17, 2022

albertz commented Jan 18, 2022

A more complex example: let's say there is some model A, and it is used in some way to calculate the initial params for model B. Pseudo code:

model_a = ModelA(...)
model_a.load_checkpoint(...)

model_b = ModelB(...)
model_b.param.initial = model_a.param * 2  # stupid example

First of all, this requires more custom checkpoint logic (#93).
Given that, would this work?

But maybe you could argue that such a thing would be too complex, and that this should anyway be done in a separate script.
Although then the question is how it would look in that separate script.


albertz commented Jan 19, 2022

Another aspect: We should make use of stateless random ops (#95).
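
For reference, a stateless random op takes an explicit seed and is a pure function of it, which is what makes it attractive for reproducible param init (plain TF sketch, independent of the RETURNN integration):

import tensorflow as tf

seed = tf.constant([3, 4], dtype=tf.int32)
a = tf.random.stateless_normal(shape=(2, 3), seed=seed)
b = tf.random.stateless_normal(shape=(2, 3), seed=seed)
# a and b are identical: the result depends only on shape, seed and dtype,
# not on any hidden global RNG state.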


albertz commented Feb 1, 2022

We now have the generic RandomLayer (via rwth-i6/returnn#911), which supports stateless random ops.

We now have VariableLayer init_by_layer (via rwth-i6/returnn#911) to allow for an init via another layer (RandomLayer).
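
A rough net dict sketch of how these two pieces fit together (layer option names as I understand them from rwth-i6/returnn#911; the exact spelling and the random layer options should be checked against the RETURNN docs):

network = {
    "my_param_init": {
        "class": "random", "shape": (512, 512),
        "distribution": "normal", "stddev": 0.1,
    },
    "my_param": {
        "class": "variable", "shape": (512, 512),
        "init_by_layer": "my_param_init",
    },
}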


albertz commented Feb 1, 2022

I think the newly introduced Parameter.initial solves this for now.
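
Usage then stays roughly as in the snippet from the issue description; a minimal sketch (some_init_func is still just a placeholder for an arbitrary computation, and the nn.Parameter spelling is an assumption):

param = nn.Parameter(...)            # returnn-common parameter (pseudo code, as in the issue description)
param.initial = some_init_func()     # translated into a RETURNN RandomLayer plus VariableLayer init_by_layer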

albertz closed this as completed Feb 1, 2022