
How to handle parameter initialization #92

Closed
albertz opened this issue Jan 17, 2022 · 8 comments

albertz commented Jan 17, 2022

Apart from the API for defining parameter initialization (#59), some questions also remain on the technical side. Let's assume the generic case and some definition like this:

param = tf.Parameter(...)
param.initial = some_init_func()

some_init_func could be some arbitrary computation. This is another computation graph, which would effectively create other RETURNN layers.
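
For illustration, a minimal sketch (plain TF, not a settled returnn-common API) of what such an arbitrary init computation could look like; the shape and the He-style scaling are just made-up example values:

import tensorflow as tf

def some_init_func(shape=(512, 512), seed=(1, 2)):
    # Example of an "arbitrary computation" init: hand-rolled He-style scaling
    # on top of a stateless normal. This builds its own small computation graph,
    # rather than using a predefined TF initializer.
    fan_in = shape[0]
    scale = tf.sqrt(2.0 / fan_in)
    return scale * tf.random.stateless_normal(shape, seed=seed)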

On the RETURNN side, parameter initialization is either static or uses some predefined TF initializers. RETURNN does not allow the param init to depend on other layers.

How do we solve this?


albertz commented Jan 17, 2022

So, one solution is to add such functionality on the RETURNN side. I'm not exactly sure how complicated this is.
Edit: Actually, I think this is not really so complicated. We might need a new option like init_by_layer for the VariableLayer, and then that's almost it.
Edit: This is done now, rwth-i6/returnn#911.


albertz commented Jan 17, 2022

Another solution is to first calculate such initial tensors and then put them directly into the config. But that would blow up the config a lot for big models, and it would require different configs for different random seeds.
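
As a rough sketch of what that would mean in practice (numpy just for illustration; the parameter shape is made up), the concrete values would have to be serialized into the config:

import numpy

rng = numpy.random.RandomState(42)  # fixed seed; a different seed would mean a different config
init_value = rng.normal(scale=0.1, size=(512, 512)).astype("float32")

# Written out as a literal in the config, this single parameter alone is
# 512 * 512 = 262144 float values, and a big model has many such parameters,
# hence the config blow-up.
print(init_value.size)  # 262144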


albertz commented Jan 17, 2022

Yet another solution would be to simply not allow custom init for now, and only allow what RETURNN currently allows. But this is probably quite unintuitive from the viewpoint of returnn-common.
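
Concretely, that would mean restricting param init to roughly what a RETURNN VariableLayer accepts today, i.e. a constant or the name of a predefined TF initializer (net dict sketch; the exact option spelling is from memory and should be checked):

network = {
    "my_param": {
        "class": "variable",
        "shape": (512, 512),
        "init": "glorot_uniform",  # predefined TF initializer by name, or a constant value
    },
}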


albertz commented Jan 17, 2022

When it depends on some other computation, should this also allow previous models?

Also, params loaded from a checkpoint could in principle be implemented using this. Should we unify this?

albertz added this to the first-release milestone Jan 17, 2022

albertz commented Jan 18, 2022

A more complex example: let's say there is some model A, and it is used in some way to calculate the initial params for model B. Pseudo code:

model_a = ModelA(...)
model_a.load_checkpoint(...)

model_b = ModelB(...)
model_b.param.initial = model_a.param * 2  # stupid example

First of all, this requires more custom checkpoint logic (#93).
Given that, would this work?

But maybe you could argue that such a thing would be too complex, and that this should anyway be done in a separate script.
Although then the question is how it would look in that separate script.


albertz commented Jan 19, 2022

Another aspect: We should make use of stateless random ops (#95).
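
For reference, a stateless random op takes an explicit seed and is a pure function of it, which is what makes it attractive for reproducible param init (plain TF sketch, independent of the RETURNN integration):

import tensorflow as tf

seed = tf.constant([3, 4], dtype=tf.int32)
a = tf.random.stateless_normal(shape=(2, 3), seed=seed)
b = tf.random.stateless_normal(shape=(2, 3), seed=seed)
# a and b are identical: the result depends only on shape, seed and dtype,
# not on any hidden global RNG state.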


albertz commented Feb 1, 2022

We now have the generic RandomLayer (via rwth-i6/returnn#911), which supports stateless random ops.

We now have VariableLayer init_by_layer (via rwth-i6/returnn#911) to allow for an init via another layer (RandomLayer).
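
A rough net dict sketch of how these two pieces fit together (layer option names as I understand them from rwth-i6/returnn#911; the exact spelling and the random layer options should be checked against the RETURNN docs):

network = {
    "my_param_init": {
        "class": "random", "shape": (512, 512),
        "distribution": "normal", "stddev": 0.1,
    },
    "my_param": {
        "class": "variable", "shape": (512, 512),
        "init_by_layer": "my_param_init",
    },
}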


albertz commented Feb 1, 2022

I think the newly introduced Parameter.initial solves this for now.
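
Usage then stays roughly as in the snippet from the issue description; a minimal sketch (some_init_func is still just a placeholder for an arbitrary computation, and the nn.Parameter spelling is an assumption):

param = nn.Parameter(...)            # returnn-common parameter (pseudo code, as in the issue description)
param.initial = some_init_func()     # translated into a RETURNN RandomLayer plus VariableLayer init_by_layer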

albertz closed this as completed Feb 1, 2022