How to handle parameter initialization #92
So, one solution is to add such functionality on the RETURNN side. I'm not exactly sure how complicated this would be.
Another solution is to first calculate such example tensors and then put them directly into the config. But that would blow up the config a lot for big models, and would require a different config for each random seed.
Yet another solution would be to simply not allow custom init for now, and only allow what RETURNN currently allows. But this is probably quite unintuitive from the viewpoint of returnn-common.
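As a rough sketch of the second option (all names here are hypothetical, not actual RETURNN config keys): the example tensors would be computed offline and baked into the config, which also makes the seed-per-config problem concrete.

```python
import numpy as np

def make_static_init(shape, seed):
    # Hypothetical helper: precompute example init values for one fixed seed.
    rng = np.random.default_rng(seed)
    return rng.normal(scale=0.1, size=shape).tolist()

# The values end up verbatim in the config, so a big model blows it up,
# and a different seed means generating a different config file.
config = {
    "encoder_weight_init": make_static_init((512, 512), seed=42),
}
```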
When it depends on some other computation, should this also allow previous models? Also, params loaded from a checkpoint could in principle be implemented using this. Should we unify this?
Some more complex example: Let's say there is some model A, and it is used in some way to calculate initial params for model B. Pseudo code:
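A hypothetical sketch of that setup, with plain NumPy standing in for the actual model code (the function names and the particular statistic are invented for illustration):

```python
import numpy as np

def model_a_forward(x, params_a):
    # Stand-in for some existing trained model A.
    return np.tanh(x @ params_a)

def init_params_b(params_a, probe_input):
    # Initial params for model B derived from model A's activations
    # on a probe input, e.g. a covariance-like statistic.
    h = model_a_forward(probe_input, params_a)
    return h.T @ h / len(probe_input)

rng = np.random.default_rng(0)
params_a = rng.normal(size=(8, 4))
probe = rng.normal(size=(16, 8))
params_b_init = init_params_b(params_a, probe)  # shape (4, 4)
```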
First of all, this requires more custom checkpoint logic (#93). But maybe you could argue such a thing would be too complex, and this should anyway be done in a separate script.
Another aspect: We should make use of stateless random ops (#95).
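To illustrate what "stateless" buys here (sketch only; real TF stateless ops like `tf.random.stateless_normal` take an explicit seed tensor, this uses NumPy's counter-based Philox generator as a stand-in):

```python
import numpy as np

def stateless_normal(shape, seed):
    # "Stateless" in the sense of #95: the output is a pure function of
    # (shape, seed); no hidden global RNG state is read or advanced.
    bit_gen = np.random.Philox(key=seed)
    return np.random.Generator(bit_gen).normal(size=shape)

a = stateless_normal((2, 3), seed=123)
b = stateless_normal((2, 3), seed=123)
# Same seed -> identical draws, regardless of call order or count,
# which makes param init reproducible across graph rebuilds.
```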
We now have the generic …
I think the newly introduced …
Despite the API for the definition of parameter initialization (#59), on the technical side some questions also remain. Let's assume the generic case and some definition like this:
`some_init_func` could be some arbitrary computation. This is another computation graph, which would effectively create other RETURNN layers. On the RETURNN side, parameter initialization is only static, or allows some predefined TF initializers. RETURNN does not allow the param init to depend on other layers.
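For concreteness, a minimal sketch of what such a definition amounts to (class and attribute names are hypothetical, not the actual returnn-common API):

```python
import numpy as np

class Parameter:
    # Hypothetical: a parameter whose init is an arbitrary callable,
    # evaluated only when the computation graph is built.
    def __init__(self, shape, init_func):
        self.shape = shape
        self.init_func = init_func  # arbitrary computation, not just a constant
        self.value = None

    def initialize(self):
        # On the RETURNN side, this computation would have to become extra
        # layers in the graph; currently only static values or predefined
        # TF initializers are supported, with no dependency on other layers.
        self.value = self.init_func(self.shape)
        return self.value

# some_init_func here is an arbitrary computation, e.g. a scaled identity.
w = Parameter((3, 3), init_func=lambda shape: np.eye(*shape) * 0.5)
```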
How do we solve this?