# Different ways to import parameters
There are multiple ways in which parameters can be imported from some other model or checkpoint.

Note that you should distinguish between the use cases: importing parameters for further training (e.g. via `import_model_train_epoch1`) versus loading a model with given parameters for recognition only (e.g. `load` in recognition).

Also be careful that some of the approaches might hide errors (such as typos) and silently ignore some parameters instead of raising an error.
## Custom script to modify the checkpoint file

This is the most flexible option: a custom external script which directly operates on the checkpoint file(s). See the scripts `tf_avg_checkpoints.py` or `tf_inspect_checkpoint.py` as examples. It should be straightforward to write your own custom logic. This is probably also the safest option, as you should notice any errors. However, it probably also takes the most effort, so it might be overkill, especially for simple cases.
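A minimal sketch of such a script, assuming TF1-style checkpoints; the paths and the rename rule below are made up for illustration:

```python
import tensorflow as tf

in_ckpt = "old-model/epoch.042"   # hypothetical input checkpoint
out_ckpt = "new-model/epoch.001"  # hypothetical output checkpoint

reader = tf.train.load_checkpoint(in_ckpt)

with tf.Graph().as_default(), tf.compat.v1.Session() as session:
    new_vars = []
    for old_name in reader.get_variable_to_shape_map():
        value = reader.get_tensor(old_name)
        # Hypothetical rename rule: strip a leading "model1/" prefix.
        new_name = old_name[len("model1/"):] if old_name.startswith("model1/") else old_name
        new_vars.append(tf.Variable(value, name=new_name))
    session.run(tf.compat.v1.global_variables_initializer())
    tf.compat.v1.train.Saver(var_list=new_vars).save(session, out_ckpt)
```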
## preload_from_files

The config option `preload_from_files` is a dict name -> opts. The name is arbitrary, but the dict will be sorted by it to define the order (internally: `for _, opts in sorted(self.preload_from_files.items()): ...`).
The opts is another dict which can contain:

- `init_for_train: bool = False`: If True, it will be used for train initialization (first epoch), and ignored in recognition. If False, it will be used for recognition, and ignored in training.
- `filename: str`: TF checkpoint path.
- `prefix: str = ""`: Prefix in the current model (e.g. a subnetwork path like `"model1/"`, or also layer names prefixed like `"model1_..."` or so).
- `ignore_missing: bool = False`
- `ignore_params: list[str] = []`
- `ignore_params_prefixes: list[str] = []`

This is handled via `CustomCheckpointLoader`.
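For example, a config sketch (the entry name, the checkpoint path and the prefix are hypothetical):

```python
preload_from_files = {
    "base": {
        "init_for_train": True,   # only used for train initialization (first epoch)
        "filename": "/path/to/base-model/epoch.080",  # hypothetical checkpoint
        "prefix": "base_",        # current-model params named "base_..." map to checkpoint params
        "ignore_missing": True,   # do not raise an error for model params missing in the checkpoint
    },
}
```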
## import_model_train_epoch1 / load

The parameters must match exactly. With `import_model_train_epoch1`, in a new training (first epoch), instead of random initialization, it would load the given model checkpoint.

In training (`task="train"`), if no existing model is found (the model is specified by the `model` config option), it would use `load` (but for this case, you should use `import_model_train_epoch1` instead to make it more explicit). In non-training (`task != "train"`, e.g. search, forwarding, eval etc.), it would use `load`.
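A config sketch (all paths are hypothetical):

```python
# Training: initialize epoch 1 from an existing checkpoint (the explicit variant).
task = "train"
model = "/path/to/new-model/epoch"                            # where new checkpoints get stored
import_model_train_epoch1 = "/path/to/other-model/epoch.100"  # used only for epoch 1

# Recognition: load the given checkpoint instead.
# task = "search"
# load = "/path/to/other-model/epoch.100"
```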
## custom_param_importer

If set, RETURNN will use `CustomCheckpointLoader`, which will call `LayerBase.set_param_values_by_dict`, and if `custom_param_importer` is a function, this will then call `custom_param_importer(layer=self, values_dict=values_dict, session=session)`. So you can define your own custom function to load the parameters in any way.
Note that this functionality is also used in pretraining when the model architecture changes. The parameters of the previous epoch are then stored as NumPy arrays in `values_dict`, and in the next epoch it will call `LayerBase.set_param_values_by_dict`.
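A sketch of such a function; the skip-and-assign logic here is illustrative only, and it assumes the standard `tf.Variable.load` API for assigning values:

```python
def my_param_importer(layer, values_dict, session):
    """
    :param LayerBase layer: layer whose params should be set
    :param dict[str,numpy.ndarray] values_dict: param name -> value from the checkpoint
    :param tf.compat.v1.Session session:
    """
    for param_name, param in sorted(layer.params.items()):
        if param_name not in values_dict:
            print("%s: no value for param %r, keeping current value" % (layer.name, param_name))
            continue
        value = values_dict[param_name]
        assert tuple(param.shape.as_list()) == value.shape
        param.load(value, session=session)  # tf.Variable.load assigns the given value

custom_param_importer = my_param_importer
```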
## Custom parameter initialization

The param init uses `get_initializer` and can in principle use any initializer (e.g. `load_txt_file_initializer`), or even custom initializing code, which could import other parameters from a checkpoint or elsewhere. Note that this is always used, both in training and recognition, and also when the network is reinitialized (e.g. due to pretraining). Thus, with the current logic, this is probably only useful for recognition. The config option `preload_from_files` is maybe a better and more flexible way.
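For example, a sketch assuming `get_initializer` accepts a callable; the layer definition and the NumPy file are hypothetical:

```python
import numpy
import tensorflow as tf

def my_weights_init(shape, dtype=None, **kwargs):
    """Initializer which loads the values from a (hypothetical) NumPy file."""
    values = numpy.load("/path/to/weights.npy")
    assert tuple(shape) == values.shape
    return tf.constant(values, dtype=dtype)

network = {
    "layer1": {
        "class": "linear", "activation": "tanh", "n_out": 512,
        "forward_weights_init": my_weights_init,
    },
    # ...
}
```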
## reuse_params with a custom variable getter

This is intended to reuse params from other layers, i.e. to share params. But it can also be used to provide a custom `get_variable` function which can again do arbitrary things, like using a custom name in any custom name scope, setting a custom initializer, or defining a variable as a fixed constant, or whatever.
Example:

```python
"layer": {
    ...,
    "reuse_params": {"map": {"W": {"custom": my_custom_variable_creator}}},
}
```
This is handled by `ReuseParams`. It will call the function like `custom_func(base_layer=base_layer, reuse_layer=self.reuse_layer, name=param_name, getter=getter, full_name=name, **kwargs)`, where `kwargs` are other args passed to `tf.compat.v1.variable_scope`.
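A sketch of such a custom function, following the call convention above; the checkpoint path and the name-mapping rule are hypothetical:

```python
import tensorflow as tf

def my_custom_variable_creator(base_layer, reuse_layer, name, getter, full_name, **kwargs):
    # Load the value for this param from another checkpoint (hypothetical path and naming).
    reader = tf.train.load_checkpoint("/path/to/other-model/epoch.100")
    value = reader.get_tensor("other_scope/%s" % name)
    # Override the initializer, then let the original getter create the variable.
    kwargs["initializer"] = tf.constant_initializer(value)
    return getter(full_name, **kwargs)
```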
## Custom name scope via returnn-common

(This is currently not implemented but planned.) Via returnn-common, layers can define their own custom name scope, thus making it possible to match some other model checkpoint format.