
Parameter random init outside name ctx does not work #109

Closed

albertz opened this issue Feb 11, 2022 · 3 comments · Fixed by #215

albertz (Member) commented Feb 11, 2022

This happens e.g. for

    model = nn.Linear(out_dim, in_dim=in_dim)

outside a NameCtx.

Or maybe we want to disallow this? I.e. simply require that the model (all its params) is always defined inside a NameCtx.

Or lazily assign the parent name ctx and the name itself later, similar to what we do for Parameter? (See the sketch below.)

PR in #108.
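
A minimal sketch of the lazy-assignment idea, applied to a module; the class and attribute names are hypothetical, not the actual returnn_common API:

    # Hypothetical sketch, not the actual returnn_common API.
    class LazyNamedModule:
        """A module that defers attaching itself (and its params) to a parent name ctx."""

        def __init__(self):
            self.parent_name_ctx = None  # not attached yet; no NameCtx required here
            self.name = None

        def assign_parent(self, parent_name_ctx, name):
            # Called later, once the module is first used inside a NameCtx.
            assert self.parent_name_ctx is None, "already attached to a name ctx"
            self.parent_name_ctx = parent_name_ctx
            self.name = name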

albertz (Member, Author) commented Oct 5, 2022

We also need to think about deepcopy(module), which we use in various places because we follow the PyTorch code style. For example, in TransformerEncoder, we have:

    self.layers = nn.ModuleList([copy.deepcopy(encoder_layer) for _ in range(num_layers)])

It doesn't make sense to copy the param init itself (the actual values); each copy should only get the same init scheme.

We should be careful what deepcopy actually copies:

  • We already excluded nn.Dim objects, by making Dim.__deepcopy__ just return self.
  • I now ran into another bug, caused by deepcopy following the module's calls and copying the NameCtx chain all the way up to the root name ctx. This does not really make sense; I think module.calls should not be copied at all. It is also far too inefficient. I'm not really sure how to solve this; it is a general problem with deepcopy on a module.

As usual, we should check how PyTorch handles this: where the param init is actually done, and what deepcopy does for a PyTorch module. Any torch.Tensor objects would simply be cloned, but the computations that produced them would not be repeated. So in our case this basically means we should not copy nn.Tensor (as it is immutable), except for nn.Parameter. I.e. we maybe also need special __deepcopy__ logic for nn.Tensor.
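
A hedged sketch of that special __deepcopy__ handling, using simplified stand-in classes rather than the actual returnn_common implementation:

    import copy

    class Dim:
        def __deepcopy__(self, memo):
            # Dims act as global identities, so deepcopy returns the same object.
            return self

    class Tensor:
        def __deepcopy__(self, memo):
            # Tensors are immutable results of computations; do not copy them.
            return self

    class Parameter(Tensor):
        def __init__(self, init_scheme):
            self.init_scheme = init_scheme

        def __deepcopy__(self, memo):
            # Parameters are the exception: copy the init scheme, not the values.
            return Parameter(copy.deepcopy(self.init_scheme, memo))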

albertz (Member, Author) commented Oct 5, 2022

About our own param init, also see: #59

About torch.nn.Linear, see the code. It looks like the initial value is directly assigned in Linear.__init__. But how does that work with deepcopy then? Wouldn't all copies have the same parameters? Edit: I just checked torch.nn.TransformerEncoder, and it indeed seems to have this problem. See pytorch/pytorch#86274.
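
A small self-contained check of that PyTorch behavior (plain PyTorch, independent of returnn_common): deepcopy clones the parameter values assigned in Linear.__init__, so all copies start out identical, which is exactly the torch.nn.TransformerEncoder situation.

    import copy
    import torch

    layer = torch.nn.Linear(4, 4)
    clones = [copy.deepcopy(layer) for _ in range(2)]
    # Both clones carry the exact same initial weights, since deepcopy copies
    # the tensor values and no re-initialization happens.
    print(torch.equal(clones[0].weight, clones[1].weight))  # -> True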

albertz (Member, Author) commented Oct 5, 2022

We could maybe change the logic of the Parameter.initial setter so that it does not resolve (call) the ParamInit (e.g. VarianceScaling) directly in there, but only later. But when exactly? This would solve the deepcopy problem w.r.t. param init.
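
A rough sketch of that deferred resolution; the attribute names and the ParamInit call signature are assumptions, not the actual implementation:

    class Parameter:
        def __init__(self):
            self._initial = None  # either a concrete value or an unresolved ParamInit

        @property
        def initial(self):
            if callable(self._initial):  # still an unresolved ParamInit scheme
                # Resolve lazily, only when the value is actually needed
                # (the call signature here is an assumption).
                self._initial = self._initial(self)
            return self._initial

        @initial.setter
        def initial(self, value):
            # Store the ParamInit as-is instead of calling it here, so that
            # deepcopy copies the scheme and each copy draws its own values.
            self._initial = value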
