
Parameter random init outside name ctx does not work #109

Closed

albertz opened this issue Feb 11, 2022 · 3 comments · Fixed by #215

albertz (Member) commented Feb 11, 2022

This happens e.g. for

    model = nn.Linear(out_dim, in_dim=in_dim)

outside a NameCtx.

Or maybe we want to disallow this? I.e. simply require that the model (all its params) is always defined inside a NameCtx.

Or lazily assign the parent name ctx and the name itself later, similar to what we do for Parameter? (See the sketch below.)

PR in #108.
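
A minimal sketch of the lazy-assignment idea, applied to a module; the class and attribute names are hypothetical, not the actual returnn_common API:

    # Hypothetical sketch, not the actual returnn_common API.
    class LazyNamedModule:
        """A module that defers attaching itself (and its params) to a parent name ctx."""

        def __init__(self):
            self.parent_name_ctx = None  # not attached yet; no NameCtx required here
            self.name = None

        def assign_parent(self, parent_name_ctx, name):
            # Called later, once the module is first used inside a NameCtx.
            assert self.parent_name_ctx is None, "already attached to a name ctx"
            self.parent_name_ctx = parent_name_ctx
            self.name = name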

albertz (Member, Author) commented Oct 5, 2022

We also need to think about deepcopy(module), which we use in various places because we follow the PyTorch code style. For example, in TransformerEncoder, we have:

    self.layers = nn.ModuleList([copy.deepcopy(encoder_layer) for _ in range(num_layers)])

It doesn't make sense to copy the param init itself (the actual values); each copy should only get the same init scheme.

We should be careful what deepcopy actually copies:

  • We already excluded nn.Dim objects, by making Dim.__deepcopy__ just return self.
  • I now ran into another bug, caused by deepcopy following the module's calls and copying the NameCtx chain all the way up to the root name ctx. This does not really make sense; I think module.calls should not be copied at all. It is also far too inefficient. I'm not really sure how to solve this; it is a general problem with deepcopy on a module.

As usual, we should check how PyTorch handles this: where the param init is actually done, and what deepcopy does for a PyTorch module. Any torch.Tensor objects would simply be cloned, but the computations that produced them would not be repeated. So in our case this basically means we should not copy nn.Tensor (as it is immutable), except for nn.Parameter. I.e. we maybe also need special __deepcopy__ logic for nn.Tensor.
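
A hedged sketch of that special __deepcopy__ handling, using simplified stand-in classes rather than the actual returnn_common implementation:

    import copy

    class Dim:
        def __deepcopy__(self, memo):
            # Dims act as global identities, so deepcopy returns the same object.
            return self

    class Tensor:
        def __deepcopy__(self, memo):
            # Tensors are immutable results of computations; do not copy them.
            return self

    class Parameter(Tensor):
        def __init__(self, init_scheme):
            self.init_scheme = init_scheme

        def __deepcopy__(self, memo):
            # Parameters are the exception: copy the init scheme, not the values.
            return Parameter(copy.deepcopy(self.init_scheme, memo))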

albertz (Member, Author) commented Oct 5, 2022

About our own param init, also see: #59

About torch.nn.Linear, see the code. It looks like the initial value is directly assigned in Linear.__init__. But how does that work with deepcopy then? Wouldn't all copies have the same parameters? Edit: I just checked torch.nn.TransformerEncoder, and it indeed seems to have this problem. See pytorch/pytorch#86274.
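
A small self-contained check of that PyTorch behavior (plain PyTorch, independent of returnn_common): deepcopy clones the parameter values assigned in Linear.__init__, so all copies start out identical, which is exactly the torch.nn.TransformerEncoder situation.

    import copy
    import torch

    layer = torch.nn.Linear(4, 4)
    clones = [copy.deepcopy(layer) for _ in range(2)]
    # Both clones carry the exact same initial weights, since deepcopy copies
    # the tensor values and no re-initialization happens.
    print(torch.equal(clones[0].weight, clones[1].weight))  # -> True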

albertz (Member, Author) commented Oct 5, 2022

We could maybe change the logic of the Parameter.initial setter so that it does not resolve (call) the ParamInit (e.g. VarianceScaling) directly in there, but only later. But when exactly? This would solve the deepcopy problem w.r.t. param init.
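
A rough sketch of that deferred resolution; the attribute names and the ParamInit call signature are assumptions, not the actual implementation:

    class Parameter:
        def __init__(self):
            self._initial = None  # either a concrete value or an unresolved ParamInit

        @property
        def initial(self):
            if callable(self._initial):  # still an unresolved ParamInit scheme
                # Resolve lazily, only when the value is actually needed
                # (the call signature here is an assumption).
                self._initial = self._initial(self)
            return self._initial

        @initial.setter
        def initial(self, value):
            # Store the ParamInit as-is instead of calling it here, so that
            # deepcopy copies the scheme and each copy draws its own values.
            self._initial = value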
