
[performance] module init w/ from_pretrained skip storage allocation #12274

Closed
stas00 opened this issue Jun 20, 2021 · 2 comments
Labels: WIP

Comments

stas00 (Contributor) commented Jun 20, 2021

🚀 Feature request

PyTorch 1.9.0 added torch.nn.utils.skip_init(), which (1) skips the module init and (2) doesn't allocate any memory:
https://pytorch.org/tutorials/prototype/skip_param_init.html

Note: torch.nn.utils.skip_init() itself will only ship in 1.9.1, but the rest of the supporting code is already in 1.9.0 (update: since 1.9.1 isn't planned, read 1.10 wherever 1.9.1 is mentioned).
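For reference, a minimal sketch of the skip_init() call described in the tutorial (assuming a torch version where the API has shipped):

```python
import torch
import torch.nn as nn

# Regular construction: storage is allocated and the init scheme runs.
m_regular = nn.Linear(1024, 1024)

# skip_init(): the module is constructed without running parameter
# initialization, so its (uninitialized) weights can simply be
# overwritten, e.g. by load_state_dict() with pretrained weights.
m_skipped = torch.nn.utils.skip_init(nn.Linear, 1024, 1024)
m_skipped.load_state_dict(m_regular.state_dict())
```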

We already implemented part 1 (skipping the custom init) in #11471.

We could further speed up start-up time and reduce CPU memory usage by not allocating any storage during module init, since load_state_dict will already have allocated the state_dict from the pretrained weights (sub-modules that don't have pretrained weights will still have to go through normal init). See https://pytorch.org/tutorials/prototype/skip_param_init.html#implementation-details
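A hedged sketch of what that could look like, following the pattern in the tutorial's implementation-details section (device="meta" and to_empty() are the PyTorch building blocks; the state dict below is just a stand-in for pretrained weights):

```python
import torch
import torch.nn as nn

# 1. Construct the module on the meta device: no storage is allocated
#    and no init scheme runs.
model = nn.Linear(1024, 1024, device="meta")

# 2. Materialize uninitialized storage on the target device only when
#    we're ready to fill it.
model = model.to_empty(device="cpu")

# 3. Overwrite it with the pretrained weights (stand-in tensors here);
#    sub-modules without pretrained weights would still need a normal
#    reset_parameters()-style init.
pretrained = {"weight": torch.randn(1024, 1024), "bias": torch.zeros(1024)}
model.load_state_dict(pretrained)
```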

Another note: DeepSpeed currently needs the module storage pre-allocated for its zero.Init gather/scatter, but if the initial model's weights aren't allocated, we can probably get rid of zero.Init altogether (#12273).

stas00 added the WIP label Jul 21, 2021
huggingface deleted a comment from github-actions bot Jul 21, 2021
patrickvonplaten mentioned this issue Aug 30, 2021
tanaymeh (Contributor) commented:

If this issue hasn't already been resolved and a fix is relevant, can I have a try at it @stas00?

stas00 (Contributor, Author) commented Oct 18, 2023

Thank you for offering to implement this, @tanaymeh

I think this is no longer relevant: recent PyTorch versions added allocation on the meta device, which accomplishes the same thing and should be used instead, so I'm closing this.
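A minimal sketch of that meta-device path as it looks today, assuming current transformers/accelerate APIs (the checkpoint name is just an example):

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

# Instantiate the architecture with all parameters on the meta device:
# no CPU storage is allocated and no random init runs.
config = AutoConfig.from_pretrained("bert-base-uncased")
with init_empty_weights():
    empty_model = AutoModel.from_config(config)

# Or let from_pretrained() handle it end to end: weights are materialized
# directly from the checkpoint, skipping the throwaway random init this
# issue was about.
model = AutoModel.from_pretrained("bert-base-uncased", low_cpu_mem_usage=True)
```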

stas00 closed this as completed Oct 18, 2023