[performance] module init w/ from_pretrained: skip storage allocation #12274
Labels: WIP
🚀 Feature request
pt-1.9.0 added `torch.nn.utils.skip_init()`, which (1) skips the module init and (2) doesn't allocate any memory: https://pytorch.org/tutorials/prototype/skip_param_init.html
Note: `torch.nn.utils.skip_init()` itself will be in 1.9.1, but the rest of the code should be in 1.9.0 (update: as 1.9.1 isn't planned, probably s/1.9.1/1.10/).

We already implemented part 1 (skipping the custom init) in #11471.
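For context, a minimal sketch of what `skip_init()` provides, based on the tutorial linked above (the `nn.Linear` example is just an illustration, not transformers code):

```python
from torch import nn

# skip_init() constructs the module on the meta device, so the constructor never
# touches real storage, then re-creates empty (uninitialized) parameters on the
# requested device -- the usual reset_parameters() init is skipped entirely.
m = nn.utils.skip_init(nn.Linear, 10, 5)
print(m.weight)  # uninitialized values; only safe to use after loading real weights

# roughly equivalent to:
#   m = nn.Linear(10, 5, device="meta").to_empty(device="cpu")
```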
We could further speed up the start-up time and reduce CPU memory usage by not allocating any storage during module init, since `load_state_dict` will already have allocated the `state_dict` from the pretrained weights (the sub-modules that don't have pre-trained weights will still have to go through normal init). See https://pytorch.org/tutorials/prototype/skip_param_init.html#implementation-details
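A rough sketch of the idea, following the implementation-details section of that tutorial (the tiny `nn.Sequential` model and the stand-in checkpoint are placeholders, not the actual `from_pretrained` code path):

```python
from torch import nn

def build_model():
    # requires modules to accept the `device` factory kwarg (available since pt-1.9)
    return nn.Sequential(
        nn.Linear(8, 8, device="meta"),
        nn.Linear(8, 2, device="meta"),
    )

# 1. Construct on the meta device: no storage is allocated and no init is computed.
model = build_model()
assert model[0].weight.is_meta

# 2. Materialize empty (uninitialized) storage only right before loading weights.
model.to_empty(device="cpu")

# 3. load_state_dict() copies the pretrained weights into that storage; any
#    sub-module not covered by the checkpoint still needs a normal init pass.
checkpoint = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2)).state_dict()  # stand-in
model.load_state_dict(checkpoint)
```

This way the only large CPU allocation before loading is the checkpoint's own `state_dict`.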
Another note: currently deepspeed needs to have the module storage pre-allocated for its `zero.Init` gather/scatter, but if the initial model's weights aren't allocated, then we can probably get rid of `zero.Init` altogether: #12273