
Integrate fp16/bf16 support to sdxl model loading #791

Merged
2 commits merged on Sep 3, 2023

Conversation

@Isotr0py (Contributor) commented Aug 27, 2023

Possibly related issue: #788

  • Specify the fp16/bf16 dtype when loading the model state_dict.
  • Directly load the unet and vae in fp16/bf16 dtype when full fp16/bf16 is used.
    - text_encoders remain in fp32 for caching outputs.

This should reduce the peak RAM/VRAM usage when --full_fp16/--full_bf16 is enabled during model loading on CPU/GPU.

It also reduces RAM usage when loading a checkpoint from the safetensors format.
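
As a rough sketch of the loading pattern this PR describes (the function and argument names below are illustrative, not the actual sd-scripts implementation): tensors are cast to the target dtype as they are read from the safetensors file, so a full fp32 copy of the UNet/VAE weights never has to sit in memory alongside the casted one.

```python
import torch
from safetensors import safe_open

def load_state_dict_with_dtype(ckpt_path, dtype=None, device="cpu"):
    """Load a safetensors checkpoint, casting each tensor to `dtype` as it is read.

    Casting tensor-by-tensor keeps at most one fp32 tensor alive at a time,
    which is what lowers the peak RAM/VRAM compared to loading the whole
    state_dict in fp32 and converting the model afterwards.
    """
    state_dict = {}
    with safe_open(ckpt_path, framework="pt", device=device) as f:
        for key in f.keys():
            tensor = f.get_tensor(key)
            if dtype is not None:
                tensor = tensor.to(dtype)
            state_dict[key] = tensor
    return state_dict

# Hypothetical usage mirroring the PR's behaviour when --full_fp16 is set:
# unet_sd = load_state_dict_with_dtype("sdxl.safetensors", dtype=torch.float16)
# te_sd   = load_state_dict_with_dtype("sdxl.safetensors", dtype=None)  # text encoders stay fp32
```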

@kohya-ss (Owner) commented Sep 2, 2023

Thank you for this! This is very useful. I wonder whether it is possible to load the Text Encoders as fp32, even with the full_fp16/bf16 option. If it is not possible, this feature could be enabled only when the lowram/lowvram option is specified.

@Isotr0py (Contributor, Author) commented Sep 2, 2023

Yes, the dtype is not passed to _load_state_dict_on_device for the text_encoders, so the text_encoders will always be loaded as fp32, regardless of whether full_fp16/bf16 is enabled.
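
For illustration, a minimal sketch of how the dtype could be threaded through such a loader (only _load_state_dict_on_device is named in this thread; its real signature in sd-scripts may differ, and the call sites here are hypothetical):

```python
import torch

def _load_state_dict_on_device(model, state_dict, device, dtype=None):
    """Move (and optionally cast) checkpoint tensors, then load them into the model.

    When `dtype` is None the tensors keep their original precision, which is
    how the text encoders stay in fp32 even with --full_fp16/--full_bf16.
    """
    for name, tensor in state_dict.items():
        if dtype is not None:
            tensor = tensor.to(dtype)
        state_dict[name] = tensor.to(device)
    model.load_state_dict(state_dict, strict=True)

# Hypothetical call sites matching the behaviour described above:
# _load_state_dict_on_device(unet, unet_sd, device, dtype=weight_dtype)  # fp16/bf16
# _load_state_dict_on_device(text_encoder, te_sd, device)                # dtype omitted -> fp32
```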

@kohya-ss (Owner) commented Sep 2, 2023

Thank you for the clarification, and sorry for the misunderstanding. I understand now that the text encoders are loaded in fp16/bf16 only if the Diffusers format is used.

kohya-ss merged commit f6d417e into kohya-ss:dev on Sep 3, 2023 (1 check passed)
@kohya-ss (Owner) commented Sep 3, 2023

I've merged. This significantly reduces the peak memory usage. Thanks again!

Isotr0py deleted the dev branch on September 5, 2023
@FurkanGozukara commented
Could this have introduced a major bug?

My SDXL LoRAs come out over-trained now with the same settings compared to before this change.

I am actually testing the effect of this option right now.
