[feat] Reduce peak VRAM memory usage of IP adapter #6453
Conversation
Careful, I'm going to start leaving more TODO comments if they magically get addressed! 😉 Thanks for looking into this.
This wasn't quite what I had in mind, but I'm glad that it enables you to run more workflows. As a future improvement, it would be nice to reduce the coupling between the core IP-Adapter model, the IP-Adapter image projection model, and the CLIP Vision model. Then we could run the CLIP Vision model in its own node, and wouldn't have to lock->unlock->relock the IP-Adapter model like we are doing now.
I left a few minor comments. Once those are addressed, this looks good to me.
I ran a quick smoke test - no smoke.
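A rough sketch of the decoupling suggested above: if the CLIP Vision encoder ran as its own node, its output embedding could be passed to the IP-Adapter node as plain data, so the lock->unlock->relock dance would not be needed. All class and function names here are illustrative placeholders, not InvokeAI's actual API.

```python
from dataclasses import dataclass


@dataclass
class ImageEmbedding:
    """Placeholder for the tensor a CLIP Vision encoder would produce."""
    data: list


def clip_vision_node(image: str) -> ImageEmbedding:
    # Runs as an independent node; the CLIP Vision model could be
    # unloaded from VRAM as soon as this returns.
    return ImageEmbedding(data=[len(image)])


def ip_adapter_node(embedding: ImageEmbedding) -> str:
    # Consumes the precomputed embedding as plain data; never needs
    # the CLIP Vision model resident in VRAM.
    return f"conditioning({embedding.data})"
```

The key design point is that the edge between the two nodes carries an embedding, not a live model handle, so each model's VRAM lifetime is confined to its own node.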
This is one place where
Just one minor comment on the latest changes.
Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>
@RyanJDick All comments are addressed. You want to give this a quick once-over?
Summary
On my 12 GB GPU I was unable to simultaneously apply both an IP Adapter and an OpenPose ControlNet module to an SDXL model without running out of VRAM. Digging into it a bit, I found this remark in
latent.py
As suggested by @RyanJDick, I put some effort into this and moved the encoding step to occur outside the main model execution context, thereby reducing peak VRAM requirements. This solved the out-of-memory issue!
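The change described above can be sketched as follows, using a toy context manager in place of InvokeAI's model manager (the names `load_model`, `run_before`, and `run_after` are illustrative, not the real API): the image prompt encoding moves out of the main model's execution context, so the CLIP Vision model and the UNet are never resident in VRAM at the same time.

```python
from contextlib import contextmanager

LOG = []  # records model load/unload order for illustration


@contextmanager
def load_model(name):
    # Stand-in for a model-manager context: "loads" the model onto the
    # GPU on entry and releases it on exit.
    LOG.append(f"+{name}")
    try:
        yield name
    finally:
        LOG.append(f"-{name}")


def run_before(image):
    # Before this PR: encoding ran inside the main model context, so
    # the SDXL UNet and CLIP Vision were resident simultaneously.
    with load_model("main_unet"):
        with load_model("clip_vision"):
            embeds = f"embeds({image})"
        return f"denoise({embeds})"


def run_after(image):
    # After this PR: the image prompt embeds are computed first, so
    # only one model needs to be in VRAM at any moment.
    with load_model("clip_vision"):
        embeds = f"embeds({image})"
    with load_model("main_unet"):
        return f"denoise({embeds})"
```

Both versions produce the same conditioning; only the peak residency differs, which is why this shows up as a reduction in peak VRAM rather than a behavioral change.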
Related Issues / Discussions
There are a number of mypy-detected typecheck errors in
latents.py
that precede this PR. I have not tracked these down. I'd like to move the whole
prep_ip_adapter_data()
call outside the model loader context (rather than just the code that generates the image prompt embeds), but this will take some more effort.
QA Instructions
Run with various combinations of IP Adapters and controlnets, and compare peak VRAM usage before and after applying this PR. Check for stability.
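A minimal harness for the comparison above. On a real GPU you would bracket each run with `torch.cuda.reset_peak_memory_stats()` and read `torch.cuda.max_memory_allocated()`; here a simulated tracker stands in so the pattern (and the expected before/after difference) is clear. The 8 GB / 2 GB figures are illustrative placeholders, not measured sizes.

```python
class PeakTracker:
    """Toy allocator that records peak simultaneous memory use (in MB)."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    def alloc(self, mb):
        self.current += mb
        self.peak = max(self.peak, self.current)

    def free(self, mb):
        self.current -= mb


def measure(workflow):
    # With PyTorch this would be reset_peak_memory_stats() +
    # max_memory_allocated() around the actual generation.
    tracker = PeakTracker()
    workflow(tracker)
    return tracker.peak


def overlapping_workflow(t):
    # Before the PR: CLIP Vision encodes while the UNet is resident.
    t.alloc(8000)  # SDXL UNet
    t.alloc(2000)  # CLIP Vision
    t.free(2000)
    t.free(8000)


def sequential_workflow(t):
    # After the PR: CLIP Vision loads, encodes, and unloads first.
    t.alloc(2000)  # CLIP Vision
    t.free(2000)
    t.alloc(8000)  # SDXL UNet
    t.free(8000)
```

Running the same prompt on both branches and comparing the two peak readings is the quickest way to confirm the PR's effect.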
Merge Plan
Merge when approved.
Checklist