-
We have a WIP here: #9453
-
I am trying to speed up inference on an image-generation pipeline that swaps in many different LoRAs.
Is it possible to compile the base Flux model once, then load a LoRA into it, generate an image, and unload the LoRA, without recompiling the model every time?
In this example, it seems to recompile every time I add a LoRA. Each LoRA will be used exactly once, so I'd like to keep the speed improvements to the base model.
Each LoRA is discarded afterwards, and there's no guarantee that the LoRAs will all be the same size.
Does fusing help here? Or is there a way to tell PyTorch to reuse the compiled artifacts? I didn't think a LoRA would change what compilation had learned.
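For context, the usual trick behind LoRA "hot-swapping" is to keep the LoRA weights in fixed-shape tensors and overwrite them in place with `copy_`, so the compiled graph's guards (which key on tensor shapes and dtypes, not values) stay valid and no recompilation is triggered. Adapters of different ranks are zero-padded up to a shared maximum rank. The sketch below is a minimal, self-contained illustration of that idea in plain PyTorch, not the diffusers API; the names `LoraLinear` and `swap_lora` are made up for this example.

```python
import torch
import torch.nn as nn

class LoraLinear(nn.Module):
    """Linear layer with a LoRA adapter held in fixed-shape tensors,
    so swapping adapters is a pure in-place copy (no recompilation)."""
    def __init__(self, in_f, out_f, rank=4):
        super().__init__()
        self.base = nn.Linear(in_f, out_f)
        # Preallocate LoRA matrices at a fixed (maximum) rank.
        # Zeros mean "no adapter loaded": the LoRA term contributes nothing.
        self.lora_a = nn.Parameter(torch.zeros(rank, in_f), requires_grad=False)
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank), requires_grad=False)

    def forward(self, x):
        # base output plus low-rank update: x @ A^T @ B^T
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T

    @torch.no_grad()
    def swap_lora(self, a, b):
        # copy_ keeps tensor identity, shape, and dtype unchanged,
        # so torch.compile's guards remain satisfied.
        self.lora_a.copy_(a)
        self.lora_b.copy_(b)

torch.manual_seed(0)
layer = LoraLinear(8, 8, rank=4)
# backend="eager" runs TorchDynamo tracing without codegen; in a real
# pipeline you would use the default (inductor) backend on GPU.
compiled = torch.compile(layer, backend="eager")

x = torch.randn(2, 8)
out_base = compiled(x)  # zero adapter: identical to the base layer

# "Load" a new LoRA by overwriting the preallocated weights in place.
layer.swap_lora(torch.randn(4, 8), torch.randn(8, 4))
out_lora = compiled(x)  # same compiled graph, new adapter values
```

A lower-rank adapter would be zero-padded to rank 4 before `swap_lora`, which is why there is no need for all LoRAs to be the same size as long as none exceeds the preallocated rank. This is the approach the linked WIP pursues; by contrast, loading each LoRA through fresh module surgery changes the graph and forces a recompile.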