Multi-Cond Guidance integration #1325
Should be possible to port the implementation to any stable-diffusion repository. Complexity mostly depends on whether you're trying to support "batch-of-n-samples" like the original CompVis code does. My fork is complex, but it supports cases like that, with a few fast paths to make sure single-sample and single-prompt generation stay as fast as usual. The guidance I only implemented for k-diffusion, but it could easily be copy-pasted into the DDPM samplers, along with the part that produces the embeddings from your text prompts. It wouldn't break existing prompts: if you're guiding on a single condition, the maths becomes the same equation as usual.
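A minimal sketch of what that combination could look like inside a k-diffusion-style CFG denoiser (not the fork's actual code: the name multi_cond_cfg, the conds/weights arguments, and the batch-of-one latent are assumptions, and inner_model is assumed to be a k-diffusion-wrapped denoiser taking (x, sigma, cond=...)):

```python
import torch

def multi_cond_cfg(inner_model, x, sigma, uncond, conds, weights, cond_scale):
    """Classifier-free guidance over several prompt conditionings at once.

    `conds` is a list of conditioning tensors (one per prompt) and `weights`
    holds one float per prompt. With a single prompt and weight 1.0 this
    reduces to the usual CFG equation.
    """
    n = len(conds)
    # Batch [uncond, cond_0, ..., cond_{n-1}] so the model runs once.
    cond_in = torch.cat([uncond] + list(conds), dim=0)
    x_in = x.repeat(n + 1, 1, 1, 1)
    sigma_in = sigma.repeat(n + 1)
    out = inner_model(x_in, sigma_in, cond=cond_in)

    e_uncond, e_conds = out[:1], out[1:]
    # Weighted sum of the per-prompt guidance directions, then the usual CFG scale.
    w = torch.tensor(weights, device=x.device, dtype=x.dtype).view(n, 1, 1, 1)
    delta = ((e_conds - e_uncond) * w).sum(dim=0, keepdim=True)
    return e_uncond + cond_scale * delta
```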
I poked around a bit to see if I could find where it would slot in, and it seems like the guidance bit needs to go in repositories/k-diffusion/sample.py, and the embeddings bit needs to go in… I was browsing the codebase on my phone, so there was no way I was going to attempt a merge, but it doesn't seem like too big a job.
@saunderez Stuff in …
Here's a bigger demonstration of what can be done with multi-cond guidance (i.e. animation). And yes, the hard part is more the wiring: changing every function along the way to support "many" rather than "one" prompt or condition. The heavy lifting (the tensor gymnastics to be added to the CFG denoiser) can probably just be copy-pasted. How messy this gets depends on whether your repository already has a custom approach to prompting, token weighting, bespoke uncond, or subprompting.
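Purely as an illustration of the animation angle (not code from the demo), one way to cross-fade between two prompts is to sweep the per-prompt weights across frames; sample, cond_a, cond_b, and n_frames below are hypothetical placeholders:

```python
# Hypothetical frame loop: fade guidance from prompt A to prompt B.
frames = []
for i in range(n_frames):
    t = i / (n_frames - 1)
    weights = [1.0 - t, t]  # still sums to 1, so overall guidance strength is unchanged
    frames.append(sample(conds=[cond_a, cond_b], weights=weights))
```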
I'm actually in the process of writing a custom morph script using this. Like you said, the actual denoising change is copy-pastable, and since I'm only ever using two prompts at a time, I cheat and inject the target conditioning directly into the denoiser:

```python
# replacement forward function
def forward(self, x, sigma, uncond, cond, cond_scale):
    # encode the morph target once and cache it on the denoiser
    if not hasattr(self, 'target_latent'):
        self.target_latent = shared.sd_model.get_learned_conditioning([target_prompt])
    # append the target conditioning to the current prompt's conditioning
    cond = torch.cat([prompt_parser.reconstruct_cond_batch(cond, self.step), self.target_latent], dim=0)
    # snip
```
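The snippet cuts off at `# snip`. Purely as a hedged guess (not the author's script), the remainder of such a morph denoiser might run the model on the source and target conditionings plus uncond and blend the two guided outputs; `morph_t` here is a made-up 0..1 blend factor:

```python
# Illustrative continuation only; assumes a batch-of-one latent and that `cond`
# now holds [source_cond, target_cond] stacked along dim 0.
x_in = torch.cat([x] * 3)
sigma_in = torch.cat([sigma] * 3)
cond_in = torch.cat([cond, uncond])
out_src, out_tgt, out_uncond = self.inner_model(x_in, sigma_in, cond=cond_in).chunk(3)
blended = torch.lerp(out_src, out_tgt, morph_t)  # morph_t: hypothetical blend factor
return out_uncond + (blended - out_uncond) * cond_scale
```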
I think this was already solved a long time ago.
Is your feature request related to a problem? Please describe.
This feature request is related to the problem of the token limit.
Describe the solution you'd like
Scrolling through Reddit, I found an interesting post about modding classifier-free guidance. I don't really know whether this can be applied to this fork, but the idea of going beyond the token limit sounds great. I guess it will change the generation process, though, and prompts will not produce the same results as before.
Post on Reddit
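As an illustrative sketch only (not how this repo handles long prompts), multi-cond guidance could sidestep a token limit by splitting a long prompt into chunks, encoding each chunk separately, and guiding on all chunks with equal weight; `encode` below stands in for whatever text encoder the repo exposes (e.g. get_learned_conditioning) and is an assumption, as is the crude word-count proxy for tokens:

```python
# Split a long prompt into chunks and return one conditioning (plus weight) per chunk.
def chunked_conditionings(prompt, encode, max_words=75):
    words = prompt.split()
    chunks = [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    conds = [encode([c]) for c in chunks]
    weights = [1.0 / len(conds)] * len(conds)  # equal weight per chunk
    return conds, weights
```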