
Multi-Cond Guidance integration #1325

Closed
DenkingOfficial opened this issue Sep 29, 2022 · 6 comments

@DenkingOfficial
Contributor

Is your feature request related to a problem? Please describe.
This feature request is related to the problem of the token limit.

Describe the solution you'd like
Scrolling through Reddit, I found an interesting post about modifying classifier-free guidance. I don't really know whether this can be applied to this fork, but the idea of going beyond the token limit sounds great. I guess it would change the generation process, though, and existing prompts would not produce the same results as before.

Post on Reddit

@Birch-san

It should be possible to port the implementation to any stable-diffusion repository. The complexity mostly depends on whether you're trying to support "batch-of-n-samples" like the original CompVis code. My fork is complicated because it supports batching like that, but it has a few fast paths to make sure the single-sample, single-prompt case remains as fast as usual.

Guidance (I only implemented it for k-diffusion, but it could easily be copy-pasted into the ddpm samplers):
https://github.com/Birch-san/stable-diffusion/blob/3548866e020ef0ddcb6b594984c2eb36d17341bd/scripts/txt2img_fork.py#L65

Producing the embeddings from your text prompts:
https://github.com/Birch-san/stable-diffusion/blob/3548866e020ef0ddcb6b594984c2eb36d17341bd/scripts/txt2img_fork.py#L844
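Roughly, the embedding step just means encoding each subprompt separately and stacking the results. The sketch below is illustrative only, not the code in the linked file; the helper name is made up, and the shapes assume a CompVis-style SD 1.x model with get_learned_conditioning:

import torch

def encode_subprompts(model, subprompts):
    # Illustrative helper: encode each subprompt with the model's CLIP text
    # encoder, then stack them so the denoiser gets one cond row per subprompt.
    conds = [model.get_learned_conditioning([p]) for p in subprompts]  # each [1, 77, 768] on SD 1.x
    return torch.cat(conds, dim=0)                                     # [len(subprompts), 77, 768]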

It wouldn't break existing prompts: if you're guiding on a single condition, the maths reduces to the usual equation.
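To make that "same equation" point concrete, here is a rough sketch of the combination step (illustrative names and shapes, not the code in the linked fork): each condition contributes its delta from the unconditional prediction, scaled by a weight, and with a single condition at weight 1 this collapses to the familiar uncond + cond_scale * (cond - uncond).

import torch

def multi_cond_cfg(uncond_out, cond_outs, weights, cond_scale):
    # uncond_out: [B, C, H, W]; cond_outs: [N, B, C, H, W]; weights: [N]
    deltas = cond_outs - uncond_out.unsqueeze(0)               # each cond's pull away from uncond
    weighted = (weights.view(-1, 1, 1, 1, 1) * deltas).sum(0)  # weighted blend of the deltas
    return uncond_out + cond_scale * weighted                  # N=1, weight=1 -> standard CFG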

@saunderez

I poked around a bit to see where it would slot in, and it seems like the guidance bit needs to go in

repositories/k-diffusion/sample.py

and the embeddings bit needs to go in

repositories/stable-diffusion/scripts/txt2img.py

I was browsing the codebase on my phone, so there was no way I was going to attempt a merge, but it doesn't seem like too big a job.

@feffy380

feffy380 commented Oct 3, 2022

@saunderez Stuff in repositories belongs to external dependencies. The relevant class here would be modules.sd_samplers.CFGDenoiser. The issue is that there are layers on top of that which also need to be adjusted to handle multiple subprompts.
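For orientation, the single-cond path in that kind of denoiser looks roughly like the sketch below (a paraphrase, not the webui's actual CFGDenoiser); the multi-cond change would replace the final line with a weighted sum over several cond outputs:

import torch

class SimpleCFGDenoiser(torch.nn.Module):
    # Paraphrased single-cond CFG denoiser, for orientation only.
    def __init__(self, inner_model):
        super().__init__()
        self.inner_model = inner_model

    def forward(self, x, sigma, uncond, cond, cond_scale):
        # run the model once on a batched [uncond, cond] pair
        x_in = torch.cat([x] * 2)
        sigma_in = torch.cat([sigma] * 2)
        cond_in = torch.cat([uncond, cond])
        uncond_out, cond_out = self.inner_model(x_in, sigma_in, cond=cond_in).chunk(2)
        # standard classifier-free guidance
        return uncond_out + (cond_out - uncond_out) * cond_scale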

@Birch-san

here's a bigger demonstration of what can be done with multi-cond guidance (i.e. animation):
https://twitter.com/Birchlabs/status/1576693919667998722

And yes, the hard part is more the wiring: changing every function along the way to support "many" rather than "one" prompt or condition. The heavy lifting (the tensor gymnastics to be added to the CFG denoiser) can probably just be copy-pasted. How messy this gets depends on whether your repository already has a custom approach to prompting, token weighting, bespoke uncond, or subprompting.
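As an illustration of that wiring, the plumbing change mostly amounts to carrying a list of weighted prompts instead of a single string; the syntax and helper below are hypothetical, not from any of the repositories discussed:

from typing import List, Tuple

def parse_weighted_prompts(text: str) -> List[Tuple[str, float]]:
    # Hypothetical syntax: split 'a cat:0.7 | a dog:0.3' into
    # [('a cat', 0.7), ('a dog', 0.3)]; unweighted chunks default to 1.0.
    parts = []
    for chunk in text.split("|"):
        if ":" in chunk:
            prompt, weight = chunk.rsplit(":", 1)
            try:
                parts.append((prompt.strip(), float(weight)))
                continue
            except ValueError:
                pass  # the colon wasn't a weight; treat the whole chunk as a prompt
        parts.append((chunk.strip(), 1.0))
    return parts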

@feffy380

feffy380 commented Oct 3, 2022

I'm actually in the process of writing a custom morph script using this. Like you said, the actual denoising change is copy-pastable, and since I'm only ever using two prompts at a time, I cheat and inject the target conditioning directly into the denoiser:

# replacement forward function (target_prompt is defined elsewhere in the script)
import torch
from modules import prompt_parser, shared

def forward(self, x, sigma, uncond, cond, cond_scale):
    # encode the morph target once and cache it on the denoiser instance
    if not hasattr(self, 'target_latent'):
        self.target_latent = shared.sd_model.get_learned_conditioning([target_prompt])
    # append the target cond to the current step's scheduled cond
    cond = torch.cat([prompt_parser.reconstruct_cond_batch(cond, self.step), self.target_latent], dim=0)
    # snip
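If anyone wants to try the same trick, one way to install such a replacement (an untested sketch, assuming the modules.sd_samplers.CFGDenoiser class mentioned above and a completed forward function) is to monkey-patch it from a custom script:

from modules import sd_samplers

original_forward = sd_samplers.CFGDenoiser.forward  # keep a handle so it can be restored
sd_samplers.CFGDenoiser.forward = forward           # swap in the replacement above
# ... run generation ...
sd_samplers.CFGDenoiser.forward = original_forward  # restore afterwards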

@DenkingOfficial
Contributor Author

I think this was already solved a long time ago.
