[Feature Request]: Latent Perturbation #4164

Open
1 task done
acid1103 opened this issue Nov 2, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@acid1103

acid1103 commented Nov 2, 2022

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do?

A video outlining the requested feature can be found here (timestamp included.)

This is somewhat related to these two issues:

In essence, latent perturbation would provide finer-grained control over image variations than the current variations implementation, and it would allow recursive variations (at the cost of no longer being able to save the parameters in PNG metadata). It works by taking the initial latent given to the scheduler, perturbing it according to some scale factor, and running the perturbed latent through the scheduler. The result is a variation of the image generated from the original latent, and the scale factor determines how much the two images differ.
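The perturbation step described above could look roughly like the sketch below. The issue does not say how the latent should be perturbed, so the variance-preserving blend with fresh Gaussian noise is an assumption, as are the name `perturb_latent` and the use of a flat list of floats in place of a real latent tensor.

```python
import math
import random

def perturb_latent(latent, scale, rng):
    """Blend a latent with fresh Gaussian noise (hypothetical helper).

    scale=0 returns the latent unchanged; scale=1 replaces it entirely.
    The sqrt weighting keeps roughly unit variance, assuming the input
    latent is itself unit-variance Gaussian noise.
    """
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must be in [0, 1]")
    keep = math.sqrt(1.0 - scale ** 2)
    return [keep * z + scale * rng.gauss(0.0, 1.0) for z in latent]

rng = random.Random(0)
# Flat list standing in for a latent tensor
# (4 x 64 x 64 values for a 512x512 Stable Diffusion image).
initial = [rng.gauss(0.0, 1.0) for _ in range(4 * 64 * 64)]
small = perturb_latent(initial, 0.1, rng)  # stays close to the original latent
large = perturb_latent(initial, 0.5, rng)  # moves much further away
```

Each perturbed latent would then be passed to the scheduler in place of the original one. A smaller `scale` keeps the perturbed latent closer to the original.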

Latent perturbation allows for something similar to a binary search through image space. You start with an image that's "good enough," take its initial latent, generate a few perturbed variations at a relatively high scale factor, and run those. If you like one of the results, you take that image's initial latent, perturb it by a smaller scale factor, and generate a new batch of slightly less varied images. You repeat this process until you've found the perfect image.
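The coarse-to-fine search described above can be sketched as a loop. Everything here is illustrative: `refine`, `perturb`, the halving `decay`, and the batch size are assumptions, and `pick_best` stands in for the person choosing their favourite image from each batch.

```python
import math
import random

def perturb(latent, scale, rng):
    # Variance-preserving blend with fresh noise (an assumed perturbation).
    keep = math.sqrt(1.0 - scale ** 2)
    return [keep * z + scale * rng.gauss(0.0, 1.0) for z in latent]

def refine(latent, pick_best, scale=0.5, decay=0.5, rounds=4, batch=4, seed=0):
    """Generate a batch of variations, keep the preferred one, shrink the scale."""
    rng = random.Random(seed)
    for _ in range(rounds):
        # The current latent stays in the batch so a round never has to make things worse.
        candidates = [latent] + [perturb(latent, scale, rng) for _ in range(batch)]
        latent = pick_best(candidates)
        scale *= decay
    return latent

# Toy stand-in for human preference: closeness to a hidden "ideal" latent.
target = [1.0, -2.0, 0.5, 0.0]

def closeness(latent):
    return -sum((a - b) ** 2 for a, b in zip(latent, target))

start = [0.0, 0.0, 0.0, 0.0]
result = refine(start, lambda candidates: max(candidates, key=closeness))
```

In real use, each candidate latent would be run through the scheduler and the choice made by eye. The scoring function exists only so that the loop can run on its own.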

As mentioned, the downside is that embedding these steps in PNG metadata is infeasible. Personally, this is a tradeoff I'm okay with, but I realize it might be a point of contention.

Proposed workflow

  1. Press the "Send to Variations" button under a generated image. (This would take you to img2img -> Variations. Uploading images there wouldn't be possible, because the image's initial latent is required.)
  2. Select your scale factor, sampling steps, sampling method, batch parameters, etc.
  3. Click "Generate"
  4. Select a better variation of the input image
  5. Click "Send to Variations"
  6. Go to step 2

Additional information

Obviously this is a relatively big ask. I've looked through the code, and the way initial latents are currently generated doesn't lend itself well to this, not to mention the work required for the workflow and UI changes. Personally, I would be more than happy with a relatively simple change that would let me achieve this functionality with a script. Regardless, this was one of the most useful ways I used Stable Diffusion before switching to the web UI. It'd be amazing to have both the web UI and latent perturbations.

@R-N
Contributor

R-N commented Nov 3, 2022

That sounds great.

Might this be relevant? #4021

@acid1103
Author

acid1103 commented Nov 4, 2022

@R-N After playing around with that and experimenting with setting the initial latent, it seems like only PLMS and DDIM work with this method. I know very little about the inner workings of schedulers and Stable Diffusion in general, so unfortunately I think this will have to be done by someone else. I doubt I have time to learn enough about these things to make the necessary changes or suggestions.

To answer your specific question, the whole idea of latent perturbation is that, by subtly perturbing the initial latent, you subtly perturb the resulting image. But using the CFG denoiser callback to set the initial latent to a known state still results in completely random final images. Identical initial latents should result in identical output images, but this doesn't happen with any of the methods which call the CFG denoiser callback. So a different approach will probably need to be taken.
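The property stated above (identical initial latents should give identical outputs) can be checked with a small harness like the one below. Both samplers are toy stand-ins rather than webui code, and the sketch does not diagnose why the real samplers behave as reported. The second sampler only shows one way such a check can fail, namely when fresh, unseeded noise is drawn at every step.

```python
import random

def is_deterministic(sampler, latent, runs=3):
    """Return True if repeated runs from the same latent give identical outputs."""
    first = sampler(list(latent))
    return all(sampler(list(latent)) == first for _ in range(runs - 1))

def toy_deterministic_sampler(latent, steps=10):
    # Output depends only on the input latent.
    x = latent
    for _ in range(steps):
        x = [0.9 * v for v in x]
    return x

def toy_stochastic_sampler(latent, steps=10):
    # Draws fresh noise from the global RNG at every step,
    # so fixing the initial latent does not fix the output.
    x = latent
    for _ in range(steps):
        x = [0.9 * v + 0.1 * random.gauss(0.0, 1.0) for v in x]
    return x

latent = [0.3, -1.2, 0.7, 0.05]
deterministic_ok = is_deterministic(toy_deterministic_sampler, latent)
stochastic_ok = is_deterministic(toy_stochastic_sampler, latent)
```

A sampler has to pass this kind of check before latent perturbation can produce controlled variations, since otherwise the effect of a small perturbation is swamped by run-to-run randomness.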

@mezotaken mezotaken added the enhancement New feature or request label Jan 12, 2023