
Implementation of Stable Diffusion with Aesthetic Gradients #2585

Merged
merged 21 commits on Oct 21, 2022

Conversation

MalumaDev
Contributor

@TingTingin

Someone is working on this in #2498; we should probably review it and see what's different.

@ShadowPower

File "D:\stable-diffusion-webui-aesthetic\modules\sd_hijack.py", line 411, in forward
    z = z * (1 - self.aesthetic_weight) + zn * self.aesthetic_weight
RuntimeError: The size of tensor a (154) must match the size of tensor b (77) at non-singleton dimension 1

It seems that the token length is limited by the CLIP model.
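For reference, the failing line blends the prompt conditioning z with the aesthetic conditioning zn. Below is a minimal sketch of one way to make the shapes agree when the prompt spans more than one 77-token CLIP chunk; the helper is illustrative, not the PR's actual fix:

```python
import torch

def blend_aesthetic(z: torch.Tensor, zn: torch.Tensor, aesthetic_weight: float) -> torch.Tensor:
    # z:  [batch, n_chunks * 77, dim]  prompt conditioning (154 = 2 chunks in the error above)
    # zn: [batch, 77, dim]             aesthetic-adjusted conditioning
    if zn.shape[1] != z.shape[1]:
        # repeat the single aesthetic chunk so both tensors agree on dimension 1
        zn = zn.repeat(1, z.shape[1] // zn.shape[1], 1)
    return z * (1 - aesthetic_weight) + zn * aesthetic_weight
```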

@EliEron

EliEron commented Oct 14, 2022

This seems to work well, but the default values are a bit odd.

The repo recommends an aesthetic learning rate of 0.0001, but you default to 0.005, which is 50 times higher. Is there a specific reason for this?

Similarly, for aesthetic steps the repo recommends starting with a relatively small number of steps, but the default in this PR is the highest value the UI allows.

@MalumaDev
Contributor Author

To be quick, I put in "random" default values 😅
I fixed the token-length problem and added the UI for generating the embedding. I need some hours of sleep; I'll commit the code tomorrow.

Contributor

@vicgalle left a comment

Thanks for adapting this, @MalumaDev! Looks good to me.
I only added some suggestions regarding the names of a few parameters and the max value of one.

Review suggestions (all resolved) on: README.md, modules/aesthetic_clip.py, modules/ui.py
MalumaDev and others added 6 commits on October 15, 2022, co-authored by Víctor Gallego <vicgalle@ucm.es>
@bmaltais

bmaltais commented Oct 15, 2022

This feature is actually way more interesting than I thought. It's pretty amazing what variations you can obtain using the image embeddings. I am still trying to figure out how to use all the different sliders and what they do... I really hope this will get merged someday.

I noticed that a newly created image embedding does not automatically get added to the dropdown in txt2img. Just a nitpick.

@bmaltais

bmaltais commented Oct 15, 2022

Quick example for those wondering. I created an image embedding from a bunch of big eyes paintings and tried to apply it to the simple "a beautiful woman" seed 0 prompt. Here are the results:

Original prompt image:
[image]

Applying the image embedding style with aesthetic learning rate 0.001, weight 0.85, and steps 40:
[image]

Increasing the weight to 1 increases how strongly the style is applied, resulting in something closer to the original paintings:
[image]

Bringing it down to 0.5 will obviously reduce the effect:
[image]

And the beauty is that it requires almost no computing time. This is next level stuff... Magic!!!

@bmaltais

bmaltais commented Oct 15, 2022

Another example using the same prompt as above. I created an image embedding from a bunch of images at: https://lexica.art/?q=aadb4a24-2469-47d8-9497-cafc1f513071

After some fine tuning of the weights and learning rate I was able to get:
[image]

And from those https://lexica.art/?q=1f5ef1e0-9f3a-48b8-9062-d9120ba09274 I got:

[image]

And all this with literally no training whatsoever. AMAZING!

@MalumaDev
Contributor Author

> This feature is actually way more interesting than I thought. It's pretty amazing what variations you can obtain using the image embeddings. I am still trying to figure out how to use all the different sliders and what they do... I really hope this will get merged someday.
>
> I noticed that a newly created image embedding does not automatically get added to the dropdown in txt2img. Just a nitpick.

Little bug. I'll fix it.

@bmaltais

bmaltais commented Oct 15, 2022

I even tried feeding it 19 pictures of me in a non-1:1 aspect ratio (512x640) and, gosh darn, it produced passable results!

Sample input image:

[image: a man with a beard and a white shirt is smiling at the camera with a waterfall in the background]

Prompt with no Aesthetic applied:

[image]

Aesthetic applied:

[image]

Not as good as if I trained Dreambooth or TI, but for one minute of fiddling it is amazing. It appears to apply the overall pose of some of the pictures I fed it. I wonder what would happen if I fed the thing 100+ photos of me in varying sizes... It is as if the size and ratio of the images you feed it do not matter.

And what is amazing is that it does all this with a 4KB file!

@MalumaDev MalumaDev changed the title Implementation of Stable Diffusion with Aesthetic Gradients + Batch size and gradient accumulation for training Implementation of Stable Diffusion with Aesthetic Gradients ~~+ Batch size and gradient accumulation for training~~ Oct 15, 2022
@MalumaDev MalumaDev changed the title Implementation of Stable Diffusion with Aesthetic Gradients ~~+ Batch size and gradient accumulation for training~~ Implementation of Stable Diffusion with Aesthetic Gradients Oct 15, 2022
@feffy380

feffy380 commented Oct 15, 2022

I'd suggest hiding the interface behind the Extra checkbox or at least moving it lower. It's quite large and pushes more commonly used options like CFG and Batch size/count off-screen.

@bmaltais

> I'd suggest hiding the interface behind the Extra checkbox or at least moving it lower. It's quite large and pushes more commonly used options like CFG and Batch size/count off-screen.

Indeed. I doubt Automatic will like it where it is now... the best option would be some sort of tabs inside the parameter section: present the current options in a default tab and put the aesthetic options in an aesthetic tab beside it.
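A rough Gradio sketch of that tab idea, just to illustrate the layout; the component names and value ranges here are invented, not taken from the PR:

```python
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tabs():
        with gr.Tab("Parameters"):
            # the commonly used options stay visible by default
            cfg_scale = gr.Slider(1, 30, value=7, label="CFG Scale")
            batch_count = gr.Slider(1, 16, value=1, step=1, label="Batch count")
            batch_size = gr.Slider(1, 8, value=1, step=1, label="Batch size")
        with gr.Tab("Aesthetic"):
            # aesthetic-gradient options live in their own tab beside it
            aesthetic_lr = gr.Textbox(value="0.0001", label="Aesthetic learning rate")
            aesthetic_weight = gr.Slider(0, 1, value=0.9, label="Aesthetic weight")
            aesthetic_steps = gr.Slider(0, 50, value=5, step=1, label="Aesthetic steps")

demo.launch()
```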

@MalumaDev
Contributor Author

> An additional thing I'm going to ask of you is to isolate as much of your code into separate files as possible. The big chunk of code in sd_hijack should be in its own file. All the parameters of aesthetic gradients should be in members of your own class defined in your own file, not in sd_hijack.

WIP!!

@MalumaDev
Contributor Author

MalumaDev commented Oct 16, 2022

> On a separate note... do you think the same thing could be added to img2img to offer better conformity to the original image? I sometimes feel the aesthetic model is difficult to control. At some point it totally changes the original image instead of changing its overall style. If it were possible to control the weight of the aesthetic on top of the resulting prompt image without losing the whole look, it would be even better.

Added

@bmaltais

bmaltais commented Oct 16, 2022

I like the now-expandable section for the aesthetic settings. This is a step in the right direction and I hope Automatic will approve of it.

I tested the img2img implementation and it works very well. I was able to keep the general composition of the original and transform it toward the aesthetic without losing too much... NICE. Here is an example of applying the Big Eyes style to a photo of a man:

Original:

[image]

Styled with big eyes:

[image]

and the overall config:

[image]

Trying to apply the same aesthetic on the source txt2img with the same seed would result in this... which is not what I want:

[image]

I think the better workflow is:

  • Use txt2img to get a good starting image (or just use an external image as a source)
  • Send it to img2img
  • Apply the aesthetic changes there and tweak to taste

@bmaltais

Something else I noticed: is there a reason the aesthetic optimization is always computed? If none of its parameters have changed from generation to generation, could it not just be reused from a memory cache instead of being recomputed every time?

@MalumaDev
Contributor Author

> Something else I noticed: is there a reason the aesthetic optimization is always computed? If none of its parameters have changed from generation to generation, could it not just be reused from a memory cache instead of being recomputed every time?

When the seed changes, so does the training result!!!

@feffy380

feffy380 commented Oct 17, 2022

@bmaltais Looking at the original aesthetic gradients repo, the personalization step involves performing gradient descent to make the prompt embedding more similar to the aesthetic embedding. In other words, it has to be recomputed for each prompt. But it shouldn't be affected by the seed as far as I can tell. Actually, isn't the process nondeterministic regardless of seed unless you enable determinism in pytorch itself? Can someone test if running the same settings twice produces the same image?
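For anyone curious, here is a rough sketch of what that personalization step might look like, assuming a transformers-style CLIP model with a get_text_features method; the function name and hyperparameters are illustrative, not the extension's exact code:

```python
import copy

import torch
import torch.nn.functional as F

def personalize_text_encoder(clip_model, token_ids, aesthetic_img_embs, lr=1e-4, steps=5):
    """A few gradient steps pulling the prompt's text features toward the aesthetic embedding."""
    target = F.normalize(aesthetic_img_embs.mean(dim=0, keepdim=True), dim=-1)
    model = copy.deepcopy(clip_model)  # fine-tune a throwaway copy, keep the original intact
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        text_embs = F.normalize(model.get_text_features(token_ids), dim=-1)
        loss = -(text_embs @ target.T).mean()  # maximize cosine similarity to the aesthetic target
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # because token_ids come from the current prompt, the result is prompt-specific,
    # so any cache would have to be keyed on the (prompt, settings) pair rather than be global
    return model
```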

@miaw24

miaw24 commented Oct 20, 2022

I think there should be an option to do the aesthetic optimization on the CPU before sending the result back to the GPU for the image generation process. This would be useful for people with limited VRAM, so that they won't run out of memory while computing the aesthetic optimization.

@AUTOMATIC1111 AUTOMATIC1111 merged commit 7d6b388 into AUTOMATIC1111:master Oct 21, 2022
@bbecausereasonss

Is there a tutorial on how to set this up/train it?

@bmaltais

bmaltais commented Oct 21, 2022 via email

@rabidcopy

So is there any hope of doing this on 4GB of VRAM? My poor card has been able to handle everything (besides training) up to 576x576 so far with --medvram, VAEs, hypernetworks, upscalers, etc., but this puts me OOM after the first pass. 😅

@TinyBeeman
Contributor

TinyBeeman commented Oct 22, 2022

It seems like "Aesthetic text for imgs" and slerp angle are somehow off... Values between 0.001 and 0.02 seem to cause the aesthetic text to influence the embedding in a meaningful way. But 0.2 to 1.0 seem random and don't have much effect relative to each other. If I use "colorful painting", for instance: 0.0 = ignore text, 0.001 = it adds color and flowers, 0.2 to 1.0 = the image seems to lose style altogether, and is neither colorful nor painterly.

@MalumaDev
Contributor Author

MalumaDev commented Oct 22, 2022

The DALL·E 2 paper specifies that the max angle to use is in the range [0.25, 0.5]. (TextDiff)
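For context, slerp here is just spherical interpolation between the prompt conditioning and the aesthetic text embedding, with the slerp angle acting as the interpolation factor t. A generic formulation, not necessarily the exact code in this PR:

```python
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
    """Spherical interpolation from a to b; t=0 returns a, t=1 returns b."""
    a_n = a / a.norm(dim=-1, keepdim=True)
    b_n = b / b.norm(dim=-1, keepdim=True)
    # angle between the two (normalized) embeddings; assumes they are not (anti)parallel
    omega = torch.acos((a_n * b_n).sum(dim=-1, keepdim=True).clamp(-1.0, 1.0))
    so = torch.sin(omega)
    return (torch.sin((1.0 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
```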

@TinyBeeman
Contributor

TinyBeeman commented Oct 22, 2022

@MalumaDev that makes sense; maybe we should adjust the slider range to be more helpful. That said, as currently implemented, values as low as 0.001 give interesting variations, and values above 0.25 seem to be… uninteresting. At least in my test cases.

@miaw24

miaw24 commented Oct 22, 2022

@rabidcopy I am able to use it on 4GB of VRAM by editing aesthetic_clip.py: I changed every device reference to 'cpu' (except in the import section, of course), and then, to prevent it from complaining that the tensors are on two different devices, I edited the __call__ function of the AestheticCLIP class, adding z = z.to('cpu') before the if self.slerp: part and z = z.to(device) before the return z part. So far this works (or at least it works for me), but I don't know whether computing the aesthetic gradient on the CPU changes the result compared to computing it with CUDA.
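A minimal, self-contained sketch of the same idea, assuming the blend happens in a helper like the one below (names are illustrative, not the extension's actual code): keep the aesthetic math on the CPU, then move the result back to the GPU for sampling.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def blend_on_cpu(z: torch.Tensor, zn: torch.Tensor, weight: float) -> torch.Tensor:
    # offload both conditionings to system RAM so the blend never touches VRAM
    z_cpu, zn_cpu = z.to("cpu"), zn.to("cpu")
    out = z_cpu * (1 - weight) + zn_cpu * weight
    # move the result back to the GPU for the actual image generation
    return out.to(device)
```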

@cornpo

cornpo commented Oct 23, 2022

I couldn't run the laion_7plus or sac_8plus embeddings since the original PR. Now, tonight, I can.

Gloom, Watercolor, et al. work fine. But with laion_7plus or sac_8plus I get IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1), and only with text in the aesthetic embeddings text box.

@hulululu7654321

File "/home/hulululu/desktop/stable-diffusion-webui-master/extensions/aesthetic-gradients/aesthetic_clip.py", line 233, in call
sim = text_embs @ img_embs.T
RuntimeError: expected scalar type Float but found Half

how can i deal with this problem?
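One common workaround for this kind of dtype mismatch (not necessarily the extension's official fix) is to cast both operands to the same dtype before the matmul, for example:

```python
import torch

# stand-ins for the real tensors: one in half precision, one in full precision
text_embs = torch.randn(4, 768, dtype=torch.float16)
img_embs = torch.randn(10, 768, dtype=torch.float32)

sim = text_embs.float() @ img_embs.float().T  # promote both to float32 before the matmul
```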

@baphilia

baphilia commented Oct 28, 2022

> The DALL·E 2 paper specifies that the max angle to use is in the range [0.25, 0.5]. (TextDiff)

> It seems like "Aesthetic text for imgs" and slerp angle are somehow off... Values between 0.001 and 0.02 seem to cause the aesthetic text to influence the embedding in a meaningful way. But 0.2 to 1.0 seem random and don't have much effect relative to each other. If I use "colorful painting", for instance: 0.0 = ignore text, 0.001 = it adds color and flowers, 0.2 to 1.0 = the image seems to lose style altogether, and is neither colorful nor painterly.

Does anyone have a link or a quick explanation of what 'Aesthetic text for imgs', 'slerp angle', and 'slerp interpolation' are supposed to do? What should I be typing there? What is the desired effect? (I tried searching the paper and a few articles and READMEs for the relevant terms, but I failed to find anything.)

On low angle settings it seems super random, just changing the entire subject of the image to something that has nothing to do with either the regular prompt or the aesthetic text, and at high settings it just seems to use the aesthetic text as a new prompt (without incorporating the styling of the embedding at all).

@jonwong666

Aesthetic works best with txt2img; it's not for img2img.

I'm getting good results with these settings:

[image]

The trick is to not use too many styles or conflicting artists in the main prompt and to let the aesthetic do the work with a high learning rate.

@gsgoldma

gsgoldma commented Nov 8, 2022

> @rabidcopy I am able to use it on 4GB of VRAM by editing aesthetic_clip.py: I changed every device reference to 'cpu' (except in the import section, of course), and then, to prevent it from complaining that the tensors are on two different devices, I edited the __call__ function of the AestheticCLIP class, adding z = z.to('cpu') before the if self.slerp: part and z = z.to(device) before the return z part. So far this works (or at least it works for me), but I don't know whether computing the aesthetic gradient on the CPU changes the result compared to computing it with CUDA.

I might be a fool, but which indentations did you use?

@shamblessed left a comment

Hydd

DrakeRichards pushed a commit to DrakeRichards/stable-diffusion-webui that referenced this pull request Dec 20, 2023