
CrossAttentionControl, Dreamfields-3D and Dreambooth implementation #1280

Open
Antoinevdlb opened this issue Sep 29, 2022 · 12 comments
Labels: enhancement (New feature or request)

@Antoinevdlb

Hey! I found these different Stable Diffusion libraries that I would love to see integrated into this repo.

Curious if anyone else would also enjoy using them.

Cross Attention Control by bloc97

Cross Attention Control allows much finer control of the prompt by modifying the internal attention maps of the diffusion model during inference, without requiring the user to input a mask, and it does so with minimal performance penalties (compared to CLIP guidance) and no additional training or fine-tuning of the diffusion model.
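The core trick, as I read bloc97's description, is to reuse the attention maps computed for the original prompt while swapping in the edited prompt's token embeddings, so the spatial layout stays put while the content changes. A toy PyTorch sketch of that idea (the shapes and the `fixed_probs` hook are my own illustration, not bloc97's actual API):

```python
import torch
import torch.nn.functional as F

def cross_attention(q, k, v, fixed_probs=None):
    """Scaled dot-product cross-attention.

    If fixed_probs is given, the freshly computed attention map is
    replaced by it - the essence of cross-attention control: the
    *where* comes from the original prompt, the *what* from the
    edited prompt's values.
    """
    d = q.shape[-1]
    probs = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    if fixed_probs is not None:
        probs = fixed_probs
    return probs @ v, probs

# Toy shapes: 16 image tokens attending over 8 text tokens, dim 64.
q = torch.randn(16, 64)
k_orig, v_orig = torch.randn(8, 64), torch.randn(8, 64)
k_edit, v_edit = torch.randn(8, 64), torch.randn(8, 64)

# Pass 1: original prompt - keep its attention map.
_, saved_probs = cross_attention(q, k_orig, v_orig)

# Pass 2: edited prompt, spatial layout pinned to the saved map.
out_edit, _ = cross_attention(q, k_edit, v_edit, fixed_probs=saved_probs)
```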

DreamFields 3D by shengyu-meng

A toolkit to generate 3D mesh models, videos, NeRF instances, or multiview images of colourful 3D objects from text and image prompts.

DreamBooth

DreamBooth is a method to personalize text2image models like Stable Diffusion given just a few (3-5) images of a subject. The train_dreambooth.py script shows how to implement the training procedure and adapt it for Stable Diffusion.
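The method's core objective is easy to state: a reconstruction loss on the few subject images plus a weighted "prior preservation" loss on generated class images, which keeps the model from forgetting what a generic instance of the class looks like. A schematic in PyTorch with random stand-in tensors (the real train_dreambooth.py computes these MSEs on the UNet's noise predictions over latents):

```python
import torch
import torch.nn.functional as F

# Random stand-ins for the UNet's noise predictions and the true noise.
noise_pred_instance = torch.randn(4, 64)  # on the subject's images
noise_instance      = torch.randn(4, 64)
noise_pred_prior    = torch.randn(4, 64)  # on generic class images
noise_prior         = torch.randn(4, 64)

prior_loss_weight = 1.0  # diffusers exposes this as --prior_loss_weight

instance_loss = F.mse_loss(noise_pred_instance, noise_instance)
prior_loss = F.mse_loss(noise_pred_prior, noise_prior)

# Learn the subject while preserving the class prior.
loss = instance_loss + prior_loss_weight * prior_loss
```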

All of these seem extremely useful, especially Dreambooth training.

Thanks for considering these!

@Goldenkoron

I also very much want DreamBooth to become an integrated feature in the UI. Hopefully it can be made to work. I have an RTX 3090 so it should be possible to run.

@LLjo

LLjo commented Sep 29, 2022

This would be awesome

@xbox002000

super cool

@OWKenobi
Contributor

+1 for DreamBooth!

@StrangeCalibur

echo the +1 for dreambooth!

@AmericanPresidentJimmyCarter

I was able to get it working with about 13 GB of RAM in half precision, but no matter what I do, my checkpoints appear to be corrupt. :( Not sure why torch.save is producing things I cannot deserialize.
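If anyone hits the same thing, one way to narrow it down is a save/load round trip on the state dict right after saving; nothing repo-specific here, just plain torch (the Linear module is a stand-in for the fine-tuned model):

```python
import torch

model = torch.nn.Linear(4, 4)  # stand-in for the fine-tuned model

state = model.state_dict()
torch.save(state, "checkpoint.pt")

# Reload immediately and compare one tensor: if this raises or mismatches,
# the file was written incompletely or something in it wasn't picklable.
reloaded = torch.load("checkpoint.pt", map_location="cpu")
key = next(iter(state))
assert torch.equal(state[key], reloaded[key]), "round trip failed"
```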

@OWKenobi
Contributor

OWKenobi commented Oct 4, 2022

Awesome! I have a graphics card that is capable of testing this, so if there is some kind of review process I could help with, please let me know.

I think it should be a separate TAB called Dreambooth, where you can select pictures from the hard drive, select a destination for the final model, and then have a button to process the images. Maybe a text field explaining the whole process, telling you the minimum requirements and the ETA. I think the UI would be fairly simple like that.

Also, we need an explanation of what the placeholder will be called. As far as I have read, prompts look like "a photo of a red [X]".
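A minimal Gradio sketch of such a tab, just to make the idea concrete (the `train_dreambooth` callback and all the component wiring are placeholders, not the webui's actual code):

```python
import gradio as gr

def train_dreambooth(images, output_path):
    # Placeholder: wire this to the actual DreamBooth training entry point.
    return f"Would train on {len(images or [])} images, saving to {output_path}"

with gr.Blocks() as demo:
    with gr.Tab("Dreambooth"):
        gr.Markdown(
            "Fine-tune the model on your own subject. "
            "Minimum requirements and an ETA estimate would go here."
        )
        images = gr.Files(label="Training images", file_types=["image"])
        output_path = gr.Textbox(label="Destination for the final model")
        status = gr.Textbox(label="Status", interactive=False)
        gr.Button("Process images").click(
            train_dreambooth, inputs=[images, output_path], outputs=status
        )

demo.launch()
```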

@d8ahazard
Collaborator

> Awesome! I have a graphics card that is capable of testing this, so if there is some kind of review process I could help with, please let me know.
>
> I think it should be a separate TAB called Dreambooth, where you can select pictures from the hard drive, select a destination for the final model, and then have a button to process the images. Maybe a text field explaining the whole process, telling you the minimum requirements and the ETA. I think the UI would be fairly simple like that.
>
> Also, we need an explanation of what the placeholder will be called. As far as I have read, prompts look like "a photo of a red [X]".

So, a few things I've learned thus far, although my results are still TBD.

One: You can totally run this on a GPU with only 8 GB of VRAM. At least, I can run it using the "optimized" repo and then using the "accelerate config" option to tell it to run only on the CPU. Is it slow as hell? Yes. Can I build a DreamBooth model on my rig? Yes. :P

So, definitely need to figure out if we can somehow leverage "accelerate" for poor schlubs like me who haven't bought a new GPU in a few years.

For the placeholder/prompt, it's slightly different from Textual Inversion. With TI, we just use a prompt: "MrFluffy" for a dog or something. But with DreamBooth, it appears you also need a "class" parameter, so for MrFluffy it would be "MrFluffy dog", and then the class prompt would be "dog". At least, I think that's the idea. Then to use it, you could do "a photo of MrFluffy dog in the mountains", or "a painting of MrFluffy dog", etc.
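For the usage side, once a model is trained, prompting with the identifier-plus-class phrase is ordinary diffusers inference; the model path here is a made-up example:

```python
import torch
from diffusers import StableDiffusionPipeline

# Made-up path: wherever the DreamBooth-trained weights were saved.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-output", torch_dtype=torch.float16
).to("cuda")

# Identifier + class: "MrFluffy" names the subject, "dog" anchors it
# to the class the model already knows.
image = pipe("a photo of MrFluffy dog in the mountains").images[0]
image.save("mrfluffy.png")
```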

There's also a few unknowns I'm trying to experiment with...

How many pictures is ideal? I read somewhere today that a suggestion was 5 close-ups, 5 "half item" shots, and then 5-10 "full item" shots. Maybe more? Maybe less? The same goes for TI, actually.

There's also a "class image" parameter (or something like that); conflicting sources say you can fill it either via text2image generation or with a bunch of existing images of your class item, like dogs. Somewhere said ~100 or more is good for this... but IDK.

Another unknown is how creating the "new" checkpoint works with the script to convert from DreamBooth to SD. It relies on the existing SD ckpt file to build on, but the DreamBooth training currently requires the diffuser files from the official SD checkpoint image. So, is there a way to extract the diffusers from a custom SD checkpoint and then use those? Or even just extract them from the official checkpoint, since otherwise my method requires cloning the official files from Huggingface, and that's never any fun. :P
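For what it's worth, recent diffusers releases can load a single .ckpt file directly and re-save it in diffusers format, which would answer the extraction question; a sketch assuming such a version is installed (paths are illustrative):

```python
from diffusers import StableDiffusionPipeline

# Assumes a diffusers release with single-file checkpoint loading.
pipe = StableDiffusionPipeline.from_single_file("path/to/custom-model.ckpt")

# Write the diffusers-format directory that train_dreambooth.py expects.
pipe.save_pretrained("path/to/custom-model-diffusers")
```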

Last - Textual Inversion saves an image every N steps and a checkpoint every N steps. The version of DreamBooth I ported does not do this, but it would be super if it did, so I guess I need to research how that bit works.
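The periodic-save behaviour itself is just a step-counter check in the training loop; a runnable schematic with stubbed-out training and save calls (every name here is illustrative):

```python
max_train_steps = 2000
save_every = 500  # steps between checkpoints, illustrative value

def training_step(step):
    """Stub standing in for one optimizer step of DreamBooth training."""
    pass

def save_checkpoint_and_preview(step):
    """Stub: the real version would call pipeline.save_pretrained(...)
    and render a sample image with the instance prompt."""
    print(f"step {step}: checkpoint + preview saved")

for step in range(1, max_train_steps + 1):
    training_step(step)
    if step % save_every == 0:
        save_checkpoint_and_preview(step)
```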

But, at the very least, I've whipped up a Python class that exposes the parameters needed to run this from within our little app. We just need the godz to decide how it should be implemented.

@dezigns333

I would love to see DreamFields, DreamFields with Stable Diffusion, or DreamFusion one day.

https://dreamfusion3d.github.io/

@dezigns333

A proper version of stable diffusion webui for AMD GPUs would also be very popular.

@0xdevalias

Dreambooth is available as an extension now, see below:

> Closing this, as I've now started a repo with a standalone extension based on ShivShiram's repo here:
>
> https://github.com/d8ahazard/sd_dreambooth_extension
>
> Please feel free to test and yell at me there. I've added a requirements installer and multiple concept training via JSON, and moved some bits about.
>
> The UI still needs fixing - some stuff is broken there - but it should be able to train a model for now.

Originally posted by @d8ahazard in #3995 (comment)

@PGadoury

PGadoury commented Jan 19, 2023

Seems like #2725 and #1825, which were closed as duplicates, are focused solely on cross-attention control, with specific suggestions toward implementation, whereas this issue somewhat conflates three requests and muddles the discussion. I understand not wanting to flood the board with duplicate issues, but since Dreambooth is implemented, wouldn't it make more sense to keep the two (three) enhancement tickets separate?

I.e., close this issue (since the discussion seems to be focused on Dreambooth, for which an extension already exists, i.e. https://github.com/d8ahazard/sd_dreambooth_extension), open a separate issue for Dreamfields (unless one already exists), and reopen #2275?

Of course, if the intention is to collect any and all enhancement tickets together so that possible developers can find them on a single ticket, then disregard this comment.
