
CrossAttentionControl, Dreamfields-3D and Dreambooth implementation #1280

Open
Antoinevdlb opened this issue Sep 29, 2022 · 12 comments
Labels: enhancement (New feature or request)

@Antoinevdlb

Hey! I found these different Stable Diffusion libraries that I would love to see integrated into this repo.

Curious if anyone else would also enjoy using them.

Cross Attention Control by bloc97

Cross Attention Control allows much finer control of the prompt by modifying the internal attention maps of the diffusion model during inference, without requiring the user to input a mask, and it does so with minimal performance penalties (compared to CLIP guidance) and no additional training or fine-tuning of the diffusion model.
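The core trick, as I read bloc97's description, is to reuse the attention maps computed for the original prompt while swapping in the edited prompt's token embeddings, so the spatial layout stays put while the content changes. A toy PyTorch sketch of that idea (the shapes and the `fixed_probs` hook are my own illustration, not bloc97's actual API):

```python
import torch
import torch.nn.functional as F

def cross_attention(q, k, v, fixed_probs=None):
    """Scaled dot-product cross-attention.

    If fixed_probs is given, the freshly computed attention map is
    replaced by it - the essence of cross-attention control: the
    *where* comes from the original prompt, the *what* from the
    edited prompt's values.
    """
    d = q.shape[-1]
    probs = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    if fixed_probs is not None:
        probs = fixed_probs
    return probs @ v, probs

# Toy shapes: 16 image tokens attending over 8 text tokens, dim 64.
q = torch.randn(16, 64)
k_orig, v_orig = torch.randn(8, 64), torch.randn(8, 64)
k_edit, v_edit = torch.randn(8, 64), torch.randn(8, 64)

# Pass 1: original prompt - keep its attention map.
_, saved_probs = cross_attention(q, k_orig, v_orig)

# Pass 2: edited prompt, spatial layout pinned to the saved map.
out_edit, _ = cross_attention(q, k_edit, v_edit, fixed_probs=saved_probs)
```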

DreamFields 3D by shengyu-meng

A toolkit to generate 3D mesh models, videos, NeRF instances, or multiview images of colourful 3D objects from text and image prompts.

DreamBooth

DreamBooth is a method to personalize text2image models like Stable Diffusion given just a few (3-5) images of a subject. The train_dreambooth.py script shows how to implement the training procedure and adapt it for Stable Diffusion.
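The method's core objective is easy to state: a reconstruction loss on the few subject images plus a weighted "prior preservation" loss on generated class images, which keeps the model from forgetting what a generic instance of the class looks like. A schematic in PyTorch with random stand-in tensors (the real train_dreambooth.py computes these MSEs on the UNet's noise predictions over latents):

```python
import torch
import torch.nn.functional as F

# Random stand-ins for the UNet's noise predictions and the true noise.
noise_pred_instance = torch.randn(4, 64)  # on the subject's images
noise_instance      = torch.randn(4, 64)
noise_pred_prior    = torch.randn(4, 64)  # on generic class images
noise_prior         = torch.randn(4, 64)

prior_loss_weight = 1.0  # diffusers exposes this as --prior_loss_weight

instance_loss = F.mse_loss(noise_pred_instance, noise_instance)
prior_loss = F.mse_loss(noise_pred_prior, noise_prior)

# Learn the subject while preserving the class prior.
loss = instance_loss + prior_loss_weight * prior_loss
```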

All of these seem extremely useful, especially Dreambooth training.

Thanks for considering these!

@Goldenkoron

I also very much want DreamBooth to become an integrated feature in the UI. Hopefully it can be made to work. I have an RTX 3090 so it should be possible to run.

@LLjo

LLjo commented Sep 29, 2022

This would be awesome

@xbox002000

super cool

@OWKenobi
Contributor

+1 for DreamBooth!

@StrangeCalibur

echo the +1 for dreambooth!

@AmericanPresidentJimmyCarter

I was able to get it working with about 13 GB of RAM in half precision, but no matter what I do, my checkpoints appear to be corrupt. :( Not sure why torch.save is producing things I cannot deserialize.
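If anyone hits the same thing, one way to narrow it down is a save/load round trip on the state dict right after saving; nothing repo-specific here, just plain torch (the Linear module is a stand-in for the fine-tuned model):

```python
import torch

model = torch.nn.Linear(4, 4)  # stand-in for the fine-tuned model

state = model.state_dict()
torch.save(state, "checkpoint.pt")

# Reload immediately and compare one tensor: if this raises or mismatches,
# the file was written incompletely or something in it wasn't picklable.
reloaded = torch.load("checkpoint.pt", map_location="cpu")
key = next(iter(state))
assert torch.equal(state[key], reloaded[key]), "round trip failed"
```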

@OWKenobi
Contributor

OWKenobi commented Oct 4, 2022

Awesome! I have a graphics card that is capable of testing this, so if there is some kind of review process I could help with, please let me know.

I think it should be a separate TAB called Dreambooth, where you can select pictures from the hard drive, select a destination for the final model, and then have a button to process the images. Maybe a text field explaining the whole process, telling you the minimum requirements and the ETA. I think the UI would be fairly simple like that.

Also, we need an explanation of what the placeholder will be called. As far as I have read, prompts look like "a photo of a red [X]".
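A minimal Gradio sketch of such a tab, just to make the idea concrete (the `train_dreambooth` callback and all the component wiring are placeholders, not the webui's actual code):

```python
import gradio as gr

def train_dreambooth(images, output_path):
    # Placeholder: wire this to the actual DreamBooth training entry point.
    return f"Would train on {len(images or [])} images, saving to {output_path}"

with gr.Blocks() as demo:
    with gr.Tab("Dreambooth"):
        gr.Markdown(
            "Fine-tune the model on your own subject. "
            "Minimum requirements and an ETA estimate would go here."
        )
        images = gr.Files(label="Training images", file_types=["image"])
        output_path = gr.Textbox(label="Destination for the final model")
        status = gr.Textbox(label="Status", interactive=False)
        gr.Button("Process images").click(
            train_dreambooth, inputs=[images, output_path], outputs=status
        )

demo.launch()
```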

@d8ahazard
Collaborator

> Awesome! I have a graphics card that is capable of testing this, so if there is some kind of review process I could help with, please let me know.
>
> I think it should be a separate TAB called Dreambooth, where you can select pictures from the hard drive, select a destination for the final model, and then have a button to process the images. Maybe a text field explaining the whole process, telling you the minimum requirements and the ETA. I think the UI would be fairly simple like that.
>
> Also, we need an explanation of what the placeholder will be called. As far as I have read, prompts look like "a photo of a red [X]".

So, a few things I've learned thus far, although my results are still TBD.

One: You can totally run this on a GPU with only 8 GB of VRAM. At least, I can run it using the "optimized" repo and then using the "accelerate config" option to tell it to run only on the CPU. Is it slow as hell? Yes. Can I build a DreamBooth model on my rig? Yes. :P

So, definitely need to figure out if we can somehow leverage "accelerate" for poor schlubs like me who haven't bought a new GPU in a few years.

For the placeholder/prompt, it's slightly different from Textual Inversion. With TI, we just use a prompt: "MrFluffy" for a dog or something. But with DreamBooth, it appears you also need a "class" parameter, so for MrFluffy it would be "MrFluffy dog", and then the class prompt would be "dog". At least, I think that's the idea. Then to use it, you could do "a photo of MrFluffy dog in the mountains", or "a painting of MrFluffy dog", etc.
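For the usage side, once a model is trained, prompting with the identifier-plus-class phrase is ordinary diffusers inference; the model path here is a made-up example:

```python
import torch
from diffusers import StableDiffusionPipeline

# Made-up path: wherever the DreamBooth-trained weights were saved.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-output", torch_dtype=torch.float16
).to("cuda")

# Identifier + class: "MrFluffy" names the subject, "dog" anchors it
# to the class the model already knows.
image = pipe("a photo of MrFluffy dog in the mountains").images[0]
image.save("mrfluffy.png")
```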

There's also a few unknowns I'm trying to experiment with...

How many pictures is ideal? I read somewhere today that a suggestion was 5 close-ups, 5 "half item" shots, and then 5-10 "full item" shots. Maybe more? Maybe less? The same goes for TI, actually.

There's also a "class image" parameter (or something like that); conflicting sources say you can fill it either via text2image generation or with a bunch of existing images of your class item, like dogs. Somewhere said ~100 or more is good for this... but IDK.

Another unknown is how creating the "new" checkpoint works with the script to convert from DreamBooth to SD. It relies on the existing SD ckpt file to build on, but the DreamBooth training currently requires the diffuser files from the official SD checkpoint image. So, is there a way to extract the diffusers from a custom SD checkpoint and then use those? Or even just extract them from the official checkpoint, since otherwise my method requires cloning the official files from Huggingface, and that's never any fun. :P
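For what it's worth, recent diffusers releases can load a single .ckpt file directly and re-save it in diffusers format, which would answer the extraction question; a sketch assuming such a version is installed (paths are illustrative):

```python
from diffusers import StableDiffusionPipeline

# Assumes a diffusers release with single-file checkpoint loading.
pipe = StableDiffusionPipeline.from_single_file("path/to/custom-model.ckpt")

# Write the diffusers-format directory that train_dreambooth.py expects.
pipe.save_pretrained("path/to/custom-model-diffusers")
```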

Last - Textual Inversion saves an image every N steps and a checkpoint every N steps. The version of DreamBooth I ported does not do this, but it would be super if it did, so I guess I need to research how that bit works.
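The periodic-save behaviour itself is just a step-counter check in the training loop; a runnable schematic with stubbed-out training and save calls (every name here is illustrative):

```python
max_train_steps = 2000
save_every = 500  # steps between checkpoints, illustrative value

def training_step(step):
    """Stub standing in for one optimizer step of DreamBooth training."""
    pass

def save_checkpoint_and_preview(step):
    """Stub: the real version would call pipeline.save_pretrained(...)
    and render a sample image with the instance prompt."""
    print(f"step {step}: checkpoint + preview saved")

for step in range(1, max_train_steps + 1):
    training_step(step)
    if step % save_every == 0:
        save_checkpoint_and_preview(step)
```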

But, at the very least, I've whipped up a Python class that exposes the parameters needed to run this from within our little app. We just need the godz to decide how it should be implemented.

@dezigns333

I would love to see DreamFields, DreamFields with Stable Diffusion, or DreamFusion one day.

https://dreamfusion3d.github.io/

@dezigns333

A proper version of stable diffusion webui for AMD GPUs would also be very popular.

@0xdevalias

Dreambooth is available as an extension now, see below:

> Closing this, as I've now started a repo with a standalone extension based on ShivShiram's repo here:
>
> https://github.com/d8ahazard/sd_dreambooth_extension
>
> Please feel free to test and yell at me there. I've added a requirements installer and multiple concept training via JSON, and moved some bits about.
>
> The UI still needs fixing - some stuff is broken there - but it should be able to train a model for now.

Originally posted by @d8ahazard in #3995 (comment)

@PGadoury

PGadoury commented Jan 19, 2023

Seems like #2725 and #1825, which were closed as duplicates, are focused solely on cross-attention control, with specific suggestions toward implementation, whereas this issue somewhat conflates three requests and muddles the discussion. I understand not wanting to flood the board with duplicate issues, but since Dreambooth is implemented, wouldn't it make more sense to keep the two (three) enhancement tickets separate?

I.e., close this issue (since the discussion seems to be focused on Dreambooth, for which an extension already exists, i.e. https://github.com/d8ahazard/sd_dreambooth_extension), open a separate issue for Dreamfields (unless one already exists), and reopen #2275?

Of course, if the intention is to collect any and all enhancement tickets together so that possible developers can find them on a single ticket, then disregard this comment.
