Dreambooth: Ready to go! #3995
Conversation
It definitely works.
@AUTOMATIC1111 - Please give this a look and merge. |
Have you considered adding the prompt template file for this? There are some reports that this increases quality, as used in forks like https://github.com/victorchall/EveryDream-trainer |
With this last version I get a JavaScript error when I click "Train", and nothing happens:
|
Awesome job on this :) Would you like me to rework d8ahazard#2 into conversion.py? |
Clean install: EDIT: I had to manually install accelerate using pip. |
Sadly it seems like it's still out of reach for 12GB VRAM users on Windows. So close: and the CPU option basically spits out: Can't play with it any more tonight, I have work early in the morning. |
Is there any way to test this before it's merged? |
@coltography You can create a new folder, open it in a command shell, and use one of the commands below to install it to that new folder. |
If you follow @MartinCairnsSQL, make sure to |
Not entirely? If you run under WSL2 and properly configure deepspeed, 8bit-adam, and accelerate, then skip the "webui.sh" file and run with accelerate launch launch.py, it should be runnable on GPU with 12GB. You might need to disable training of the text encoder under "advanced", though. |
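For anyone wanting to try that route, a minimal sketch of the WSL2 workflow described above; the venv location and package set are assumptions, adjust them to your own install:

```
# Hedged sketch for the WSL2 route described above; assumes the webui's venv lives at ./venv
source venv/bin/activate
pip install accelerate deepspeed bitsandbytes   # extra dependencies, not installed by the webui itself
accelerate config        # interactive; DeepSpeed / CPU offload can be enabled here
# Skip webui.sh and launch through accelerate instead, as suggested above
accelerate launch launch.py
```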
Unfortunately still can't use this, running on Win11/WSL with 12gb 3080ti
Starting Dreambooth training...
Loaded model.
Exception importing 8bit adam: 'NoneType' object has no attribute 'cuDeviceGetCount'
***** Running training *****
Steps: 0%| | 0/1000 [00:00<?, ?it/s] First unet step completed.
Caught exception.
Exception training db: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 12.00 GiB total capacity; 11.13 GiB already allocated; 0 bytes free; 11.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
CLEANUP: Cleanup Complete.
Steps: 0%| | 0/1000 [00:05<?, ?it/s] Training completed, reloading SD Model.
Memory output: {'VRAM cleared.': '0.0/0.0GB', 'Loaded model.': '0.0/0.0GB', 'Scheduler Loaded': '0.0/0.0GB', 'CPU: False Adam: False, Prec: fp16, Prior: False, Grad: True, TextTr: True ': '3.8/3.9GB', ' First unet step completed.': '3.8/3.9GB', 'Caught exception.': '11.1/11.2GB', 'CLEANUP: ': '11.1/11.2GB', 'Cleanup Complete.': '11.0/11.2GB', 'Training completed, reloading SD Model.': '0.0/7.8GB'} |
CPU Adam is erroring.
…On Sun, Oct 30, 2022, 4:17 PM Alfredo Fernandes ***@***.***> wrote:
Unfortunately still can't use this, running on Win11/WSL with 12gb 3080ti
Starting Dreambooth training...
VRAM cleared.
Allocated: 0.0GB
Reserved: 0.0GB
Loaded model.
Allocated: 0.0GB
Reserved: 0.0GB
Exception importing 8bit adam: 'NoneType' object has no attribute
'cuDeviceGetCount'
Scheduler Loaded
Allocated: 0.0GB
Reserved: 0.0GB
***** Running training *****
Num examples = 12
Num batches each epoch = 12
Num Epochs = 84
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 1
Gradient Accumulation steps = 1
Total optimization steps = 1000
Total target lifetime optimization steps = 1000
CPU: False Adam: False, Prec: fp16, Prior: False, Grad: True, TextTr: True
Allocated: 3.8GB
Reserved: 3.9GB
Steps: 0%| | 0/1000 [00:00<?, ?it/s] First unet step completed.
Allocated: 3.8GB
Reserved: 3.9GB
Caught exception.
Allocated: 11.1GB
Reserved: 11.2GB
Exception training db: CUDA out of memory. Tried to allocate 20.00 MiB
(GPU 0; 12.00 GiB total capacity; 11.13 GiB already allocated; 0 bytes
free; 11.23 GiB reserved in total by PyTorch) If reserved memory is >>
allocated memory try setting max_split_size_mb to avoid fragmentation. See
documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/home/fred/stable-diffusion-webui/modules/dreambooth/dreambooth.py",
line 491, in train
optimizer.step()
File
"/home/fred/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/optimizer.py",
line 134, in step
self.scaler.step(self.optimizer, closure)
File
"/home/fred/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py",
line 341, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
File
"/home/fred/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py",
line 288, in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
File
"/home/fred/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/optim/lr_scheduler.py",
line 68, in wrapper
return wrapped(*args, **kwargs)
File
"/home/fred/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/optim/optimizer.py",
line 140, in wrapper
out = func(*args, **kwargs)
File
"/home/fred/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/autograd/grad_mode.py",
line 27, in decorate_context
return func(*args, **kwargs)
File
"/home/fred/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/optim/adamw.py",
line 147, in step
state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00
MiB (GPU 0; 12.00 GiB total capacity; 11.13 GiB already allocated; 0 bytes
free; 11.23 GiB reserved in total by PyTorch) If reserved memory is >>
allocated memory try setting max_split_size_mb to avoid fragmentation. See
documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
CLEANUP:
Allocated: 11.1GB
Reserved: 11.2GB
Cleanup Complete.
Allocated: 11.0GB
Reserved: 11.2GB
Steps: 0%| | 0/1000 [00:05<?, ?it/s] Training completed, reloading SD
Model.
Allocated: 0.0GB
Reserved: 7.8GB
Memory output: {'VRAM cleared.': '0.0/0.0GB', 'Loaded model.':
'0.0/0.0GB', 'Scheduler Loaded': '0.0/0.0GB', 'CPU: False Adam: False,
Prec: fp16, Prior: False, Grad: True, TextTr: True ': '3.8/3.9GB', ' First
unet step completed.': '3.8/3.9GB', 'Caught exception.': '11.1/11.2GB',
'CLEANUP: ': '11.1/11.2GB', 'Cleanup Complete.': '11.0/11.2GB', 'Training
completed, reloading SD Model.': '0.0/7.8GB'}
Re-applying optimizations...
Applying cross attention optimization (Doggettx).
Returning result: Training finished. Total lifetime steps: 0
|
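As an aside, the allocator hint in that traceback (max_split_size_mb) can be tried via an environment variable before launching. A hedged sketch follows; the value is just an example, and this only reduces fragmentation, so it may not help when the card is genuinely out of memory as above:

```
# Optional allocator tweak hinted at by the OOM message (value is just an example)
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
accelerate launch launch.py   # or however you normally start the webui
```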
Fred, you have xformers enabled, right? |
@d8ahazard any idea how to fix it? Or where should I look? @ArcticFaded uh, no xformers; I get a bunch of errors when I enable the argument on the webui, but I'm using the Shivam repo for Dreambooth without issues that way. |
Create a model from the "Create Model" tab 😃 |
It looks like the preprocess images area does not work for me; however, everything else (so far) seems to work fine. When trying any function in preprocess, this error is thrown.
|
bitsandbytes also needs to be installed manually, but even when it's installed it throws "Exception importing 8bit adam: too many values to unpack (expected 5)". As for WSL2, I haven't the faintest idea what I'm doing with that. I feel that the 10GB VRAM requirement you stated isn't going to be achievable without jumping through a lot of hoops that the average end user won't know how to handle. I mean, CPU training works at least, but there is still some work to get GPU training working without loads of messing around. Sorry to be that 'guy', but I'm not as tech savvy as I once was. EDIT: I should also state that I can use xformers, but even that didn't stop the CUDA OOM. |
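A quick, hedged way to check whether bitsandbytes actually imports inside the webui's environment (assumes a Linux/WSL venv at ./venv; the "too many values to unpack" error above may point to a version mismatch rather than a missing install):

```
# Hedged sketch: verify bitsandbytes inside the webui's venv
source venv/bin/activate
pip install bitsandbytes          # or pin a version known to work with your CUDA setup
python -c "import bitsandbytes"   # should complete without a traceback
```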
I think this is not ready for direct use yet; there are too many uncontrollable factors. Only once bitsandbytes supports Windows natively will it be possible to run DB locally without having to mess with WSL or something similar. |
How does one install accelerate? It's saying: EDIT: adding "accelerate" to "requirements_versions.txt" made it go away. However, when training I get |
You can follow my instructions on how to install xformers and Adam8bit support directly on Windows: https://github.com/bmaltais/kohya_ss It is another Dreambooth solution that works even with 8GB VRAM on Windows, as it does some special things to limit memory usage. |
Aside from needing to update "Preprocess" because the UI has changed in the 20+ days this PR has been open, what is not ready for use yet? If you're running on a slow GPU, you need to install some extra requirements, which are out of scope for this project. If you're running on a decent GPU, it should just work. |
Training ends after 1 iteration with a 3090, what might be the issue? Also, how do I use pre-generated classification images, do I just state the path and the number of images? Thank you for the great work, btw. |
Excerpt from my blog:
If you really want to run on an 8GB or 12GB graphics card, using BitsAndBytes alone is very unrealistic; peak memory usage can reach 15GB at one point. If this is absolutely necessary, DeepSpeed can be used, but it will take up an additional 25GB of system memory and increase training time by a factor of nine (V100 SXM2 16GB: 7 min/ksteps -> 1.1 hours/ksteps). This is very unwise, and users with 8GB of video memory are strongly recommended to use free Colab training instead. By the way, DeepSpeed, developed by Microsoft, is very poorly supported on Windows and barely works. |
I can use Shivam's repo with a 3060 12GB if I enable accelerate and 8-bit Adam, and I can even do train_text_encoder if I enable gradient_checkpointing. I don't see this ever working in this WebUI though, since just launching it reserves ~2.8GB of VRAM, which is VRAM that Dreambooth will need on a 12GB system, unless you enable deepspeed, which I've never tried (is there a steep performance penalty?). Perhaps you can have a separate launcher for Dreambooth which omits whatever is loaded into VRAM by default. |
|
Try installing the dependencies: pip install ninja bitsandbytes. I didn't manage to run deepspeed on Windows; if you don't have a 24GB GPU (it usually takes 15.3-19GB), then for now all that remains is to wait for optimizations, or turn your attention to other projects designed for training on a 12GB GPU. |
Thanks, you saved me from wasting time then. Yeah, I'm using this (with a 1080 Ti under WSL2 + Ubuntu): https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth and also the GUI version out there. |
Look in your requirements.txt file; accelerate should be the first one. Try installing it separately. |
There's no accelerate in that txt |
Are you using the pull request or the extension that was posted here? Either way, to activate the SD environment, go to your stable diffusion main directory, open a terminal/cmd, and run \venv\Scripts\activate.bat (I believe that's it). Then, when you run pip install accelerate, it will install into the environment that SD pulls from, which is the same environment the extension uses. When you have finished installing the modules, use deactivate and it should take you out of the env. |
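To make those steps concrete, a hedged sketch for Windows cmd; the install path is a placeholder, substitute your own:

```
:: Hedged sketch, Windows cmd; replace the path with your own webui install
cd C:\path\to\stable-diffusion-webui
:: activate the venv that the webui (and the extension) uses
venv\Scripts\activate.bat
:: install accelerate into that same environment
pip install accelerate
:: leave the venv when done
deactivate
```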
I know it's a stupid modification, but it works; I'm new to this stuff. I modified main.py from dreambooth so that it automatically installs accelerate, and for me it works with just that. |
https://github.com/d8ahazard/stable-diffusion-webui/tree/DreamBooth_V2 Download as zip, unpack and start. Works if your GPU has 20GB+ or you train on CPU. |
Ok, let's try this. I've tried every single offline Dreambooth method around; the only one working for me (but missing some features) is the NMKD app. |
no space on device?! :))) |
Yeah, I realized that and redid it from the beginning on another drive. |
How did you move things to another drive? That looks like you didn't clone the repo using git, or that you didn't move the hidden .git folder along with the rest of it if you copied the files somewhere. |
I’ve done some testing and unfortunately the nmkd app output just sucks compared to other methods. You really do need prior loss preservation to get good results on people.
On Sun, Nov 6, 2022 at 13:34, 4lt3r3go ***@***.***> wrote:
>>> look in your requirements.txt file, accelerate should be the first one. try installing it separately
>>
>> theres no accelerate in that txt and i have no idea how to install it properly. I have zero knowledge of python/pythorch/pystuff lmao i guess i'll just sit here with popcorn and wait you nerd guys come out with something easy to install and use for newbies like me.
>
> https://github.com/d8ahazard/stable-diffusion-webui/tree/DreamBooth_V2 Download as zip, unpack and start. Work if you gpu 20GB+ or CPU train
Ok lets try his
nope, nothing is working. I just give up
errors:
[image](https://user-images.githubusercontent.com/44883288/200191145-b84800f5-f5cf-4f37-a0f0-e71468e7770b.png)
i tryed every single dreambooth offline methods around
the only working for me (but missing some features) is nmkd app.
|
Prior loss preservation? I don't see that option in this repo. (This discussion went too far, with too much stuff and too many repos mentioned; to clarify for anyone reading this for the first time, I tried this repo and now have Dreambooth in the UI: https://github.com/d8ahazard/stable-diffusion-webui/tree/DreamBooth_V2.) I finally managed to make it work by redoing everything from the beginning; now I need to understand how to use it 🥲. This is way more complex than NMKD. |
Closing this, as I've now started a repo with a standalone extension based on ShivamShrirao's repo here: https://github.com/d8ahazard/sd_dreambooth_extension Please feel free to test and yell at me there. I've added a requirements installer, multiple-concept training via JSON, and moved some bits about. The UI still needs fixing, some stuff is broken there, but it should be able to train a model for now. |
So, we don't need 24GB of VRAM anymore, then? |
Theoretically, no? |
Prior preservation loss is something where you add a large number of images of the same class as the thing you are trying to train. For training a person, you would generate a couple thousand images in the person class, or use one of the existing datasets that people have already generated. The Dreambooth training will then reference those images to try to contain the changes made to the model to the class of things you are training against, so they don't bleed over much into other things. There are some other really handy tweaks as well, such as the learning rate and text-encoder training, that seem to have a big impact. Long story short, using the vanilla, plain-Jane text-based encoder you can achieve really good results. But most of the GUI tools, Google Colab notebooks, or runpod scripts I have seen kinda suck in comparison. (A sketch of how this is typically set up follows after this comment.)
On Sun, Nov 6, 2022 at 16:22, 4lt3r3go ***@***.***> wrote:
> I’ve done some testing and unfortunately the nmkd app output just sucks compared to other methods. You really do need prior loss preservation to get good results on people.
prior loss preservation? i dont see this thing in this repo here
(this discussion went too far and too many stuff/repos mentioned
to clarify to who is reading this for the first time, i tryed this repo and now i have dreambooth in the ui
https://github.com/d8ahazard/stable-diffusion-webui/tree/DreamBooth_V2)
so I finally managed to make it work, doing from beginning again .. now i need to understand how to use it 🥲
this is way more complex than NMKD
[image](https://user-images.githubusercontent.com/44883288/200198082-74d74e38-fec3-424c-b957-bc8b61877b42.png)
|
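For reference, a hedged sketch of how prior preservation and pre-generated class images are typically wired up in ShivamShrirao's train_dreambooth.py (linked elsewhere in this thread); flag names follow that repo's examples and may differ between versions:

```
# Hedged sketch; flag names follow ShivamShrirao's train_dreambooth.py examples and may vary by version.
# --class_data_dir points at pre-generated class/regularization images; if it holds fewer than
# --num_class_images, the script generates the remainder from --class_prompt before training.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="path/to/model" \
  --instance_data_dir="path/to/instance_images" \
  --instance_prompt="a photo of sks person" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --class_data_dir="path/to/class_images" \
  --class_prompt="a photo of a person" \
  --num_class_images=200 \
  --train_text_encoder \
  --gradient_checkpointing \
  --use_8bit_adam \
  --max_train_steps=1000
```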
Should be... https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth |
Thanks, dreambooth helped me learn. |
Okay, this is the third submission of this.
Everything should work now, but may require some horsepower to do so. It can theoretically work on a 10GB GPU, possibly 8GB if the user sets up WSL2, xformers, and other things, but that is left outside the scope of this project.
Point is, I've tested this quite thoroughly now, and everything does what it should do in a pretty efficient manner.
You can currently:
Fine-tune any existing checkpoint. Load it from the UI, configure settings, go.
Reload any existing set of training data from the UI, including prompts, directories, everything.
Train with or without "prior preservation loss".
Optionally train the text encoder as well, which promises better results with human subjects, as does prior preservation loss.
Auto-convert diffuser data back to checkpoint data.
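On that last point, the diffusers-to-checkpoint step can also be done standalone with the conversion script that ships with the diffusers repo; a hedged sketch below, with the script path and flags taken from that repo and possibly differing between versions:

```
# Hedged sketch using the conversion script bundled with the diffusers repo (path/flags may differ by version)
python scripts/convert_diffusers_to_original_stable_diffusion.py \
  --model_path /path/to/dreambooth_diffusers_output \
  --checkpoint_path /path/to/output/model.ckpt \
  --half   # optional: write fp16 weights, roughly halving the file size
```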
Future things to implement (once initial is merged):
Multiple subjects at once.
Auto reload SD checkpoint list.
Add a "cooldown" option where you can pause after N steps to give your GPU a break, then resume again after N seconds/minutes.
Final submission, replaces #2002