
deepspeed.runtime.zero.utils.ZeRORuntimeException #489

Closed · Johnreidsilver opened this issue Mar 31, 2023 · 1 comment

@Johnreidsilver

Hi, thanks for the effort on this program to run fine-tuning on lower-VRAM cards.

I'm looking for help getting it to run on my laptop with a 3060 Max-Q (6 GB VRAM). I just installed it, and perhaps I misconfigured something: it crashes just as nvtop shows VRAM at 25% and GPU usage shooting up, so it's not a lack of VRAM.

```
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100%|████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 2.60it/s]
import network module: networks.lora
create LoRA network. base dim (rank): 8, alpha: 1.0
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64')}
warn(
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
use 8-bit AdamW optimizer | {}
[2023-03-31 11:32:01,014] [INFO] [logging.py:93:log_dist] [Rank 0] DeepSpeed info: version=0.8.3, git-hash=unknown, git-branch=unknown
[2023-03-31 11:32:01,180] [INFO] [logging.py:93:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-03-31 11:32:01,181] [INFO] [logging.py:93:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-03-31 11:32:01,182] [INFO] [logging.py:93:log_dist] [Rank 0] Using client Optimizer as basic optimizer
Traceback (most recent call last):
  File "/home/userdir/git/kohya_ss/train_network.py", line 711, in <module>
    train(args)
  File "/home/userdir/git/kohya_ss/train_network.py", line 224, in train
    unet, text_encoder, network, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 872, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1093, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1281, in _configure_optimizer
    raise ZeRORuntimeException(msg)
deepspeed.runtime.zero.utils.ZeRORuntimeException: You are using ZeRO-Offload with a client provided optimizer (<class 'bitsandbytes.optim.adamw.AdamW8bit'>) which in most cases will yield poor performance. Please either use deepspeed.ops.adam.DeepSpeedCPUAdam or set an optimizer in your ds-config (https://www.deepspeed.ai/docs/config-json/#optimizer-parameters). If you really want to use a custom optimizer w. ZeRO-Offload and understand the performance impacts you can also set <"zero_force_ds_cpu_optimizer": false> in your configuration file.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7130) of binary: /home/userdir/git/kohya_ss/venv/bin/python3
```
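For reference, the exception itself names the config-side workarounds. A minimal sketch of what they could look like, assuming the DeepSpeed config is built as a Python dict and passed to `deepspeed.initialize(config=...)` — only the `optimizer` block and `zero_force_ds_cpu_optimizer` come from the error message and the linked config docs; every other key and value below is an illustrative placeholder:

```python
# Sketch of the two config-side fixes named in the ZeRORuntimeException.
# Only "optimizer" and "zero_force_ds_cpu_optimizer" are taken from the
# error message / DeepSpeed config docs; the rest are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,          # placeholder value
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},   # ZeRO-Offload, as in the log
    },

    # Fix 1: let DeepSpeed construct its own optimizer instead of the
    # client-provided bitsandbytes AdamW8bit:
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 1e-4},                   # placeholder hyperparameters
    },

    # Fix 2 (alternative): keep the custom optimizer and accept the
    # performance warning by disabling the check:
    # "zero_force_ds_cpu_optimizer": False,
}
```

With accelerate in the middle, as in this traceback, the same keys would typically go into the DeepSpeed config file that `accelerate config` points at, rather than being passed to `deepspeed.initialize` directly.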

Changing the optimizer ends in the same error.
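For completeness, the third route the exception suggests is swapping in the optimizer DeepSpeed ships for CPU offload. A minimal sketch, assuming direct access to the optimizer construction in train_network.py and that the trainable parameters are exposed via `network.parameters()` (the hyperparameter values are placeholders):

```python
from deepspeed.ops.adam import DeepSpeedCPUAdam

# Replace the bitsandbytes AdamW8bit with DeepSpeed's CPU Adam, which is
# what ZeRO-Offload expects; lr and weight_decay are placeholder values.
optimizer = DeepSpeedCPUAdam(network.parameters(), lr=1e-4, weight_decay=0.01)
```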

@Johnreidsilver (Author)

It seems I was skipping the Tools -> Folder preparation step.

bmaltais pushed a commit that referenced this issue May 11, 2023
* fix pynoise

* Update custom_train_functions.py for default

* Update custom_train_functions.py for note

* Update custom_train_functions.py for default

* Revert "Update custom_train_functions.py for default"

This reverts commit ca79915d7396ddb57adbeb4b78bafb9a1a884b5c.

* Update custom_train_functions.py for default

* Revert "Update custom_train_functions.py for default"

This reverts commit 483577e137b13933ff24b6ae254f82c0a8d9f1fe.

* default value change