
deepspeed.runtime.zero.utils.ZeRORuntimeException #489

Closed · Johnreidsilver opened this issue Mar 31, 2023 · 1 comment

@Johnreidsilver

Hi, thanks for the effort on this program to run fine-tuning on lower-VRAM cards.

I'm looking for help getting it to run on my laptop with a 3060 Max-Q (6 GB VRAM). I just installed it, and perhaps I misconfigured something: it crashes just as nvtop shows VRAM at 25% and GPU usage shooting up, so it's not a lack of VRAM.

```
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100%|████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 2.60it/s]
import network module: networks.lora
create LoRA network. base dim (rank): 8, alpha: 1.0
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64')}
warn(
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
use 8-bit AdamW optimizer | {}
[2023-03-31 11:32:01,014] [INFO] [logging.py:93:log_dist] [Rank 0] DeepSpeed info: version=0.8.3, git-hash=unknown, git-branch=unknown
[2023-03-31 11:32:01,180] [INFO] [logging.py:93:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-03-31 11:32:01,181] [INFO] [logging.py:93:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-03-31 11:32:01,182] [INFO] [logging.py:93:log_dist] [Rank 0] Using client Optimizer as basic optimizer
Traceback (most recent call last):
  File "/home/userdir/git/kohya_ss/train_network.py", line 711, in <module>
    train(args)
  File "/home/userdir/git/kohya_ss/train_network.py", line 224, in train
    unet, text_encoder, network, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 872, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1093, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/userdir/git/kohya_ss/venv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1281, in _configure_optimizer
    raise ZeRORuntimeException(msg)
deepspeed.runtime.zero.utils.ZeRORuntimeException: You are using ZeRO-Offload with a client provided optimizer (<class 'bitsandbytes.optim.adamw.AdamW8bit'>) which in most cases will yield poor performance. Please either use deepspeed.ops.adam.DeepSpeedCPUAdam or set an optimizer in your ds-config (https://www.deepspeed.ai/docs/config-json/#optimizer-parameters). If you really want to use a custom optimizer w. ZeRO-Offload and understand the performance impacts you can also set <"zero_force_ds_cpu_optimizer": false> in your configuration file.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7130) of binary: /home/userdir/git/kohya_ss/venv/bin/python3
```
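For reference, the exception itself names the config-side workarounds. A minimal sketch of what they could look like, assuming the DeepSpeed config is built as a Python dict and passed to `deepspeed.initialize(config=...)` — only the `optimizer` block and `zero_force_ds_cpu_optimizer` come from the error message and the linked config docs; every other key and value below is an illustrative placeholder:

```python
# Sketch of the two config-side fixes named in the ZeRORuntimeException.
# Only "optimizer" and "zero_force_ds_cpu_optimizer" are taken from the
# error message / DeepSpeed config docs; the rest are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,          # placeholder value
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},   # ZeRO-Offload, as in the log
    },

    # Fix 1: let DeepSpeed construct its own optimizer instead of the
    # client-provided bitsandbytes AdamW8bit:
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 1e-4},                   # placeholder hyperparameters
    },

    # Fix 2 (alternative): keep the custom optimizer and accept the
    # performance warning by disabling the check:
    # "zero_force_ds_cpu_optimizer": False,
}
```

With accelerate in the middle, as in this traceback, the same keys would typically go into the DeepSpeed config file that `accelerate config` points at, rather than being passed to `deepspeed.initialize` directly.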

Changing the optimizer ends in the same error.
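For completeness, the third route the exception suggests is swapping in the optimizer DeepSpeed ships for CPU offload. A minimal sketch, assuming direct access to the optimizer construction in train_network.py and that the trainable parameters are exposed via `network.parameters()` (the hyperparameter values are placeholders):

```python
from deepspeed.ops.adam import DeepSpeedCPUAdam

# Replace the bitsandbytes AdamW8bit with DeepSpeed's CPU Adam, which is
# what ZeRO-Offload expects; lr and weight_decay are placeholder values.
optimizer = DeepSpeedCPUAdam(network.parameters(), lr=1e-4, weight_decay=0.01)
```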

@Johnreidsilver (Author)

It seems I was skipping the Tools -> Folder preparation step.

bmaltais pushed a commit that referenced this issue May 11, 2023
* fix pynoise

* Update custom_train_functions.py for default

* Update custom_train_functions.py for note

* Update custom_train_functions.py for default

* Revert "Update custom_train_functions.py for default"

This reverts commit ca79915d7396ddb57adbeb4b78bafb9a1a884b5c.

* Update custom_train_functions.py for default

* Revert "Update custom_train_functions.py for default"

This reverts commit 483577e137b13933ff24b6ae254f82c0a8d9f1fe.

* default value change