
[Bug]: Batch from directory within Extras mode does not clear the memory of finished generations, causing crash for large directories. #7517

Closed
1 task done
IProduceWidgets opened this issue Feb 4, 2023 · 3 comments · Fixed by #12479
Labels
bug Report of a confirmed bug

Comments

@IProduceWidgets

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

When using Batch from directory within Extras mode, if the input directory has too many images, it can cause out of memory errors due to the process not clearing the memory of images that have already finished processing.

Steps to reproduce the problem

  1. Have a huge set of images in a directory (say, 100 GB worth of 4K images).
  2. Run Batch from Directory within Extras mode with Resize set to 1, GFPGAN visibility set to 1, the "if png image larger than 4mb or any dimension is larger than 4000, downscale and save copy as jpg" setting turned off, and Show result images either on or off.
  3. Open Task Manager and watch memory usage slowly climb until...

"Error completing request
Arguments: (2, None, None, 'C:\INPUT_DIRECTORY', 'C:\OUTPUTDIRECTORY', True, 0, 1, 512, 512, True, 'None', 'None', 0, 1, 0, 0) {}
Traceback (most recent call last):
File "C:\stable-diffusion-webui-master\stable-diffusion-webui\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "C:\stable-diffusion-webui-master\stable-diffusion-webui\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "C:\stable-diffusion-webui-master\stable-diffusion-webui\modules\postprocessing.py", line 56, in run_postprocessing
scripts.scripts_postproc.run(pp, args)
File "C:\stable-diffusion-webui-master\stable-diffusion-webui\modules\scripts_postprocessing.py", line 130, in run
script.process(pp, **process_args)
File "C:\stable-diffusion-webui-master\stable-diffusion-webui\scripts\postprocessing_gfpgan.py", line 26, in process
restored_img = gfpgan_model.gfpgan_fix_faces(np.array(pp.image, dtype=np.uint8))
File "C:\stable-diffusion-webui-master\stable-diffusion-webui\modules\gfpgan_model.py", line 61, in gfpgan_fix_faces
cropped_faces, restored_faces, gfpgan_output_bgr = model.enhance(np_image_bgr, has_aligned=False, only_center_face=False, paste_back=True)
File "C:\stable-diffusion-webui-master\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\stable-diffusion-webui-master\stable-diffusion-webui\venv\lib\site-packages\gfpgan\utils.py", line 145, in enhance
restored_img = self.face_helper.paste_faces_to_input_image(upsample_img=bg_img)
File "C:\stable-diffusion-webui-master\stable-diffusion-webui\venv\lib\site-packages\facexlib\utils\face_restoration_helper.py", line 355, in paste_faces_to_input_image
upsample_img = inv_soft_mask * pasted_face + (1 - inv_soft_mask) * upsample_img
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 190. MiB for an array with shape (2160, 3840, 3) and data type float64"
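The reported allocation size matches a full-resolution float64 intermediate (facexlib's soft-mask blend promotes the uint8 image to float64). A quick back-of-the-envelope check, purely illustrative:

```python
# The traceback reports a failed 190 MiB allocation for an array
# with shape (2160, 3840, 3) and dtype float64; verify that arithmetic.
height, width, channels = 2160, 3840, 3
bytes_per_float64 = 8
bytes_needed = height * width * channels * bytes_per_float64
mib = bytes_needed / 2**20
print(f"{mib:.0f} MiB")  # prints "190 MiB"
```

So each 4K frame needs roughly 190 MiB just for this one temporary; if finished images are never released, a large batch exhausts memory quickly.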

What should have happened?

It should clear the memory of finished generations as generations finish.

Commit where the problem happens

7a14c8a

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Mozilla Firefox

Command Line Arguments

No

List of extensions

No

Console logs

venv "C:\stable-diffusion-webui-master\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 7a14c8ab45da8a681792a6331d48a88dd684a0a9
Installing requirements for Web UI
Launching Web UI with arguments: --deepdanbooru --medvram --listen
No module 'xformers'. Proceeding without it.
==============================================================================
You are running torch 1.12.1+cu113.
The program is tested to work with torch 1.13.1.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded.
==============================================================================
Loading config from: C:\stable-diffusion-webui-master\stable-diffusion-webui\models\Stable-diffusion\final-pruned.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading weights [89d59c3dde] from C:\stable-diffusion-webui-master\stable-diffusion-webui\models\Stable-diffusion\final-pruned.ckpt
Loading VAE weights found near the checkpoint: C:\stable-diffusion-webui-master\stable-diffusion-webui\models\Stable-diffusion\final-pruned.vae.pt
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 4.7s (0.5s create model, 4.2s load weights).
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Error completing request
Arguments: (2, None, None, 'C:\\DFL\\Test', 'C:\\DFL\\GanOut', True, 0, 1, 512, 512, True, 'None', 'None', 0, 1, 0, 0) {}
Traceback (most recent call last):
  File "C:\stable-diffusion-webui-master\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "C:\stable-diffusion-webui-master\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "C:\stable-diffusion-webui-master\stable-diffusion-webui\modules\postprocessing.py", line 56, in run_postprocessing
    scripts.scripts_postproc.run(pp, args)
  File "C:\stable-diffusion-webui-master\stable-diffusion-webui\modules\scripts_postprocessing.py", line 130, in run
    script.process(pp, **process_args)
  File "C:\stable-diffusion-webui-master\stable-diffusion-webui\scripts\postprocessing_gfpgan.py", line 26, in process
    restored_img = gfpgan_model.gfpgan_fix_faces(np.array(pp.image, dtype=np.uint8))
  File "C:\stable-diffusion-webui-master\stable-diffusion-webui\modules\gfpgan_model.py", line 61, in gfpgan_fix_faces
    cropped_faces, restored_faces, gfpgan_output_bgr = model.enhance(np_image_bgr, has_aligned=False, only_center_face=False, paste_back=True)
  File "C:\stable-diffusion-webui-master\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\stable-diffusion-webui-master\stable-diffusion-webui\venv\lib\site-packages\gfpgan\utils.py", line 145, in enhance
    restored_img = self.face_helper.paste_faces_to_input_image(upsample_img=bg_img)
  File "C:\stable-diffusion-webui-master\stable-diffusion-webui\venv\lib\site-packages\facexlib\utils\face_restoration_helper.py", line 355, in paste_faces_to_input_image
    upsample_img = inv_soft_mask * pasted_face + (1 - inv_soft_mask) * upsample_img
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 190. MiB for an array with shape (2160, 3840, 3) and data type float64

Additional information

No response

@IProduceWidgets IProduceWidgets added the bug-report Report of a bug, yet to be confirmed label Feb 4, 2023
@IProduceWidgets IProduceWidgets changed the title [Bug]: [Bug]: Batch from directory within Extras mode does not clear the memory of finished generations, causing crash for large directories. Feb 4, 2023

grrrwaaa commented Jun 16, 2023

Ping. This still seems to be an issue when processing large folders of images -- something in the process is not releasing memory between batch steps.
E.g. using R-ESRGAN 4x+ Anime6B, upscaling by 6x from a folder of identically sized 1024x1024 images, the first 5 images in the series succeed, but the 6th triggers the torch.cuda.OutOfMemoryError as above.
This is on a fresh git clone of webui, using an RTX 3070 with Nvidia driver 532.03.

Any pointer to what script & where I could modify to work around this would be very much appreciated.

EDIT: is it just a case of adding a devices.torch_gc() in the right place in run_postprocessing() in modules/postprocessing.py?

@grrrwaaa

... ok, it looks like that is the trick. In postprocessing.py, around line 80, there is a devices.torch_gc() call. At the moment it runs only once, after all the batch loops are complete.
If this call is indented so that it sits inside the batch for loop, I no longer get the out-of-memory errors.

I'd open a pull request, but it's really just adding one level of indentation, which seems like overkill for a PR.
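For reference, the per-iteration cleanup pattern being described looks roughly like the sketch below. This is not the actual postprocessing.py code: `gc.collect()` stands in for webui's `devices.torch_gc()`, and `run_batch`/`process` are hypothetical names.

```python
import gc

def run_batch(image_paths, process):
    """Sketch of per-iteration cleanup. In modules/postprocessing.py the
    analogous call is devices.torch_gc(); plain gc.collect() stands in here."""
    saved = []
    for path in image_paths:
        image = process(path)   # may allocate large buffers per image
        saved.append(path)      # keep only lightweight results, not the image
        del image               # drop the reference to the finished image
        gc.collect()            # inside the loop, not after it -- the fix
    return saved
```

The point is simply that the cleanup call has to run once per image rather than once per batch, so peak memory stays bounded by a single image instead of growing with the directory size.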

@catboxanon catboxanon added bug Report of a confirmed bug and removed bug-report Report of a bug, yet to be confirmed labels Aug 11, 2023
@catboxanon
Collaborator

Fixed in #12479
