Replies: 9 comments 10 replies
-
Doesn't work.
-
Has anyone found a solution to this? I'm trying to run Whisper in a Flask app as part of my monorepo, but after loading it on a request, it stays in memory. I've tried putting it in a process.
-
Hi folks, just in case anyone needs this: my trick is to move the model to the CPU before deleting it and emptying the CUDA cache.
This specific combination worked for me. Since I never needed to do this with a plain PyTorch model, hopefully people can figure out what exactly is wrong in the Whisper model.
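Roughly, the steps look like this (a sketch, assuming `model` was loaded with `whisper.load_model` onto a CUDA device):

```python
import gc
import torch
import whisper

model = whisper.load_model("base", device="cuda")
# ... run transcriptions ...

# Move the weights off the GPU first, then drop the last reference.
model = model.to("cpu")
del model
# Force Python to collect the object, then let PyTorch return
# its now-unused cached blocks to the GPU driver.
gc.collect()
torch.cuda.empty_cache()
```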
-
That doesn't work; in Google Colab, for example, it makes the virtual machine fail. It's an aggressive and not very canonical solution.
-
Hi, are there any updates on this? The suggested `del` is not working.
-
So, I tried all of the suggested approaches to this issue mentioned in this thread, but sadly was not able to get any of them to work. However, I did manage to come up with a workaround I thought I would share, using the native multiprocessing module: run each transcription in a child process, so that all of its memory is returned to the OS when the process exits (sketched below).
As I mentioned before, this is a workaround and doesn't actually deal with the memory leak directly, but I think it can overcome the issue in most cases, if you don't mind a little overhead from spawning or forking child processes. It's most unfortunate that there's no documentation on making consecutive transcription calls with the Whisper model, or at least none that I've been able to find, and seemingly no one else has found any either. Hope this helps, though.
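A minimal sketch of that workaround; `transcribe_in_subprocess` and `_worker` are just illustrative names, not part of Whisper's API:

```python
import multiprocessing as mp

def _worker(audio_path, model_name, queue):
    # Import and load inside the child so the model (and any CUDA
    # context) lives entirely in this process and dies with it.
    import whisper
    model = whisper.load_model(model_name)
    queue.put(model.transcribe(audio_path)["text"])

def transcribe_in_subprocess(audio_path, model_name="base"):
    # "spawn" gives the child a fresh interpreter and CUDA context.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(audio_path, model_name, queue))
    proc.start()
    text = queue.get()  # blocks until the child posts its result
    proc.join()         # child exits; the OS reclaims all of its memory
    return text

if __name__ == "__main__":
    print(transcribe_in_subprocess("audio.mp3"))
```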
-
Try setting the `PYTORCH_CUDA_ALLOC_CONF` environment variable in your docker-compose file to reduce memory fragmentation: `PYTORCH_CUDA_ALLOC_CONF: 'max_split_size_mb:512'`
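In a compose file that would look something like this (the service and image names are placeholders):

```yaml
services:
  whisper:
    image: my-whisper-app  # placeholder image name
    environment:
      # Cap the size of cached allocator blocks to reduce fragmentation.
      PYTORCH_CUDA_ALLOC_CONF: "max_split_size_mb:512"
```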
-
Using it in a different process releases the memory. I tried ThreadPoolExecutor.
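Worth noting: ThreadPoolExecutor keeps everything in the same process, so it won't give the process isolation described above, whereas a ProcessPoolExecutor does. A short sketch (function and file names are illustrative):

```python
from concurrent.futures import ProcessPoolExecutor

def transcribe(audio_path):
    # Load inside the worker so the model lives only in the child process.
    import whisper
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]

if __name__ == "__main__":
    # Leaving the with-block shuts the worker down, so the model's
    # memory is returned to the OS after each call.
    with ProcessPoolExecutor(max_workers=1) as pool:
        print(pool.submit(transcribe, "audio.mp3").result())
```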
-
Whisper does not release RAM once it is no longer used.
A large amount of the RAM used in Google Colab persists, preventing the script from running again without closing it.