Replies: 9 comments 10 replies
-
Doesn't work.
-
Has anyone found a solution to this? I'm trying to run Whisper in a Flask app as part of my monorepo, but after loading it on a request, it stays in memory. I've tried putting it in a process.
-
Hi folks, just in case anyone needs this: my trick is to move the model to the CPU before deleting it and emptying the CUDA cache.
This specific combination worked for me. Since I never needed to do this with a plain PyTorch model, hopefully people can figure out what exactly is wrong in the Whisper model.
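Roughly, the steps look like this (a sketch, assuming `model` was loaded with `whisper.load_model` onto a CUDA device):

```python
import gc
import torch
import whisper

model = whisper.load_model("base", device="cuda")
# ... run transcriptions ...

# Move the weights off the GPU first, then drop the last reference.
model = model.to("cpu")
del model
# Force Python to collect the object, then let PyTorch return
# its now-unused cached blocks to the GPU driver.
gc.collect()
torch.cuda.empty_cache()
```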
-
That doesn't work; in Google Colab, for example, it makes the virtual machine fail. It's an aggressive and not very canonical solution.
-
Hi, are there any updates on this? The suggested `del` is not working.
-
So, I tried all of the suggested approaches to this issue mentioned in this thread, but sadly was not able to get any of them to work. However, I did manage to come up with a workaround I thought I would share, using the native multiprocessing module: run each transcription in a child process, so that all of its memory is returned to the OS when the process exits (sketched below).
As I mentioned before, this is a workaround and doesn't actually deal with the memory leak directly, but I think it can overcome the issue in most cases, if you don't mind a little overhead from spawning or forking child processes. It's most unfortunate that there's no documentation on making consecutive transcription calls with the Whisper model, or at least none that I've been able to find, and seemingly no one else has found any either. Hope this helps, though.
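A minimal sketch of that workaround; `transcribe_in_subprocess` and `_worker` are just illustrative names, not part of Whisper's API:

```python
import multiprocessing as mp

def _worker(audio_path, model_name, queue):
    # Import and load inside the child so the model (and any CUDA
    # context) lives entirely in this process and dies with it.
    import whisper
    model = whisper.load_model(model_name)
    queue.put(model.transcribe(audio_path)["text"])

def transcribe_in_subprocess(audio_path, model_name="base"):
    # "spawn" gives the child a fresh interpreter and CUDA context.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(audio_path, model_name, queue))
    proc.start()
    text = queue.get()  # blocks until the child posts its result
    proc.join()         # child exits; the OS reclaims all of its memory
    return text

if __name__ == "__main__":
    print(transcribe_in_subprocess("audio.mp3"))
```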
-
Try setting the `PYTORCH_CUDA_ALLOC_CONF` environment variable in your docker-compose file to reduce memory fragmentation: `PYTORCH_CUDA_ALLOC_CONF: 'max_split_size_mb:512'`
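In a compose file that would look something like this (the service and image names are placeholders):

```yaml
services:
  whisper:
    image: my-whisper-app  # placeholder image name
    environment:
      # Cap the size of cached allocator blocks to reduce fragmentation.
      PYTORCH_CUDA_ALLOC_CONF: "max_split_size_mb:512"
```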
-
Using it in a different process releases the memory. I tried ThreadPoolExecutor.
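Worth noting: ThreadPoolExecutor keeps everything in the same process, so it won't give the process isolation described above, whereas a ProcessPoolExecutor does. A short sketch (function and file names are illustrative):

```python
from concurrent.futures import ProcessPoolExecutor

def transcribe(audio_path):
    # Load inside the worker so the model lives only in the child process.
    import whisper
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]

if __name__ == "__main__":
    # Leaving the with-block shuts the worker down, so the model's
    # memory is returned to the OS after each call.
    with ProcessPoolExecutor(max_workers=1) as pool:
        print(pool.submit(transcribe, "audio.mp3").result())
```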
-
Whisper does not release RAM once it is no longer used.
A large amount of the RAM used in Google Colab persists, preventing the script from running again without closing it.