Memory leak using TraceEnum_ELBO #3068
Thank you for the quick attempt, but no, it does not fix the problem. Neither does the one without using
Thanks for checking @gioelelm. I might have time in the next few weeks to dive deeper. If you have time I can recommend some strategies (what I'd try):
Ok, thanks! I will try the first two. Regarding the last point: since the code runs successfully anyway (provided the machine has enough memory), don't you think the bug could have gone unnoticed? Or do you have some reason to exclude that? I am thinking of the fact that one would have had to profile the memory usage of the program to figure out there was a problem.
It could have, but
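On the profiling point: per-step memory tracking is enough to expose this kind of leak without a full profiler. A minimal stdlib sketch (the `leaky_step` function here is a hypothetical stand-in for `svi.step`, not Pyro code):

```python
import gc
import tracemalloc

def track_step_memory(step, n_steps=5):
    """Run `step` repeatedly and record Python-heap usage after each call.

    A steadily growing series suggests objects surviving across steps
    (e.g. a retained computation graph), not just allocator noise.
    """
    tracemalloc.start()
    sizes = []
    for _ in range(n_steps):
        step()
        gc.collect()  # discount garbage that collection would reclaim anyway
        current, _peak = tracemalloc.get_traced_memory()
        sizes.append(current)
    tracemalloc.stop()
    return sizes

# Stand-in for svi.step(): a step that leaks by retaining data in a global.
leaked = []
def leaky_step():
    leaked.append([0] * 10_000)

sizes = track_step_memory(leaky_step)
assert sizes[-1] > sizes[0]  # traced heap grows across steps
```

Note this only tracks the Python heap; GPU memory would need `torch.cuda.memory_allocated()` instead.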
I have noticed a major GPU memory leak as well when switching from PyTorch 1.10 to 1.11. I wasn't able to debug it and decided to stick with PyTorch 1.10.0 (and Pyro 1.8.0) for now. Edit: CUDA 11.6, Arch Linux.
Hmm, maybe we should relax the PyTorch requirements and cut a release so that Pyro 1.8.2 works with PyTorch 1.10. We'd need to do the same for Funsor. I think I was a little too eager in dropping PyTorch 1.10 support, especially given that Colab still uses 1.10.
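Relaxing the pin would amount to widening the lower bound on the `torch` dependency. A hypothetical `setup.py` fragment (the exact bounds are an assumption for illustration; Pyro's real constraint lives in its own setup files):

```
# Hypothetical setup.py excerpt: widen the lower bound so Pyro 1.8.2
# can install alongside PyTorch 1.10, which Colab still ships.
install_requires=[
    "torch>=1.10.0",  # assumed previous pin: torch>=1.11.0
]
```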
I have noticed a GPU memory leak too with Pyro 1.8.1+06911dc and PyTorch 1.11.0. Downgrading to Pyro 1.6.0 and PyTorch 1.8.0 works normally.
I noticed a major memory leak when training with SVI using `TraceEnum_ELBO`. I initially noticed this in a custom model we are developing, but then I found it seems to be a more general bug.
For example, it even affects the GMM example from the Pyro tutorials, linked here, where memory usage rapidly climbs from a couple of hundred MB to many GB!
I ran this on a MacBook Pro 2019 running macOS 10.15. To replicate the issue, it is enough to run the linked notebook.
I have tried commenting out the following lines and adding a garbage-collector call; that reduces the memory accumulation by an order of magnitude but does not solve the problem completely, and it becomes particularly severe for large datasets.
(from this forum post)
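The garbage-collector workaround described above can be sketched as a training loop with periodic forced collection. This is a hedged sketch: `svi_step` is a placeholder for the real `svi.step(...)` call, and the interval `gc_every` is an assumed tuning knob, not from the original code:

```python
import gc

def train(svi_step, data, n_steps=1000, gc_every=50):
    """SVI-style training loop with a periodic gc.collect() workaround.

    `svi_step` stands in for pyro.infer.SVI.step; in real code the call
    would be svi.step(data). Forcing collection reduced the reported
    memory accumulation by about an order of magnitude, but did not
    eliminate it.
    """
    losses = []
    for i in range(n_steps):
        losses.append(svi_step(data))
        if i % gc_every == 0:
            gc.collect()  # break reference cycles left over from the step
    return losses

# Usage with a dummy step function in place of svi.step:
losses = train(lambda data: 0.0, data=None, n_steps=10, gc_every=2)
```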