vae_mnist increasing runtime and memory usage after each epoch #383
For debugging I did the following modifications:
As you can see in the log below, each call to Flux.pullback leads to additional memory allocations, and the runtime also increases.
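The modifications themselves aren't shown here; a minimal sketch of the kind of instrumentation that would produce such a log (the toy model and names below are placeholders, not the actual vae_mnist code):

```julia
using Flux

# Toy stand-ins so the sketch runs on its own; the real example uses
# the encoder/decoder from vae_mnist.jl.
model  = Dense(784 => 10)
ps     = Flux.params(model)
x      = randn(Float32, 784, 32)
loss() = sum(abs2, model(x))

for i in 1:3
    # @timed reports runtime and allocated bytes of each pullback call,
    # which makes growth across iterations visible in the log.
    stats = @timed Flux.pullback(loss, ps)
    println("call $i: $(round(stats.time; digits=4)) s, $(stats.bytes) bytes")
end
```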
After some debugging, it seems the use of L2 regularization in the `model_loss` function causes this issue. If I comment out the regularization (simply set `reg = 0`), the issue no longer appears.
What concerns me more is that there seems to be compilation happening on every batch. FluxML/Flux.jl#2040 used more or less the same code as the old MNIST VAE model, so I don't know what the culprit could be. For the regularization part, this is a good opportunity to remove it from the loss function and use the …
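The suggestion above is cut off; if the intended replacement is an optimiser-side penalty, Flux does ship a `WeightDecay` rule that can be composed with the main optimiser. A hedged sketch (toy model and hyperparameters are illustrative only):

```julia
using Flux
using Flux.Optimise: Optimiser, WeightDecay, Descent

model = Dense(10 => 784)
ps    = Flux.params(model)

# WeightDecay(λ) adds λ .* p to each parameter's gradient before the
# main update rule runs, matching an L2 penalty on the weights without
# ever entering the code Zygote has to differentiate.
opt = Optimiser(WeightDecay(0.01f0), Descent(0.001f0))

x  = randn(Float32, 10, 32)
gs = Flux.gradient(() -> sum(abs2, model(x)), ps)
Flux.update!(opt, ps, gs)
```

This keeps the loss function free of the parameter iteration that appears to trigger the recompilation.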
Indeed, the issue is in this line; see below for a MWE.
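The MWE itself is not reproduced above; a sketch in the same spirit (an L2 term that sums over `Flux.params` inside the differentiated closure, timed across repeated calls) might look like:

```julia
using Flux

m  = Dense(784 => 10)
ps = Flux.params(m)
x  = randn(Float32, 784, 16)

# The parameter sum inside the closure is the suspect pattern: Zygote
# must differentiate through the iteration over Params. Dropping the
# second term makes the per-call allocation growth disappear.
loss() = sum(abs2, m(x)) + 0.01f0 * sum(p -> sum(abs2, p), ps)

for i in 1:5
    stats = @timed Flux.pullback(loss, ps)
    println("call $i: $(stats.bytes) bytes allocated")
end
```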
Here is the log:
Hello everyone,
While training the vae_mnist example https://github.com/FluxML/model-zoo/blob/master/vision/vae_mnist/vae_mnist.jl, the runtime per epoch grows from 4 minutes up to 1:22 hours (see the log). We would expect a similar runtime for each epoch.
Tested with Flux 0.13.10 on a Windows 10 machine.