Out-of-memory on g2.8xlarge #34
I also get the same issue when trying to classify an image at very low memory load. My card is a GTX 660 Ti with 2 GB, but memory usage when the error occurs is only about 10%. The system is Ubuntu 15.04, and I'm using the most recent versions of DIGITS and NVIDIA's Caffe fork, both compiled manually without the web installer. The error is the same as described above.
After the error, the model also disappears completely. Whether or not it was trained to the end, it just vanishes from the model list, and I get 404 errors when trying to open it. It may also be interesting that when classifying an image during training, the model is no longer updated or listed in the web interface after the error, but the GPU and CPU still seem to be working, so Caffe is probably still running in the background. A log message that appears all the time is
but I don't know if it's related to the bug somehow.
+1, I get the same error.
See here also: oddly, his problem went away by upgrading to the latest Caffe and DIGITS.
Seems to be solved for this guy with v0.14.
See NVIDIA/DIGITS#310.
/cc @ajsander
Here is some information about his system:
Here's how I reproduced it:
g2.8xlarge EC2 instance

The big question
Why would we run out of memory during inference but not while training?
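One way to narrow this down (a sketch, assuming a machine with the NVIDIA driver and `nvidia-smi` installed; the helper names here are hypothetical) is to poll GPU memory during both training and inference and compare the peaks:

```python
import shutil
import subprocess

def parse_mem_csv(line):
    """Parse one 'used, total' row from nvidia-smi CSV output,
    e.g. '1234 MiB, 4096 MiB' -> (1234, 4096)."""
    used, total = (field.strip().split()[0] for field in line.split(","))
    return int(used), int(total)

def gpu_memory():
    """Return a list of (used_mib, total_mib) per GPU,
    or [] if nvidia-smi is not available on this system."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader"],
        text=True,
    )
    return [parse_mem_csv(line) for line in out.strip().splitlines()]

if __name__ == "__main__":
    for i, (used, total) in enumerate(gpu_memory()):
        print(f"GPU {i}: {used}/{total} MiB")
```

Running this in a loop alongside training and then alongside classification would show whether inference actually allocates more memory (e.g. a larger effective batch, or an extra copy of the network loaded next to the training one) or whether the out-of-memory error fires well below the card's capacity, as reported above.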