cuda problem when training imagenet model #2809
When I run "make runtest", 16 tests fail. Could that be the cause of the error above?
My configuration:
All test cases should pass.
It seems the 16 failed tests would not cause an error like "CUBLAS_...". And why does the error appear at iteration 40 rather than at the beginning?
Looks like an out-of-memory crash. The log line "I0723 10:32:54.606806 11097 net.cpp:248] Memory required for data: 343607608" shows roughly 340 MB (343,607,608 bytes) allocated for data, and real usage is perhaps 10x that. How much VRAM do you have? If it's a 3 GB 780 Ti, could you run nvidia-smi to confirm that VRAM is still available?
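To make the arithmetic in that reply concrete, here is a small Python sketch that pulls the byte count out of the logged line and applies the commenter's 10x guess. The 10x factor is an assumption taken from the comment, not a Caffe constant, and the 3 GB figure assumes a GTX 780 Ti. As far as I know, Caffe's "Memory required for data" counts only the forward output blobs, so gradients, weights and workspace come on top of it.

```python
import re

# Log line quoted in the comment above.
LOG_LINE = ("I0723 10:32:54.606806 11097 net.cpp:248] "
            "Memory required for data: 343607608")

# Assumptions, not Caffe constants: the commenter's "perhaps 10x" guess
# and the 3 GB of a GeForce GTX 780 Ti.
OVERHEAD_FACTOR = 10.0
CARD_GIB = 3.0


def data_bytes(log_line: str) -> int:
    """Extract the byte count from a 'Memory required for data' log line."""
    match = re.search(r"Memory required for data: (\d+)", log_line)
    if match is None:
        raise ValueError("not a 'Memory required for data' line")
    return int(match.group(1))


def rough_estimate_gib(n_bytes: int, factor: float = OVERHEAD_FACTOR) -> float:
    """Scale the logged data size by a guessed overhead factor, in GiB."""
    return n_bytes * factor / 2**30


n = data_bytes(LOG_LINE)
estimate = rough_estimate_gib(n)
print(f"data blobs:  {n / 2**20:.1f} MiB")   # data blobs:  327.7 MiB
print(f"rough total: {estimate:.2f} GiB")    # rough total: 3.20 GiB
print("exceeds card:", estimate > CARD_GIB)  # exceeds card: True
```

Under these assumptions the estimate lands just above 3 GiB, which fits the out-of-memory diagnosis; the real overhead factor depends on the network and the build.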
Thanks. I'll check the memory usage.
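One way to do that check from a script is to ask nvidia-smi for per-GPU memory in CSV form. This is a minimal sketch assuming nvidia-smi is on PATH; the `--query-gpu` fields and `--format=csv,noheader,nounits` options are standard nvidia-smi flags, the values come back in MiB, and the sample line is made up for illustration rather than taken from this thread.

```python
import subprocess

# One line per GPU: "total, used, free" in MiB.
QUERY = ["nvidia-smi",
         "--query-gpu=memory.total,memory.used,memory.free",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(text: str) -> list[dict[str, int]]:
    """Parse nvidia-smi CSV output into one dict per GPU (values in MiB)."""
    gpus = []
    for line in text.strip().splitlines():
        # int() tolerates the spaces nvidia-smi puts after each comma.
        total, used, free = (int(field) for field in line.split(","))
        gpus.append({"total_mib": total, "used_mib": used, "free_mib": free})
    return gpus


def query_gpu_memory() -> list[dict[str, int]]:
    """Run nvidia-smi and parse its output; raises if the command fails."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


# Hypothetical sample output for a single 3 GB card, for illustration only.
sample = "3071, 1650, 1421\n"
print(parse_memory_csv(sample))
```

Calling `query_gpu_memory()` while training runs (or simply watching `nvidia-smi -l 1`) shows whether free memory drops to near zero just before the crash.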
The error is fixed with batch size 64, and VRAM usage is about 1.6 GB. It may run out of memory when using batch size …
Note gradient accumulation (#1977) for working with reduced memory. See the …
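Gradient accumulation trades time for memory: instead of one large batch, the solver runs several smaller forward/backward passes, sums their gradients, scales by 1/iter_size, and applies one update. In Caffe this is, as I understand it, the solver's `iter_size` setting, so the effective batch is `batch_size x iter_size` (for example 64 x 4 = 256). The Python sketch below is a toy illustration of the idea on a 1-D linear model, not Caffe code; the model, data and function names are hypothetical. It checks that averaging the gradients of four 64-sample mini-batches matches the gradient of the full 256-sample batch.

```python
import random


def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """d/dw of mean((w*x - y)^2) over one batch (toy model y = w*x)."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w: float, xs: list[float], ys: list[float],
                     batch_size: int, iter_size: int) -> float:
    """Sum gradients of iter_size consecutive mini-batches, then divide by
    iter_size. Assumes len(xs) == batch_size * iter_size."""
    total = 0.0
    for i in range(iter_size):
        lo, hi = i * batch_size, (i + 1) * batch_size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi])
    return total / iter_size


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(256)]
ys = [3.0 * x + random.gauss(0.0, 0.1) for x in xs]

w = 0.5
full = grad_mse(w, xs, ys)                   # one batch of 256 samples
accum = accumulated_grad(w, xs, ys, 64, 4)   # 4 mini-batches of 64, same data
print(f"full batch: {full:.6f}  accumulated: {accum:.6f}")
```

Only one 64-sample mini-batch's activations have to be on the GPU at a time, which is where the memory saving comes from.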
math_functions.cu:28] Check failed: status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED
The above error occurs when CUDA 9.0 is installed.
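For reading such lines: Caffe's cuBLAS check prints the status it got and the expected `CUBLAS_STATUS_SUCCESS` as "(13 vs. 0)", followed by the status name. The sketch below maps the number back to its name. The table reproduces the `cublasStatus_t` enum from `cublas_api.h` as I remember it, so verify it against the header of your installed CUDA version.

```python
import re

# cublasStatus_t values (cublas_api.h); check against your CUDA headers.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def decode_check_failure(log_line: str) -> str:
    """Map the actual value in '(<actual> vs. <expected>)' to its name."""
    match = re.search(r"\((\d+) vs\. (\d+)\)", log_line)
    if match is None:
        raise ValueError("no '(<actual> vs. <expected>)' pair found")
    actual = int(match.group(1))
    return CUBLAS_STATUS.get(actual, f"unknown status {actual}")


line = ("math_functions.cu:28] Check failed: status == CUBLAS_STATUS_SUCCESS "
        "(13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED")
print(decode_check_failure(line))  # CUBLAS_STATUS_EXECUTION_FAILED
```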
I get an error when training the ImageNet model. I followed the "ImageNet tutorial" step by step.
Is the problem with my CUDA configuration or something else?