out of memory when using -map #8308
Found another way to replicate the exact same behaviour, by running this command:
Which results in the following:
Watching it run with htop in another window, the problem only happens when it starts printing the "rank = ..." lines on the screen. Up to that point darknet was only consuming a few MB of RAM, but whatever happens when it prints the "rank" messages, it only takes a few seconds before all RAM is consumed.
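For anyone trying to pin down exactly when the growth starts, a rough way to log the process's resident memory over time is a loop like the one below (it assumes a single process literally named darknet; adjust the pgrep pattern if yours differs):

```
# Print darknet's resident set size (kB) and elapsed time every 5 seconds.
# Assumes a single running process named "darknet".
while sleep 5; do
    ps -o rss=,etime= -p "$(pgrep -x darknet)"
done
```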
Some findings from running it under valgrind. There may be some memory leaks:
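For reference, the kind of run that produces this sort of leak report looks roughly like the following; the data, cfg, and weights file names are placeholders rather than the exact ones used here:

```
# Run darknet's mAP calculation under valgrind with full leak checking.
# obj.data, yolov4-tiny-custom.cfg and best.weights are placeholder names.
valgrind --leak-check=full --show-leak-kinds=definite \
    ./darknet detector map obj.data yolov4-tiny-custom.cfg best.weights
```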
@AlexeyAB Does this output from valgrind help?
Facing the same issue:
From dmesg I can see this is a memory issue:
@AlexeyAB Can the memory leak fixes from PR #8314 be merged? Note, though, that those fixes alone are not enough to solve the problem in my case.
Same issue here.
I have the same problem as you, and I have your fix in my local darknet. Are you still facing the same issue when training with a large number of images? Thanks!
Attempting to train a network with this command:
When it gets to calculating the mAP, the Linux kernel eventually kills darknet due to out-of-memory.
The darknet log shows thousands of repeating lines before darknet is killed:
The neural network is yolov4-tiny with 180 classes. There are 6624 training images and 1810 validation images. Max batches is set to 360000.
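For context, a training invocation with periodic mAP calculation typically looks like the sketch below; the paths are illustrative placeholders, not the exact command used here:

```
# Train with the -map flag so mAP is computed on the validation set during training.
# All file paths below are placeholders.
./darknet detector train data/obj.data cfg/yolov4-tiny-custom.cfg \
    yolov4-tiny.conv.29 -map
```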
The rig is an RTX 2070 with 8 GB, and the system has 32 GB of RAM. At the time the Linux kernel kills darknet, dmesg reports the following:

So this is saying darknet is using 31.67 GB of RAM, on a system with 32 GB installed.
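The relevant OOM-killer entries can usually be pulled out of the kernel log with something like:

```
# Show kernel OOM-killer activity, including per-process memory at kill time.
dmesg -T | grep -iE "out of memory|killed process"
```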
Any idea why that is, or what I can do to fix it?