Mesh/material memory leak #208
Comments
There is also a memory leak inside the server, but on a much smaller scale. Using the same procedure, here is a Heaptrack of Log file: heaptrack.ruby.2904453.gz
Investigating further, the full issue also occurs for camera sensors, totalling ~5GB of leaked memory (headless) and adding up to ~10GB leaked for 50 models if the GUI is also open. Therefore, the issue might be inside Log file: I updated the issue description to reflect this discovery.
I have been digging a little more into this problem and I think I found the source. There are two places where we handle meshes
The first thing to take into account is that the MeshManager class has no method to remove meshes, which makes it impossible to free the memory they hold. I wrote a simple C++ example that loads and destroys some meshes using only the MeshManager (I added a method to remove them and will create the PR soon). As you can see in the following image, the memory is released properly. The problem is inside ign-rendering: when we destroy the mesh using this new method, nothing destroys the Ogre material, which in this particular case contains a (quite large) texture image. This makes memory grow, and we never clean it up. The following image shows how we create the meshes and, at the end, how I try to destroy them; we are only able to release the memory held by the MeshManager, not the memory held by ign-rendering. TODO
FYI @iche033 |
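To illustrate the missing removal method described above, here is a minimal sketch of a mesh registry that owns its meshes and can release them by name. `MeshRegistry`, `Mesh`, `Add`, `Has`, and `Remove` are hypothetical names chosen for this example; they are not the MeshManager API and make no claim about how the actual PR implements removal.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded mesh; not the real mesh type.
struct Mesh
{
  std::string name;
  std::vector<float> vertices;
};

// Hypothetical registry that owns meshes by name.
class MeshRegistry
{
public:
  // Takes ownership of the mesh; a mesh with the same name is replaced.
  void Add(std::unique_ptr<Mesh> mesh)
  {
    std::string key = mesh->name;  // copy the key before moving the mesh
    meshes_[key] = std::move(mesh);
  }

  bool Has(const std::string &name) const { return meshes_.count(name) > 0; }

  // Destroys the mesh and frees its memory.
  // Returns false if no mesh with that name is registered.
  bool Remove(const std::string &name) { return meshes_.erase(name) > 0; }

  std::size_t Size() const { return meshes_.size(); }

private:
  std::unordered_map<std::string, std::unique_ptr<Mesh>> meshes_;
};
```

Because the registry holds each mesh through a `std::unique_ptr`, erasing the map entry is enough to free the vertex data. Anything the renderer created from that mesh (materials, textures) lives elsewhere and needs its own cleanup, which is the part this comment identifies as missing.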
Some more details: We have some duplicated materials. When we create the mesh, we load the material into memory, in particular in this method. Then, when we try to remove the mesh, which potentially has some submeshes with textures, we remove the texture associated with each submesh, but we do not remove this general material, which will live in memory forever or until we remove the engine with the method
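One way to picture the lifetime problem above is shared ownership: if every submesh holds a counted reference to its material, the material is destroyed when the last submesh using it goes away. The sketch below uses `std::shared_ptr` and `std::weak_ptr` for this. `Material`, `MaterialCache`, `SubMesh`, `Acquire`, and `LiveCount` are hypothetical names for illustration only; this is not how Ogre or ign-rendering manage materials.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical material holding a large texture; not an Ogre type.
struct Material
{
  std::string name;
  std::vector<unsigned char> texturePixels;
};

// Hypothetical cache that shares materials without keeping them alive.
class MaterialCache
{
public:
  // Returns the live material with this name, or creates a new one.
  std::shared_ptr<Material> Acquire(const std::string &name,
                                    std::size_t textureBytes)
  {
    auto it = entries_.find(name);
    if (it != entries_.end())
    {
      if (auto alive = it->second.lock())
        return alive;
    }
    auto created = std::make_shared<Material>();
    created->name = name;
    created->texturePixels.resize(textureBytes);
    entries_[name] = created;
    return created;
  }

  // Number of materials still referenced by at least one user.
  std::size_t LiveCount() const
  {
    std::size_t n = 0;
    for (const auto &entry : entries_)
    {
      if (!entry.second.expired())
        ++n;
    }
    return n;
  }

private:
  // weak_ptr: the cache observes materials but does not extend their lifetime.
  std::map<std::string, std::weak_ptr<Material>> entries_;
};

// Hypothetical submesh: holding the shared_ptr keeps the material alive.
struct SubMesh
{
  std::shared_ptr<Material> material;
};
```

With this layout, two submeshes that request the same material name share a single copy, and the texture buffer is freed as soon as both submeshes are destroyed. A cache that stored `std::shared_ptr` instead would reproduce the behaviour described in the comment, where the material outlives every mesh that used it.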
I added this PR in ign-gazebo, gazebosim/gz-sim#824, which allows removing the mesh when an entity is removed or the UserCommand remove is called. The following image shows how the memory is released. NOTE: This is using a world without any sensors, which means the materials are not loaded and there is no leak.
I opened another draft PR, gazebosim/gz-rendering#324. With this one we should be able to remove the material from memory, but there is still a memory leak that I'm not able to identify.
Server / Client
NOTE: As you can see, in both cases there is a small memory leak.
The small memory leak is solved:
Server / Client
According to an offline discussion with @iche033, these changes may affect performance in some special cases, such as:
Awesome catch @AndrejOrsula! I experienced the same behavior in the past in a similar setting, but without debugging it this much I ended up with an implementation where the simulator is completely destroyed and re-created every time. This of course introduces a non-negligible computation overhead, but in my experience:
@ahcorde really great work! Just to know, do the models you tested have plugins?
I'm using the models from the script that @AndrejOrsula included in the issue (googleresearch). I don't think these models include any kind of plugins.
OK, thanks for the clarification, I found the script only now. Again, nicely done 🎉 Very likely meshes have a much greater impact on memory than plugins. This fix made my day :)
Thank you for investigating and mitigating this issue @ahcorde!
This is very true. From my experience, it is currently much faster to spawn a model with mesh geometry and an image texture if its assets are already loaded into memory. I don't have specific numbers, but it is especially noticeable for high-res textures (even if stored on an SSD). This behaviour is definitely beneficial if a limited number of diverse models is used. Currently, the only problem that really occurs is when a system runs out of RAM. I forgot to mention in the original description that utilised VRAM is kept at a steady size just a few MB below the available size (at least for CUDA). I assume it is due to some form of smart buffer pool management in OpenGL/OGRE? Having a similar solution for RAM would make it possible to keep the advantage of faster re-insertion of models while addressing the lack of memory. I have no idea about the feasibility of such an implementation, but having a maximum memory limit for assets (especially textures) would definitely be nice, with some policy that frees old and unused data [limit defaults to system memory].
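The memory limit with a "free old and unused data" policy suggested above could look roughly like a least-recently-used cache with a byte budget. The following is only a sketch under that assumption: `AssetCache`, `Insert`, `Touch`, `Contains`, and `UsedBytes` are hypothetical names, the cache tracks sizes rather than real texture data, and nothing here reflects an existing gz-rendering or OGRE feature.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical asset cache with a byte budget and LRU eviction.
class AssetCache
{
public:
  explicit AssetCache(std::size_t maxBytes) : maxBytes_(maxBytes) {}

  // Records an asset of the given size, then evicts the least recently
  // used entries until the total is back within the budget.
  void Insert(const std::string &name, std::size_t bytes)
  {
    Erase(name);  // replace any previous entry with the same name
    order_.push_front({name, bytes});
    index_[name] = order_.begin();
    usedBytes_ += bytes;
    // The newest entry is kept even if it alone exceeds the budget.
    while (usedBytes_ > maxBytes_ && order_.size() > 1)
    {
      const Entry &oldest = order_.back();
      usedBytes_ -= oldest.second;
      index_.erase(oldest.first);
      order_.pop_back();
    }
  }

  // Marks an asset as recently used. Returns false on a cache miss.
  bool Touch(const std::string &name)
  {
    auto it = index_.find(name);
    if (it == index_.end())
      return false;
    order_.splice(order_.begin(), order_, it->second);
    return true;
  }

  bool Contains(const std::string &name) const
  {
    return index_.count(name) > 0;
  }

  std::size_t UsedBytes() const { return usedBytes_; }

private:
  using Entry = std::pair<std::string, std::size_t>;  // name, size in bytes

  void Erase(const std::string &name)
  {
    auto it = index_.find(name);
    if (it == index_.end())
      return;
    usedBytes_ -= it->second->second;
    order_.erase(it->second);
    index_.erase(it);
  }

  std::size_t maxBytes_;
  std::size_t usedBytes_ = 0;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

A caller would `Touch` an asset each time a model that uses it is spawned, so frequently reinserted models stay resident while rarely used ones are evicted first. Setting the budget to the amount of system memory would approximate the current keep-everything behaviour.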
Hello everyone
(I also saw some other problems not related to Qt, but they seem to be constant memory leaks, not growing ones.) The Qt bug tracker also has a bug, https://bugreports.qt.io/browse/QTBUG-119301, that closely matches my observation with
I'm not sure exactly how this observation can help, but I decided to share it just in case. Tested on the gz-gui7_7.2.2 tag.
Maybe related to gazebosim/gz-rendering#39
After investigating this issue more, it is more suited to be inside https://github.com/ignitionrobotics/ign-rendering as it occurs for both GUI and camera sensors. Please move the issue there if possible.
Environment
Description
I should note that this behaviour might be advantageous for headless simulation if an environment repeatedly utilises a limited number of models, as reinsertion of a model is much faster if its resources are already loaded into memory. Duality of bug/feature is real with this one. Therefore, having an option to allow both behaviours might be preferable, i.e. option A: unload all resources after removing a model; option B: keep the resources (with some policy that makes sure the system does not run out of memory or exceed some threshold).
Steps to reproduce
ign gazebo -s
ign gazebo -g
ign_gui_memory_leak_reproducibility_script.bash (gist)
Service calls are used here to ease reproducibility. This issue also occurs when using the C++ API directly (I originally experienced the issue while using gym-ignition).
Output
Below is a video of performing the steps above. Notice also that the aligned bounding boxes of objects remain visible if an object was removed while selected (and they cannot be removed). Speculation: This might be the small, negligible amount of memory that accumulates on model reinsertion for the GUI (or part of it).
simplescreenrecorder-2021-04-03_19.32.30.mp4
I tried to investigate the issue with Heaptrack; however, only a fraction of the leaked memory gets logged (as far as I can see). Peak resident memory (RSS) matches the total RAM usage (4.8GB), but I was not able to figure out what the largest contributor is. The mesh/texture data is not logged. I am not sure if this is caused by having Ruby in the loop or by the rendering engine being loaded as a plugin.
Log file: heaptrack.ruby.3172402.gz
Summary:
Consumption (each spike is insertion of a new model):
Overlapping collision geometries
While making the reproducibility example, I also noticed that if I resume the simulation after all the model insertions/deletions, the server freezes completely and outputs a bunch of ODE collision-related messages. I have seen these before when two or more models have largely overlapping collision geometry. Therefore, it seems the collision geometry is not immediately removed from the server when the simulation is paused. Is this a design choice or a bug?