Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for ogre crash on shutdown #1033

Merged
merged 15 commits into from
Aug 16, 2024
Merged

Workaround for ogre crash on shutdown #1033

merged 15 commits into from
Aug 16, 2024

Conversation

iche033
Copy link
Contributor

@iche033 iche033 commented Aug 9, 2024

🦟 Bug fix

Workaround for #1007

Summary

The INTEGRATION_load_unload test crashes with ogre 1.x. The tests checks to see if the render engine can be loaded and unloaded in a thread (which is how gz sim runs the render engine). The crash happens on ubuntu 24.04 with the system debs, see #1007 for more info.

The workaround implemented in this PR is to prevent the RenderSystem_GL plugin library from being unloaded on shutdown to avoid the crash. The plugin will be still be stopped and deleted but we do not call dlclose on it. I verified that the destructors of various objects / singletons in the plugin are called.

This is not ideal and we should find a proper solution in the future.

To test

The INTEGRATION_load_unload test should now pass with ogre 1.x on Ubuntu 24.04 with the system ogre debs:

GZ_ENGINE_TO_TEST=ogre  ./build/gz-rendering9/bin/INTEGRATION_load_unload

gz-sim should no longer crash on exit:

gz sim -v 4 -r --iterations 5 -s sensors_demo.sdf --render-engine ogre

Checklist

  • Signed all commits for DCO
  • Added tests
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • Consider updating Python bindings (if the library has them)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.

Signed-off-by: Ian Chen <ichen@openrobotics.org>
@github-actions github-actions bot added the 🏛️ ionic Gazebo Ionic label Aug 9, 2024
Signed-off-by: Ian Chen <ichen@openrobotics.org>
@iche033 iche033 marked this pull request as draft August 10, 2024 02:11
Signed-off-by: Ian Chen <ichen@openrobotics.org>
@iche033 iche033 marked this pull request as ready for review August 10, 2024 05:02
@azeey azeey self-requested a review August 12, 2024 18:10
@scpeters
Copy link
Member

potential fix for the remaining cmake warning in gazebo-tooling/release-tools#1161

@scpeters
Copy link
Member

potential fix for the remaining cmake warning in gazebo-tooling/release-tools#1161

testing shows that gazebo-tooling/release-tools#1161 should fix the warning

Copy link
Member

@scpeters scpeters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit; I'm still trying to understand the code

ogre/src/OgreRenderEngine.cc Outdated Show resolved Hide resolved
@azeey
Copy link
Contributor

azeey commented Aug 14, 2024

We now have API to load plugins with RTLD_NODELETE. This would ensure that the plugin is not actually deleted even if dlclose is called. Would changing

auto pluginNames = this->pluginLoader.LoadLib(pathToLib);
to LoadLib(pathToLib, true) fix this issue without any of the other changes?

ogre/src/OgreRenderEngine.cc Outdated Show resolved Hide resolved
ogre/src/OgreRenderEngine.cc Show resolved Hide resolved
ogre/src/OgreRenderEngine.cc Outdated Show resolved Hide resolved
@scpeters
Copy link
Member

We now have API to load plugins with RTLD_NODELETE. This would ensure that the plugin is not actually deleted even if dlclose is called. Would changing

auto pluginNames = this->pluginLoader.LoadLib(pathToLib);

to LoadLib(pathToLib, true) fix this issue without any of the other changes?

testing in a draft PR: #1038

@scpeters
Copy link
Member

We now have API to load plugins with RTLD_NODELETE. This would ensure that the plugin is not actually deleted even if dlclose is called. Would changing

auto pluginNames = this->pluginLoader.LoadLib(pathToLib);

to LoadLib(pathToLib, true) fix this issue without any of the other changes?

testing in a draft PR: #1038

the noble workflow passes; I've marked #1038 as ready for review

Signed-off-by: Ian Chen <ichen@openrobotics.org>
Signed-off-by: Ian Chen <ichen@openrobotics.org>
@iche033
Copy link
Contributor Author

iche033 commented Aug 14, 2024

I think the LoadLib with RTLD_NODELETE approach keeps the gz-rendering engine plugin in memory while the changes in this PR will keep only the RenderSystem_GL ogre plugin in memory. I'm leaning towards keeping the current approach of limiting the workaround to only the GL render system plugin that's causing the issue.

I simplified the code in this PR quite a bit in 2910326. Now I manually call dlopen instead of using ogre's dynlib manager to load the plugin. The difference is that I no longer need to add the hook to control how the libraries are unloaded. All objects should still be deleted.

Signed-off-by: Ian Chen <ichen@openrobotics.org>
Signed-off-by: Ian Chen <ichen@openrobotics.org>
Signed-off-by: Ian Chen <ichen@openrobotics.org>
Signed-off-by: Ian Chen <ichen@openrobotics.org>
Copy link

codecov bot commented Aug 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.69%. Comparing base (e187f52) to head (26425b8).
Report is 51 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1033   +/-   ##
=======================================
  Coverage   75.68%   75.69%           
=======================================
  Files         177      177           
  Lines       16959    16956    -3     
=======================================
- Hits        12835    12834    -1     
+ Misses       4124     4122    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Ian Chen <ichen@openrobotics.org>
Signed-off-by: Ian Chen <ichen@openrobotics.org>
Copy link
Member

@scpeters scpeters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has fixed CI and looks fine to me; I defer to @azeey about whether to follow the suggestion from @iche033 to use this instead of #1038

Copy link
Contributor

@azeey azeey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to check if windows CI was okay, but it looks like the Ogre1 tests have been failing for a while. I thought it was just Ogre2 that was failing :(. Anyway, I'm good with this approach.

@iche033
Copy link
Contributor Author

iche033 commented Aug 15, 2024

yeah we should try setting up ogre 2.3 on windows, #745

@iche033 iche033 merged commit 2acca4d into main Aug 16, 2024
8 of 9 checks passed
@iche033 iche033 deleted the load_unload_ogre_crash branch August 16, 2024 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏛️ ionic Gazebo Ionic
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants