-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition between Qt and Ogre2 in presentation #304
Comments
thanks for uncovering this bug. Note that there is also a Scene3D.cc in ign-gazebo and that's the one currently being used when |
This fix depends on a fix in ign-rendering module, because it depends on the new Camera::SwapFromThread function Without it, compilation will fail Affects gazebosim/gz-rendering#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
This fix also depends on on a fix in ign-gazebo so that it now can call SwapFromThread, otherwise the race condition will still happen. At the current moment, the implementation is very wasteful creating many more RenderTargets than needed (because 2 workspaces are created). This issue will be fixed in the next commit Affects gazebosim#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
This fix also depends on on a fix in ign-gazebo so that it now can call SwapFromThread, otherwise the race condition will still happen. At the current moment, the implementation is very wasteful creating many more RenderTargets than needed (because 2 workspaces are created). This issue will be fixed in the next commit Affects gazebosim#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
Ooofff!! This is going to be one complex merge. My fixes are still WIP but still I submitted the PR. I'm mostly interested in hearing feedback about PR gazebosim/gz-sim#774 because I know for certain that ign-rendering is currently breaking the ABI and I know how to fix it (I just have to move SwapFromThread to the derived classes and then have ign-gazebo dynamic cast these classes and call them directly) However I'm more inexperienced about ign-gazebo. I checked out the install folder generated by the make folders, and I could not find Scene.hh anywhere. This suggests the classes I touched were never exposed to the user and therefore it is safe to alter them. However I may be wrong. I'll be waiting on feedback for this. Update: It seems the tests are failing on ign-rendering PR, that was unexpected but I can see why I missed it. I made some local changes to trigger the race condition more aggressively and forgot to test the code without these changes. The module is basically crashing because we're attempting to create the same node definition twice, which should not be happening. |
This fix depends on a fix in ign-rendering module, because it depends on the new Camera::SwapFromThread function Without it, compilation will fail Affects gazebosim/gz-rendering#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
ign-rendering's ogre2 rendering engine is currently dynamically loaded as a plugin library from ign-gazebo. If a downcast is needed in ign-gazebo, it means we'll need to explicitly link against the ogre2 target, correct? It's doable but that means we'll need to add a few cmake checks and ifdefs to make sure the user has ogre 2.x installed (it's not required since users can use ign-gazebo with just ogre 1.x). I'm leaning towards just merging this into the unstable branch (
Do you mean ign-gazebo's Scene3D.hh (as opposed to ign-rendering's Scene.hh)?
|
Mmm. You are right. It will be troublesome to backport. To put another way: We just need a way to signal that a particular camera needs to do special work to swap its internal state. If adding forcing the user to upgrade both ign-gazebo and ign-rendering at the same time is an option, then this can be fixed by adding a new class to ign-rendering which takes care of the downcasting. This will preserve ABI compatibility, but will cause trouble if the user upgrades ign-gazebo without upgrading ign-rendering (which could be detected at runtime because the symbol will be missing, and warn the user to upgrade ign-rendering as well). I'm talking about something along the lines:
Is this even feasible?
As for how affects my other bugfixes: The current code is bugged and can corrupt GL state (apparently in non-crashing ways at least with what's we've seen so far with Mesa drivers). That means that:
Basically whenever we encounter a bug, it's going to be the question: "is this code broken because I made a mistake, or just the race condition again?".
Yes, sorry. I meant Scene3D.hh. Great that it's not installed, as I suspected |
yes the example you gave is feasible. We've actually done something similar before in other libraries for critical bugfixes. We can also require users to upgrade ign-rendering dependency version when they upgrade ign-gazebo. Currently ign-gazebo and ign-rendering are on version 5.0.0. Once the changes get in, we can bump both versions to 5.1.0 and tell ign-gazebo to look for ign-rednering 5.1.0 by add Just throwing in another idea. Is it possible to do this through ign-gazebo's Scene3D plugin by using 2 cameras that take turn to render? I imagine the difficulty here is keeping the 2 camera pose control (pan, orbit, zoom, etc) in sync. |
That may not be a bad idea. I'll have to think a few details about on how feasible are other improvements I have to do (basically saving VRAM) |
Hi! I've been thinking about this and I realized I missed a simple element, which I couldn't discuss in the middle the weekend and wait for a reply, but NOW is the right time to discuss this: Do we even need multithreading here?I assume rendering is currently happening in a different thread because "that's what the sample we found in the Qt wiki did". There are pros and cons of this approach: Pros
Cons
Best of both worlds?I recommend to keep rendering in another thread to avoid the cryptic GL errors, but serializing the operations. That means the UI thread is blocked until Ogre is done. I've done this successfully in order to test RenderDoc debugging and it worked.
It actually gets better, because newer versions at any time can add support for a double buffer scheme on-demand, where at launch-time the user can leverage whether to enable parallel rendering, sacrificing some VRAM for a potential increase in UI responsiveness, or disable parallel rendering. I was aiming to do parallel rendering by default, but add a command line argument to request serialized rendering for RenderDoc debugging. Thoughts? |
I'm also leaning towards doing serialized rendering by default and keeping the code less complicated (I hope). This sounds like the changes will mainly be done in Scene3D class in the way we integrate ogre3d with Qt. And yes we did parallel rendering as that was what worked at the time, so it's good to hear that you were already able to have something working already. As for parallel rendering, I think the VRAM sacrifice is relatively small compared to some our typical use cases. But I agree we can leave this for the future if we find limitations in serialized rendering, thanks. |
This avoids us to break ign-rendering ABI while also simplifying the amount of work to be done Serializing work is easier to maintain and debug Only CPU-bound scenario would potentially benefit from parallel command generation (in terms of UI responsiveness) Parallel command generation can be added back later Also fixed coding style Refer to gazebosim/gz-rendering#304 (comment) for discussion Affects ign-rendering#304
This fix depends on a fix in ign-rendering module, because it depends on the new Camera::SwapFromThread function Without it, compilation will fail Affects gazebosim/gz-rendering#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
This avoids us to break ign-rendering ABI while also simplifying the amount of work to be done Serializing work is easier to maintain and debug Only CPU-bound scenario would potentially benefit from parallel command generation (in terms of UI responsiveness) Parallel command generation can be added back later Also fixed coding style Refer to gazebosim/gz-rendering#304 (comment) for discussion Affects ign-rendering#304
This fix depends on a fix in ign-rendering module, because it depends on the new Camera::SwapFromThread function Without it, compilation will fail Affects gazebosim/gz-rendering#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
This avoids us to break ign-rendering ABI while also simplifying the amount of work to be done Serializing work is easier to maintain and debug Only CPU-bound scenario would potentially benefit from parallel command generation (in terms of UI responsiveness) Parallel command generation can be added back later Also fixed coding style Refer to gazebosim/gz-rendering#304 (comment) for discussion Affects ign-rendering#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
This fix depends on a fix in ign-rendering module, because it depends on the new Camera::SwapFromThread function Without it, compilation will fail Affects gazebosim/gz-rendering#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
This avoids us to break ign-rendering ABI while also simplifying the amount of work to be done Serializing work is easier to maintain and debug Only CPU-bound scenario would potentially benefit from parallel command generation (in terms of UI responsiveness) Parallel command generation can be added back later Also fixed coding style Refer to gazebosim/gz-rendering#304 (comment) for discussion Affects ign-rendering#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
* Fix race condition when rendering the UI This fix depends on a fix in ign-rendering module, because it depends on the new Camera::SwapFromThread function Without it, compilation will fail Affects gazebosim/gz-rendering#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar> * Force serialization of render commands This avoids us to break ign-rendering ABI while also simplifying the amount of work to be done Serializing work is easier to maintain and debug Only CPU-bound scenario would potentially benefit from parallel command generation (in terms of UI responsiveness) Parallel command generation can be added back later Also fixed coding style Refer to gazebosim/gz-rendering#304 (comment) for discussion Affects ign-rendering#304 Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar> * Fix deadlock on initialization & shutdown Also fixes preexisting race condition when shutting down and improper uninitialization of the worker thread Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar> * Fix deadlock when using MoveTo modifier Fix coding style Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar> * Reimplemented synchronization mechanism to fix deadlocks It's a good thing we went for serializing rendering. THe way Qt implements the double buffer scheme using signals & slots is fundamentally flawed because it assumes the worker thread never needs to synchronize (e.g. to invalidate FBOs if window resolution changes). Trying to synchronize can easily cause deadlocks if Qt thread has spurious updates which don't end up emitting TextureInUse, as the worker thread is running slower than Qt thread. A way to fix this could be to use a different synchronization mechanism where the main thread increases a request counter and the worker thread is constantly looping but only wakes up when that counter is > 0. For now, this will do. Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar> * Document RenderSync Turn some pointers into references Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar> * Document missing parameter Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar> * Document better our use of emit TextureInUse Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar> * Change Q_SLOTS to slots for consistency I changed it because it didn't work, but it must've been old code because it works now. Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar> Co-authored-by: Ian Chen <ichen@osrfoundation.org>
ign-rendering uses the same pattern as provided textureinthread sample from Qt. (the code can be seen in repo)
What should happen
The pattern is the following (ideal, i.e. what should happen):
This is basically a double-buffer scheme maintained by hand.
It is even described in Qt's sample (emphasis mine):
What is actually happening
This is causing a race condition because A is both being rendered to while also being used internally as a texture by Qt. This can cause all sorts of weird behaviors or maybe even crashes when Qt UI is used.
Most likely how bad this bug is depends on which driver is being used (e.g. Mesa radeonsi vs Mesa Intel, vs Windows Intel, vs proprietary AMD, vs NVIDIA proprietary, vs Nouveau) and the workload balance and processing speed of the CPU (like all race conditions)
Why it went unnoticed
This bug went unnoticed because Gazebo defaults to using 8x MSAA (but due to bugs in setup, MSAA is not actually used, that's another problem) and the FBO passed to Qt is the resolved one. Due to this, the window for the race condition is extremely small (because FBO is only written to during the resolve operation, which is only a fraction of the whole render).
However once I disable MSAA the problem becomes much more apparent, notice the background flicker:
2021-04-12.12-48-45.mkv.mp4
Environment
Source. Probably all branches where Ogre2 is used
Is Ogre1 affected?
I don't know. I did not evaluate. The possibility is high, given that part of the buggy behavior comes from ign-gui, which is agnostic to both Ogre1 and Ogre2
Is OptiX affected?
Unknown.
Tasks
Add "debug mode" for blocking (serializing) Qt and Ogre2 threads for decent RenderDoc debugging (otherwise API calls become interleaved and is extremely confusing)- We're now forcing serialization.Affected files
Most of the bugs affect the files coming from:
Other
I will be the one fixing this bug. This ticket is for tracking progress
The text was updated successfully, but these errors were encountered: