Prevent deadlock in sensors system #2025
Merged
🦟 Bug fix
Summary
The threading and locking mechanism in the sensors system can potentially deadlock, especially after the changes in #1938.
This deadlock was found by users of gz-sim who use a custom sim update loop. I haven't been able to reproduce it in gz-sim yet, but I was able to reproduce it with the custom update loop when the system is under high load.
More details on the deadlock:
There are two threads in the sensors system that use a condition variable to unblock each other in order to do a soft lockstep between physics and sensors. Deadlock occurs when both threads are waiting on each other.
In my testing, I noticed that the main thread's notify_one call does not always unblock the rendering thread's wait immediately. So if the main thread continues to the next iteration and waits again before the rendering thread wakes up, deadlock occurs.
The workaround is to add a timeout to the wait call in the rendering thread so that it does not wait forever. The side effect of this change is that the rendering thread is now semi-polling for updates. If there are sensors that need updating, there shouldn't be any noticeable change. However, if no sensors need to be updated, the rendering thread wakes up every second, does nothing, and goes back to waiting.
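To illustrate the workaround, here is a minimal standalone sketch of a timed wait on a condition variable. It is not the actual sensors system code: `SensorSync`, `RenderLoop`, `RunDemo`, and all member names are hypothetical, and the timeout is a parameter here (the demo uses a short one) rather than the one-second value described above.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the state shared by the main and rendering threads.
struct SensorSync
{
  std::mutex mutex;
  std::condition_variable cv;
  bool updateAvailable{false};  // set by the main thread when sensors need updating
  bool running{true};           // cleared by the main thread to stop the loop
  int rendered{0};              // number of updates processed by the rendering thread
  int idleWakeups{0};           // number of timeouts with nothing to do
};

// Rendering thread body: waits for work, but never longer than `timeout`.
void RenderLoop(SensorSync &s, std::chrono::milliseconds timeout)
{
  assert(timeout.count() > 0);
  std::unique_lock<std::mutex> lock(s.mutex);
  while (s.running)
  {
    // wait_for returns the predicate's value; false means the wait timed out
    // and there is still nothing to do.
    const bool ready = s.cv.wait_for(lock, timeout,
        [&s] { return s.updateAvailable || !s.running; });
    if (!ready)
    {
      ++s.idleWakeups;  // the semi-polling side effect: wake up, then wait again
      continue;
    }
    if (!s.running)
      break;
    s.updateAvailable = false;
    ++s.rendered;
    s.cv.notify_all();  // let the main thread know the update was handled
  }
}

// Posts one update from the calling ("main") thread and returns how many
// updates the rendering thread processed.
int RunDemo()
{
  SensorSync s;
  std::thread renderThread(RenderLoop, std::ref(s), std::chrono::milliseconds(50));
  {
    std::lock_guard<std::mutex> lock(s.mutex);
    s.updateAvailable = true;
  }
  s.cv.notify_all();
  {
    std::unique_lock<std::mutex> lock(s.mutex);
    s.cv.wait(lock, [&s] { return s.rendered == 1; });
    s.running = false;
  }
  s.cv.notify_all();
  renderThread.join();
  return s.rendered;
}
```

Because the rendering thread re-checks its predicate after every timeout, it cannot stay blocked forever even if a notification does not wake it when expected; the cost is the periodic idle wake-up counted in `idleWakeups`.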
Checklist
codecheck passed (See contributing)
Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.