Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid multiple possibilites of deadlock in resource loading #77143

Merged
merged 1 commit into from
May 17, 2023

Conversation

RandomShaper
Copy link
Member

  • Multi-threaded resource loading can deadlock if one of the current load tasks are waiting for another which hasn't still started running to complete and all the low-priority slots in the WorkerThreadPool are occupied. This PR allows the WorkerThreadPool to detect such a situation and schedule additional low-prio tasks until the situation is solved.

    Since the WorkerThreadPool needs to count how many low-prio threads are awaiting on others, this PR also needed to add support for multiple waiters. Otherwise, the ResourceLoader would use its own waiting mechanism for tasks already awaited, which would prevent such count from being exhaustive.

  • Another possibility of deadlock is that a worker thread in the WorkerThreadPool, upon one task waiting for another, runs yet another task over the stack frame of the awaiting one. That can lead to a load task eventually waiting for a previous one which can no longer finish because it's "buried" in the stack. Many possibilites have been assessed and discarded (enrich the worker pool with some task dependency info, avoid burying tasks' stack frames and even adding support for fibers for easy task switching within a given thread). The approach here is let the WorkerThreadPool detect the fact and report ERR_BUSY to the caller. In this PR the ResourceLoader is trained to catch the problem and apply a workaround.

  • This PR also ensures there's always at least one high-priority thread in the pool.

@RandomShaper RandomShaper added this to the 4.1 milestone May 16, 2023
@RandomShaper RandomShaper requested a review from reduz May 16, 2023 22:37
@RandomShaper RandomShaper requested a review from a team as a code owner May 16, 2023 22:37
@RandomShaper RandomShaper requested a review from a team as a code owner May 16, 2023 23:52
@@ -100,10 +100,13 @@
</description>
</method>
<method name="wait_for_task_completion">
<return type="void" />
<return type="int" enum="Error" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this breaks compatibility. Needs to be assessed whether it's worth making an exception and allow it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it so the exposed API doesn't change. However, I don't really see the downside. If the caller is not interested, they can just ignore the returned value. After all, there's no behavior change. This gives an opportunity to the user of doing something about the deadlock, but the possibility of it occuring was already there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its fine, It does not change anything for GDScript and for GDExtension we already have the compatibility fallback system.

@akien-mga akien-mga merged commit 26f96ae into godotengine:master May 17, 2023
@akien-mga
Copy link
Member

Thanks!

@RandomShaper RandomShaper deleted the fix_wtp_deadlocks branch May 17, 2023 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants