[Merged by Bors] - Fix AssetServer::get_asset_loader deadlock #2395
Conversation
A thread could take the `extension_to_loader_index` read lock, and then have the `server.loader` write lock taken in `add_loader` before it can. Then `add_loader` can't take the `extension_to_loader_index` lock, and the program deadlocks. Fixed by descoping the `extension_to_loader_index` lock, since `get_asset_loader` doesn't need to hold the lock for the duration, just to get a copyable `usize`. The block might not be needed; I think I could have gotten away with just inserting a `copied()` call into the chain, but I wanted to make the reasoning clear for future maintainers.
This is a great catch!! Nice job 😄
This issue also probably hints at us needing to run some sort of better testing for parallel code, e.g. fuzz testing or a thread sanitizer.
bors try
try Build succeeded
Loom can test all possible interleavings of synchronization primitives like mutexes and atomics.
(@mockersf I'd say this PR is safe to merge)
could you mention in #2373 if you're ok with changing the license for your contribution?
Done!
bors r+
# Objective

Fixes a possible deadlock between `AssetServer::get_asset_loader` / `AssetServer::add_loader`.

A thread could take the `extension_to_loader_index` read lock, and then have the `server.loader` write lock taken in `add_loader` before it can. Then `add_loader` can't take the `extension_to_loader_index` lock, and the program deadlocks. To be more precise:

## Step 1: Thread 1 grabs the `extension_to_loader_index` lock on lines 138..139:

https://github.com/bevyengine/bevy/blob/3a1867a92edc571b8f842bb1a96112dcbdceae4b/crates/bevy_asset/src/asset_server.rs#L133-L145

## Step 2: Thread 2 grabs the `server.loader` write lock on line 107:

https://github.com/bevyengine/bevy/blob/3a1867a92edc571b8f842bb1a96112dcbdceae4b/crates/bevy_asset/src/asset_server.rs#L103-L116

## Step 3: Deadlock, since Thread 1 wants to grab `server.loader` on line 141...

https://github.com/bevyengine/bevy/blob/3a1867a92edc571b8f842bb1a96112dcbdceae4b/crates/bevy_asset/src/asset_server.rs#L133-L145

... and Thread 2 wants to grab `extension_to_loader_index` on lines 111..112:

https://github.com/bevyengine/bevy/blob/3a1867a92edc571b8f842bb1a96112dcbdceae4b/crates/bevy_asset/src/asset_server.rs#L103-L116

## Solution

Fixed by descoping the `extension_to_loader_index` lock, since `get_asset_loader` doesn't need to hold the read lock on the extensions map for the duration, just to get a copyable `usize`. The block might not be needed; I think I could have gotten away with just inserting a `copied()` call into the chain, but I wanted to make the reasoning clear for future maintainers.
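The descoping pattern can be sketched with a miniature stand-in for the server. This is not the real Bevy code: `MiniAssetServer`, its fields, and the `String` loaders are hypothetical, but the lock ordering mirrors the PR's description. The braced block in `get_asset_loader` drops the read lock before the second lock is touched, which is exactly what breaks the circular wait.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical miniature of the two locks involved in the deadlock:
// a map from file extension to loader index, and the loader list,
// each behind its own RwLock (strings stand in for real loaders).
struct MiniAssetServer {
    extension_to_loader_index: RwLock<HashMap<String, usize>>,
    loaders: RwLock<Vec<String>>,
}

impl MiniAssetServer {
    fn get_asset_loader(&self, extension: &str) -> Option<String> {
        // The fix: take the read lock inside a block so it is released
        // as soon as we have the copyable usize we need.
        let index = {
            let map = self.extension_to_loader_index.read().unwrap();
            map.get(extension).copied()
        }; // read lock dropped here
        // With the first lock released, acquiring the second one cannot
        // deadlock against add_loader, which takes them in the other order.
        index.map(|i| self.loaders.read().unwrap()[i].clone())
    }

    fn add_loader(&self, extension: &str, loader: String) {
        // add_loader's order: loaders first, then extension map.
        let mut loaders = self.loaders.write().unwrap();
        loaders.push(loader);
        self.extension_to_loader_index
            .write()
            .unwrap()
            .insert(extension.to_string(), loaders.len() - 1);
    }
}

fn main() {
    let server = MiniAssetServer {
        extension_to_loader_index: RwLock::new(HashMap::new()),
        loaders: RwLock::new(Vec::new()),
    };
    server.add_loader("png", "PngLoader".to_string());
    println!("{:?}", server.get_asset_loader("png"));
}
```

As the description notes, chaining `.copied()` without the block would achieve the same release point, since the temporary guard is dropped at the end of the statement; the explicit block just makes the lock's lifetime obvious to readers.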
Pull request successfully merged into main. Build succeeded.