Remove Locking From Hot Paths #2710

cwfitzgerald · 2022-06-02T00:48:22Z

Is your feature request related to a problem? Please describe.

This formalizes #2272 into a issue.

Our current system takes global locks per resource. This means that if one method needs mutable access to a particular type, it needs to lock all resources of that type as mutable. Because every method needs read or write access to some resource, it's very easy to write multithreaded code that is essentially serialized.

The worst case of this is renderpass encoding. In an ideal world, this would be fully parallelizable, but currently, because of us needing a write lock, is serial.

This is one of the ways that wgpu has major disadvantages over the underlying api.

Describe the solution you'd like

We should switch to a wait free datastructure on the hot path with per-object locking or more fine grained locking.

Describe alternatives you've considered
The arcinization was another alternative, but we have shifted away from that style;

Imberflur · 2022-06-02T18:04:27Z

related issues #1485, #2023, #2394, #2395

John-Nagle · 2022-06-02T18:46:19Z

That identifies the problem better. See my test case for this here: https://github.com/John-Nagle/render-bench

John-Nagle · 2023-02-22T21:53:11Z

It's an ongoing problem. The overall effect is that WGPU is S-L-O-W for anything which loads its content while rendering.
A major claimed advantage of Vulkan over OpenGL is that you can have one CPU loading content into the GPU while another manages the rendering. WGPU loses that advantage.

John-Nagle · 2023-02-25T20:58:52Z

Discussion on Reddit: https://www.reddit.com/r/rust_gamedev/comments/11b0brr/were_not_really_game_yet/

If someone needs render-bench updated to use later versions of Rend3 and WGPU, please let me know. That's a good test for this problem.

gents83 · 2023-02-26T08:36:01Z

@cwfitzgerald if someone would like to take this is it suggested to start from your work or to restart from scratch (you mentioned that you had new ideas, maybe it could be worth to write down the intentions)?
I could try to take this, but I didn't want to do something in a way that is not what you thought 😁

PS: this is how I tackle the same issue in my engine, with Arc<RwLock> on single Resource and keeping a simple Storage.

John-Nagle · 2023-02-26T22:02:25Z

Nice. I'm glad to see someone interested in tackling this.

cwfitzgerald · 2023-02-27T06:12:33Z

@gents83 I'm glad to hear other people are too! So a bit of a brain dump with relation to intentions, history, and the previous PR.

History

This problem has been been around (and something we're aware of) since the very beginning days of wgpu. It is complicated in part because it involves very core structures in wgpu. It is also complicated due to our users use cases:

Firefox has very steep safety requirements due to the potentially massive attack surface. They will take performance losses in exchange for increased safety.
All native apps are generally the opposite. They want performance above all else and as long as the code works.

The first manifested solution to this was the "arcinization". This converted all resources to Arc. This effort died due to the author of the PR disappearing and not publishing their changes in any really salvageable way. #2272 was what we could find of it.

The second manifested solution was a rewrite of the storages in #2791. This replaced the storage container with overly-clever lock-free wait-free-access datastructure. This PR actually worked (mostly), but the core datastructure was deeply unsafe and a lot of stuff went on my my personal life at the same time, so it fell notably behind to the point it basically needed rewriting.

What to Do

I've been thinking about this and I think my resistance to arcinization (hence my solution with the fancy datastructure) is unfounded. I was very worried about cache locality, but we don't do things like "iterate over every texture" in any meaningful way. Because of this worry, which I believe to be unfounded, and the many downsides to such a fancy datastructure, I now thing that arcinization is the way to go.

As such, instead of having IDs be a index, epoch, and backend all packed into a u64, we should have them be an Arc<T>. In short: we should continue arcinizing.

There are many problems to face though. We rely on the global locks we have to provide the ability to mutate stuff. Most of our internal datastructures are immutable once created, with some exceptions:

Device has multiple locks inside it.
Buffers and Textures have their initialization state tracked mutably. (This will need to go behind a lock)
Command buffers are completely mutable and need a lock around them.

Additionally, the trackers use the current index to store state about the resources in a dense array - relying on the fact that the index is allocated using a freelist to keep the numbers dense.. Getting rid of this index would pose significant problems. As such, I think we need to preserve the index, but only use it for this kind of dense datastructure.

Warning: Scope Creep

When working on this problem it is very very tempting to try to roll other changes into the same PR - both PRs suffered from this, so I just wanted to extend a warning to resist the urge to solve even more problems in the first PR.

Epilogue

This is necessarily a big task with widespread ramifications, and as such I strongly encourage you to keep in close contact with us on matrix so we can discuss direction and make sure you stay unblocked about codebase knowledge. I'm also sure this post is missing a ton of information you need to do this, so this will be a good vector for this.

John-Nagle · 2023-02-27T06:25:14Z

It's encouraging to see this. From my perspective, safety is important. I'm writing a metaverse client, which is a browser for virtual worlds. Like a web browser, it loads content created by thousands of different people. Some of that content is malformed, and some is hostile. I would encourage keeping this simple and safe. Even if costs some CPU time. If we have solid concurrency, we can get performance back through parallelism.

Debugging race conditions and leaks in storage management is difficult for all concerned. The whole point of Rust is to avoid that class of problem.

gents83 · 2023-03-04T10:35:09Z

@cwfitzgerald I had finally the time to look a bit to the code.

At a first look, in order to try to not change too much at once, what can be done is:

Keep indexes
Remove the RwLock from the Storage
Having Storage holding Arc

Then we've to manage locks and we can:

Use lifeguard\token mechanism (not on the Storage but on T itself this time) or
Use RwLock around what needs locking

Sounds correct? Or your plan is to remove completely all IDs, Epoch, etc immediately?

And - other question - since CommandBuffers are completely mutable Arcanazing them would make them unable to do move operation like into_baked(). How did you imagine to handle this case?

cwfitzgerald · 2023-03-06T21:41:53Z

At a first look, in order to try to not change too much at once, what can be done is:
- Keep indexes
- Remove the RwLock from the Storage
- Having Storage holding Arc

Now that I'm thinking about it, storage needs to be Mutex<Vec<Arc<T>>>. When we need to access a given index T, we need to lock the mutex, but only long enough to clone the arc. Then we can release the mutex and let everyone else through. Basically we only need to lock while the storage itself is being actively changed. There's still critical sections, but they're on the order of cycles as opposed to milliseconds.

Then we've to manage locks and we can:
- Use lifeguard\token mechanism (not on the Storage but on T itself this time) or
- Use RwLock around what needs locking

Tbh I'm not positive what lifeguards are for. But the later sounds right. Do note that if there isn't a significant Read/Write dichotomy, or the critical sections are very short, prefer a regular mutex.

And - other question - since CommandBuffers are completely mutable Arcanazing them would make them unable to do move operation like into_baked(). How did you imagine to handle this case?

Command buffers are a mess in general - I believe they're removed from the storages while being baked - so Arc::try_unwrap should succeed.

gents83 · 2023-03-10T14:22:30Z

@cwfitzgerald I didn't get if you want to get rid of lifeguards or not :)
Since we are using Arc there are some checks that will be automatically tracked by the Arc itself and that will not allow the Arc::try_unwrap() or the resource destroy.
Here an example where I do not find it useful:

but I want to get yout POV on this topic in general

gents83 · 2023-03-13T19:02:24Z

@cwfitzgerald & @kvark in https://github.com/gents83/wgpu you find the first rough iteration of Arcanizing
(mainly everything in 1 CL - except for a lock-fix)

In this first part I just:

Added Arc and Mutex\RwLock where needed
Used Arc::try_unwrap() and Mutex::into_inner() to release stuff that should be released and moved around.
Removed Token
I didn't do yet any optimization or removed any global "array guard" that are used everywhere locking storages more than needed.
Doing this first step is just to verify with you if the direction is anyway correct or if you've anything different in mind :)

I would like to do as next step:

Listen to your feedbacks\suggestions\changes needed
Access storages only to get the Arc and then allow others to access it (no more guards if not needed to lock the content for the entire function)
Understand if we still need lifeguard and refcount or if we can get rid of them, now that we're using Arc
Testing, re-testing, stress-testing
Profile
Go through the @John-Nagle case scenario

John-Nagle · 2023-03-13T19:08:41Z

When there's a branch of Rend3 that uses this I'd like to test it.

cwfitzgerald · 2023-03-15T23:33:34Z

@gents83 alright, took a look.

By and large this looks good, all the Mutexes are in the right place. We might consider adjusting them, but nothing concerning there. You did, however, Arc a different thing than I expected. The construction in those commits is Arc<A::Texture> inside wgpu_core::Texture - what I was expecting was Arc<wgpu_core::Texture>. The main reason for this was to facilitate locking the storages for as short of time as possible. Most of the operations we do are on the wgpu_core type, as we often need the entirety of the structure.

gents83 · 2023-03-16T06:31:21Z

@cwfitzgerald I was sort of forced to do both actually 😂 the Arcwgpu::core::Texture in order to get stuff from Storage and lock It only when needed and instead the Arc<A:: Texture> because those are moved sometime before that the actual Resource is being released/is still alive and not yet unregistered from Storage itself
If you're ok I can anyway continue it, trying to clean it up and optimizing it and then stress testing it, or do you want me to do some core changes?

John-Nagle · 2023-04-15T05:01:24Z

Encouraged by all this effort.

My own experience in my metaverse viewer is that I can get 60 FPS and can download content at 200Mb/s, but not both at once. If the content loader and decompresser, which is running in multiple background threads, is allowed to run at high speed, the process of putting content into the GPU stalls out the main thread and drops the frame rate to 20FPS. Worst case, 12 FPS. So I have a live use case for this.

gents83 · 2023-04-15T09:43:54Z

@John-Nagle it would be nice if you could give a try to https://github.com/gents83/wgpu to start testing Arcanization it in your use case scenario 👍

John-Nagle · 2023-04-21T22:59:01Z

@John-Nagle it would be nice if you could give a try to https://github.com/gents83/wgpu to start testing Arcanization it in your use case scenario +1

I'm using WGPU via Rend3. I need a Rend3 that uses the new version.

My test program is here: https://github.com/John-Nagle/render-bench
It requires Rend3.

Did https://github.com/gents83/wgpu get merged back into main wgpu?

gents83 · 2023-04-22T05:42:35Z

Did https://github.com/gents83/wgpu get merged back into main wgpu?

Nope, I didn't have had time to fully close it yet and I have to debug still some tests where stuff is not correctly released on closure due to a contention lock.
Hopefully I will continue on this starting from mid/end next week

John-Nagle · 2023-04-22T05:46:32Z

OK, good to know. I'm looking forward to trying this in Rend3. If the system can sustain 60 FPS while content loading is in progress, we finally have a solution for big worlds that won't fit in the GPU all at once.

cwfitzgerald added type: enhancement New feature or request area: performance How fast things go labels Jun 2, 2022

cwfitzgerald added this to the Release 0.13 milestone Jun 2, 2022

cwfitzgerald mentioned this issue Jun 2, 2022

Fine-grained locking #444

Closed

cwfitzgerald mentioned this issue Jun 2, 2022

[WIP] Arcanization #2272

Closed

cwfitzgerald mentioned this issue Jun 5, 2022

Lifetimes on RenderPass make it difficult to use. #1453

Closed

cwfitzgerald mentioned this issue Jun 20, 2022

Storage Rewrite #2791

Closed

4 tasks

cwfitzgerald removed this from the Release 0.13 milestone Jun 21, 2022

cwfitzgerald added this to the Release 0.14 milestone Aug 4, 2022

cwfitzgerald modified the milestones: Release 0.14, Release 0.15 Oct 5, 2022

cwfitzgerald removed this from the Release 0.15 milestone Dec 26, 2022

cwfitzgerald mentioned this issue Jan 4, 2023

added texture metadata #3346

Merged

cwfitzgerald mentioned this issue Feb 22, 2023

Notes on API concurrency BVE-Reborn/rend3#418

Closed

gents83 mentioned this issue Mar 20, 2023

Dynamic dispatch layer breaks on web #3430

Closed

JMS55 mentioned this issue Mar 26, 2023

Parallel pipeline compliation bevyengine/bevy#8224

Closed

gents83 mentioned this issue Mar 30, 2023

Arcanization of wgpu core resources #3626

Merged

nical closed this as completed in #3626 Nov 20, 2023

cwfitzgerald mentioned this issue Dec 11, 2023

Queue.write_texture waits too long when called from a separate thread #4859

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove Locking From Hot Paths #2710

Remove Locking From Hot Paths #2710

cwfitzgerald commented Jun 2, 2022

Imberflur commented Jun 2, 2022

John-Nagle commented Jun 2, 2022

John-Nagle commented Feb 22, 2023

John-Nagle commented Feb 25, 2023

gents83 commented Feb 26, 2023

John-Nagle commented Feb 26, 2023

cwfitzgerald commented Feb 27, 2023 •

edited

Loading

John-Nagle commented Feb 27, 2023 •

edited

Loading

gents83 commented Mar 4, 2023 •

edited

Loading

cwfitzgerald commented Mar 6, 2023

gents83 commented Mar 10, 2023

gents83 commented Mar 13, 2023 •

edited

Loading

John-Nagle commented Mar 13, 2023

cwfitzgerald commented Mar 15, 2023

gents83 commented Mar 16, 2023

John-Nagle commented Apr 15, 2023

gents83 commented Apr 15, 2023

John-Nagle commented Apr 21, 2023

gents83 commented Apr 22, 2023

John-Nagle commented Apr 22, 2023

Remove Locking From Hot Paths #2710

Remove Locking From Hot Paths #2710

Comments

cwfitzgerald commented Jun 2, 2022

Imberflur commented Jun 2, 2022

John-Nagle commented Jun 2, 2022

John-Nagle commented Feb 22, 2023

John-Nagle commented Feb 25, 2023

gents83 commented Feb 26, 2023

John-Nagle commented Feb 26, 2023

cwfitzgerald commented Feb 27, 2023 • edited Loading

History

What to Do

Warning: Scope Creep

Epilogue

John-Nagle commented Feb 27, 2023 • edited Loading

gents83 commented Mar 4, 2023 • edited Loading

cwfitzgerald commented Mar 6, 2023

gents83 commented Mar 10, 2023

gents83 commented Mar 13, 2023 • edited Loading

John-Nagle commented Mar 13, 2023

cwfitzgerald commented Mar 15, 2023

gents83 commented Mar 16, 2023

John-Nagle commented Apr 15, 2023

gents83 commented Apr 15, 2023

John-Nagle commented Apr 21, 2023

gents83 commented Apr 22, 2023

John-Nagle commented Apr 22, 2023

cwfitzgerald commented Feb 27, 2023 •

edited

Loading

John-Nagle commented Feb 27, 2023 •

edited

Loading

gents83 commented Mar 4, 2023 •

edited

Loading

gents83 commented Mar 13, 2023 •

edited

Loading