-
Notifications
You must be signed in to change notification settings - Fork 947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Locking From Hot Paths #2710
Comments
That identifies the problem better. See my test case for this here: https://github.com/John-Nagle/render-bench |
It's an ongoing problem. The overall effect is that WGPU is S-L-O-W for anything which loads its content while rendering. |
Discussion on Reddit: https://www.reddit.com/r/rust_gamedev/comments/11b0brr/were_not_really_game_yet/ If someone needs render-bench updated to use later versions of Rend3 and WGPU, please let me know. That's a good test for this problem. |
@cwfitzgerald if someone would like to take this is it suggested to start from your work or to restart from scratch (you mentioned that you had new ideas, maybe it could be worth to write down the intentions)? PS: this is how I tackle the same issue in my engine, with Arc<RwLock> on single Resource and keeping a simple Storage. |
Nice. I'm glad to see someone interested in tackling this. |
@gents83 I'm glad to hear other people are too! So a bit of a brain dump with relation to intentions, history, and the previous PR. HistoryThis problem has been been around (and something we're aware of) since the very beginning days of wgpu. It is complicated in part because it involves very core structures in wgpu. It is also complicated due to our users use cases:
The first manifested solution to this was the "arcinization". This converted all resources to Arc. This effort died due to the author of the PR disappearing and not publishing their changes in any really salvageable way. #2272 was what we could find of it. The second manifested solution was a rewrite of the storages in #2791. This replaced the storage container with overly-clever lock-free wait-free-access datastructure. This PR actually worked (mostly), but the core datastructure was deeply unsafe and a lot of stuff went on my my personal life at the same time, so it fell notably behind to the point it basically needed rewriting. What to DoI've been thinking about this and I think my resistance to arcinization (hence my solution with the fancy datastructure) is unfounded. I was very worried about cache locality, but we don't do things like "iterate over every texture" in any meaningful way. Because of this worry, which I believe to be unfounded, and the many downsides to such a fancy datastructure, I now thing that arcinization is the way to go. As such, instead of having There are many problems to face though. We rely on the global locks we have to provide the ability to mutate stuff. Most of our internal datastructures are immutable once created, with some exceptions:
Additionally, the trackers use the current Warning: Scope CreepWhen working on this problem it is very very tempting to try to roll other changes into the same PR - both PRs suffered from this, so I just wanted to extend a warning to resist the urge to solve even more problems in the first PR. EpilogueThis is necessarily a big task with widespread ramifications, and as such I strongly encourage you to keep in close contact with us on matrix so we can discuss direction and make sure you stay unblocked about codebase knowledge. I'm also sure this post is missing a ton of information you need to do this, so this will be a good vector for this. |
It's encouraging to see this. From my perspective, safety is important. I'm writing a metaverse client, which is a browser for virtual worlds. Like a web browser, it loads content created by thousands of different people. Some of that content is malformed, and some is hostile. I would encourage keeping this simple and safe. Even if costs some CPU time. If we have solid concurrency, we can get performance back through parallelism. Debugging race conditions and leaks in storage management is difficult for all concerned. The whole point of Rust is to avoid that class of problem. |
@cwfitzgerald I had finally the time to look a bit to the code. At a first look, in order to try to not change too much at once, what can be done is:
Then we've to manage locks and we can:
Sounds correct? Or your plan is to remove completely all IDs, Epoch, etc immediately? And - other question - since CommandBuffers are completely mutable Arcanazing them would make them unable to do move operation like into_baked(). How did you imagine to handle this case? |
Now that I'm thinking about it, storage needs to be
Tbh I'm not positive what lifeguards are for. But the later sounds right. Do note that if there isn't a significant Read/Write dichotomy, or the critical sections are very short, prefer a regular mutex.
Command buffers are a mess in general - I believe they're removed from the storages while being baked - so |
@cwfitzgerald I didn't get if you want to get rid of lifeguards or not :) |
@cwfitzgerald & @kvark in https://github.com/gents83/wgpu you find the first rough iteration of Arcanizing In this first part I just:
I would like to do as next step:
|
When there's a branch of Rend3 that uses this I'd like to test it. |
@gents83 alright, took a look. By and large this looks good, all the Mutexes are in the right place. We might consider adjusting them, but nothing concerning there. You did, however, Arc a different thing than I expected. The construction in those commits is |
@cwfitzgerald I was sort of forced to do both actually 😂 the Arcwgpu::core::Texture in order to get stuff from Storage and lock It only when needed and instead the Arc<A:: Texture> because those are moved sometime before that the actual Resource is being released/is still alive and not yet unregistered from Storage itself |
Encouraged by all this effort. My own experience in my metaverse viewer is that I can get 60 FPS and can download content at 200Mb/s, but not both at once. If the content loader and decompresser, which is running in multiple background threads, is allowed to run at high speed, the process of putting content into the GPU stalls out the main thread and drops the frame rate to 20FPS. Worst case, 12 FPS. So I have a live use case for this. |
@John-Nagle it would be nice if you could give a try to https://github.com/gents83/wgpu to start testing Arcanization it in your use case scenario 👍 |
I'm using WGPU via Rend3. I need a Rend3 that uses the new version. My test program is here: https://github.com/John-Nagle/render-bench Did https://github.com/gents83/wgpu get merged back into main wgpu? |
Nope, I didn't have had time to fully close it yet and I have to debug still some tests where stuff is not correctly released on closure due to a contention lock. |
OK, good to know. I'm looking forward to trying this in Rend3. If the system can sustain 60 FPS while content loading is in progress, we finally have a solution for big worlds that won't fit in the GPU all at once. |
Is your feature request related to a problem? Please describe.
This formalizes #2272 into a issue.
Our current system takes global locks per resource. This means that if one method needs mutable access to a particular type, it needs to lock all resources of that type as mutable. Because every method needs read or write access to some resource, it's very easy to write multithreaded code that is essentially serialized.
The worst case of this is renderpass encoding. In an ideal world, this would be fully parallelizable, but currently, because of us needing a write lock, is serial.
This is one of the ways that wgpu has major disadvantages over the underlying api.
Describe the solution you'd like
We should switch to a wait free datastructure on the hot path with per-object locking or more fine grained locking.
Describe alternatives you've considered
The arcinization was another alternative, but we have shifted away from that style;
The text was updated successfully, but these errors were encountered: