Limited parallelism due to locking #2023
They definitely should scale, as much as the underlying driver lets them. We are aware that we currently lock way too heavily, and there is an in-flight refactor that will help with most cases by locking exactly the resources we need instead of the whole world. `write_buffer` is going to need a bit of special work to get going, but it should totally be possible to call it in parallel.
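To illustrate the difference in locking granularity (this is not wgpu's actual code, just a minimal sketch with made-up types): a single lock over the whole registry serializes threads touching unrelated buffers, whereas a short map lookup plus a per-resource lock lets them proceed in parallel.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Coarse-grained: one lock guards every resource, so two threads touching
// unrelated buffers still serialize on the same RwLock.
struct CoarseRegistry {
    buffers: RwLock<HashMap<u64, Vec<u8>>>,
}

// Finer-grained: the map itself is only locked briefly to find an entry;
// each buffer carries its own lock, so writes to different buffers can
// proceed in parallel.
struct FineRegistry {
    buffers: RwLock<HashMap<u64, Arc<RwLock<Vec<u8>>>>>,
}

impl FineRegistry {
    fn write(&self, id: u64, offset: usize, data: &[u8]) {
        // Short read lock on the map, then a per-buffer lock for the copy.
        let buffer = self.buffers.read().unwrap().get(&id).cloned();
        if let Some(buffer) = buffer {
            let mut guard = buffer.write().unwrap();
            guard[offset..offset + data.len()].copy_from_slice(data);
        }
    }
}
```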
They actually no longer need to be, so this will change. Locking the world for the command encoder is still unnecessary, though; this will be addressed in the refactor.
This is a known problem and unfortunately isn't the easiest to solve generically. I've had some ideas in #1260 (see the "extras" section for discussion of `write_buffer`/transfers). Transfers are probably going to be the easiest one to solve, as the "is disjoint" problem isn't terribly complicated.
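As a rough sketch of that disjointness test (hypothetical types, not wgpu's internal tracker): two transfers need no synchronization between them if they target different buffers or non-overlapping byte ranges of the same buffer.

```rust
/// A copy destination: which buffer, and the half-open byte range written.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy, PartialEq, Eq)]
struct CopyRegion {
    buffer_id: u64,
    offset: u64,
    size: u64,
}

impl CopyRegion {
    fn end(&self) -> u64 {
        self.offset + self.size
    }

    /// Two transfers can run without a barrier between them if they target
    /// different buffers or non-overlapping byte ranges of the same buffer.
    fn is_disjoint(&self, other: &CopyRegion) -> bool {
        self.buffer_id != other.buffer_id
            || self.end() <= other.offset
            || other.end() <= self.offset
    }
}

fn main() {
    let a = CopyRegion { buffer_id: 1, offset: 0, size: 256 };
    let b = CopyRegion { buffer_id: 1, offset: 256, size: 256 };
    assert!(a.is_disjoint(&b)); // adjacent but non-overlapping ranges
}
```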
I tried to understand why these barriers completely blocked other transfers even though they looked like fine-grained, buffer-local barriers. Then I found https://www.mail-archive.com/mesa-commit@lists.freedesktop.org/msg117032.html, which suggests that drivers don't really implement this part of Vulkan and fall back to full memory barriers.
Yeah, barriers are basically just "should I flush the L1/L2 caches and wait for the pipeline to drain"; the hardware doesn't have fine-grained control. We need to batch barriers as much as it is valid to do so.
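A minimal sketch of what such batching could look like (made-up types, not the actual backend code): barriers are accumulated and emitted in a single call right before the work that needs them, so the pipeline drains once per batch instead of once per transfer.

```rust
/// A pending buffer barrier. Hypothetical type; a real backend would record
/// the Vulkan/Metal/DX12 equivalents.
struct BufferBarrier {
    buffer_id: u64,
    src_access: u32,
    dst_access: u32,
}

/// Accumulates barriers and emits them in one batch. Because the hardware
/// treats a barrier as "flush caches and drain the pipeline", one batched
/// barrier before a group of transfers costs roughly the same as a single
/// barrier, instead of N pipeline drains.
struct BarrierBatcher {
    pending: Vec<BufferBarrier>,
}

impl BarrierBatcher {
    fn record(&mut self, barrier: BufferBarrier) {
        self.pending.push(barrier);
    }

    /// Called once right before the commands that need the barriers, e.g.
    /// mapping in a Vulkan backend to a single vkCmdPipelineBarrier with
    /// many VkBufferMemoryBarrier entries.
    fn flush(&mut self, submit: impl FnOnce(&[BufferBarrier])) {
        if !self.pending.is_empty() {
            submit(&self.pending);
            self.pending.clear();
        }
    }
}
```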
Obsoleted by #2710
I have a program that calls `queue_write_buffer` concurrently. Most of the time the threads are blocked with the following stack trace:

After switching to thread-local `StagingBelt`s and `CommandEncoder`s, the improvements were only marginal because `command_encoder_copy_buffer_to_buffer` acquires a global lock. This seems unnecessary since `CommandEncoder`s are `!Send + !Sync`.

Furthermore, my program also calls `create_bind_group` concurrently. This also acquires a global lock in the Vulkan backend:

I locally addressed these issues by using `UnsafeCell`s inside `hub.command_buffers` and replacing most `hub.command_buffers.write` calls with `read` calls. Together these changes reduced the runtime of the encoding step by 50%.

Is using `queue.write_buffer` and `device.create_bind_group` concurrently supposed to scale, or is this a pattern that should be avoided?
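For reference, here is a minimal sketch of the usage pattern in question (the helper name and parameters are made up; it assumes a wgpu version where `Device`, `Queue`, and `Buffer` are `Send + Sync` on native targets):

```rust
// Hypothetical helper: `device`, `queue`, `buffers`, and `layout` are assumed
// to have been created elsewhere; this only shows the concurrent call pattern.
fn upload_and_bind_in_parallel(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    buffers: &[wgpu::Buffer],
    layout: &wgpu::BindGroupLayout,
    chunks: &[Vec<u8>],
) -> Vec<wgpu::BindGroup> {
    std::thread::scope(|s| {
        let handles: Vec<_> = buffers
            .iter()
            .zip(chunks)
            .map(|(buffer, data)| {
                s.spawn(move || {
                    // Each thread writes its own buffer and builds its own bind group.
                    queue.write_buffer(buffer, 0, data);
                    device.create_bind_group(&wgpu::BindGroupDescriptor {
                        label: None,
                        layout,
                        entries: &[wgpu::BindGroupEntry {
                            binding: 0,
                            resource: buffer.as_entire_binding(),
                        }],
                    })
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```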