Bindless Tracking Issue #3637
Comments
Bindless mode means you have a big table of all the texture objects in the GPU. Each of those texture objects is a GPU buffer. Buffers either belong to the GPU, and the CPU can't touch them, or they belong to the CPU, and the GPU can't touch them. So the big table of texture object descriptors and the GPU buffer states have to be kept in sync. At the Vulkan level, this problem belongs to the program calling Vulkan. Vulkano, which is supposed to be a safe Rust interface to Vulkan, has machinery for maintaining that table. But WGPU doesn't have that machinery, so it can't do bindless yet, at least not safely. GPU/CPU buffer ownership and the bindless descriptor table need to be managed together.

All the targets WGPU currently supports seem to support bindless mode. Even OpenGL has offered it as an extension since 2013, though you do need OpenGL 4 with extensions. Everything in the current release seems to have full support now, via Vulkan, Metal, or OpenGL. The future is bindless. Unreal Engine is now bindless-only, I think.
It is not, in fact. Bindless resources are largely experimental and only enabled by default under Vulkan. They are optionally enabled under DX12 when using SM6 + ray tracing. There's still ongoing work to add support for bindless resources to shader graph as well. Aside from that, it seems very much like there are targets that they're interested in that do not support bindless. So many materials will likely still need to implement both paths.
I have filled out the above issue with the current plan for bindless, and what work has been done previously.
Right. I see more of the problems now. Looking at Bindless Investigation and Proposal, it's clear that driver-controlled residency and big arrays of bindless descriptors do not play well together. This is a non-problem for Vulkan, where all assets must be resident in GPU memory. For Metal, there is a residency control API. Not sure what the plan is for WebGPU, since that's still being defined.

I look at this from the viewpoint of needing game-type performance on large scenes, on target machines comparable to what the average Steam user has. In my own applications, I'm managing residency at the application level, where I switch textures to lower resolutions when memory is tight. Rejection of a buffer allocation request is a normal event which results in LOD reductions. For the Vulkan case, this substitutes for driver-initiated eviction.

I've mostly looked at the Vulkan case, where VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT is available and one big array of descriptors is possible. When everything is resident, that leads to simple implementations which don't require barriers during drawing, while allowing concurrent updating. I've written some design materials on that. The basic concept is that the wrapper level (WGPU) takes ownership of a buffer while it is in the descriptor table, which protects it from being deleted while rendering is in progress.

The cram job required to run on smaller targets makes things far more complicated. Driver-initiated eviction really complicates things. Can that be handled without crippling performance on the more powerful targets?
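(To illustrate that application-level scheme: a hypothetical sketch of LOD fallback on allocation rejection. `try_allocate`, `GpuTexture`, and the error type are stand-ins for this illustration, not a real API.)

```rust
// Hypothetical sketch of application-level residency: treat allocation
// failure as a normal event and retry at a lower texture LOD.
struct GpuTexture; // stand-in for a real texture handle

enum AllocError {
    OutOfMemory,
}

// Stand-in for a real allocator call that can reject requests.
fn try_allocate(_width: u32, _height: u32) -> Result<GpuTexture, AllocError> {
    Err(AllocError::OutOfMemory)
}

fn allocate_with_lod_fallback(mut width: u32, mut height: u32) -> Option<GpuTexture> {
    loop {
        match try_allocate(width, height) {
            Ok(tex) => return Some(tex),
            // Memory is tight: drop one mip level and try again,
            // instead of treating rejection as a fatal error.
            Err(AllocError::OutOfMemory) if width > 64 && height > 64 => {
                width /= 2;
                height /= 2;
            }
            Err(_) => return None, // already at minimum LOD; give up
        }
    }
}
```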
To clarify, none of these APIs deal with residency at all. The assumption is that all resources that are bound to a bind group are always resident. Yes, we need to prevent Metal from making resources non-resident, but with residency sets this should be easy.
Barrier generation has basically nothing to do with residency. The purpose of the …
OK, if everything is resident, things are simpler. So, in your view of bindless, what has to be done on a per-draw basis? That's where the overhead comes from. Is there checking that has to be done at each draw, or can the checking be hoisted to once per frame, once per texture change, or to compile time?
If an allocator is needed for descriptor slots, feel free to use this one I wrote. It's lock-free.
This would be delegated to the user, but it's good to have in case a user needs it.
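(For a sense of what such an allocator involves, here is a minimal lock-free bitmap sketch. This is an illustration only, not the allocator referenced above.)

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Sketch of a lock-free descriptor-slot allocator: one bit per slot.
struct SlotAllocator {
    words: Vec<AtomicU64>, // bit set = slot in use
}

impl SlotAllocator {
    fn new(slots: usize) -> Self {
        let words = (0..(slots + 63) / 64).map(|_| AtomicU64::new(0)).collect();
        Self { words }
    }

    /// Claim the first free slot, or None if the table is full.
    fn alloc(&self) -> Option<usize> {
        for (i, word) in self.words.iter().enumerate() {
            let mut cur = word.load(Ordering::Relaxed);
            while cur != u64::MAX {
                let bit = (!cur).trailing_zeros() as usize; // lowest free bit
                match word.compare_exchange_weak(
                    cur,
                    cur | (1u64 << bit),
                    Ordering::AcqRel,
                    Ordering::Relaxed,
                ) {
                    Ok(_) => return Some(i * 64 + bit),
                    Err(seen) => cur = seen, // lost a race; retry with fresh value
                }
            }
        }
        None
    }

    fn free(&self, slot: usize) {
        self.words[slot / 64].fetch_and(!(1u64 << (slot % 64)), Ordering::AcqRel);
    }
}
```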
Ah. You may not want to delegate that to the user. WGPU already has a buffer allocator, and descriptor slots and buffers need to be closely coordinated. I've been looking at designs which work something like this:
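(A hypothetical reconstruction of the kind of design meant here; the types are illustrative, not wgpu's API. The table co-owns each bound resource, so a slot can never outlive the resource it points at.)

```rust
use std::sync::Arc;

struct Texture; // stand-in for a GPU resource

/// The descriptor table owns (an Arc to) everything bound in it.
struct DescriptorTable {
    slots: Vec<Option<Arc<Texture>>>,
}

impl DescriptorTable {
    /// Binding clones the Arc: the resource cannot be destroyed
    /// while its descriptor is still in the table.
    fn bind(&mut self, slot: usize, texture: &Arc<Texture>) {
        self.slots[slot] = Some(texture.clone());
    }

    /// Unbinding drops the table's reference; the texture is freed
    /// only when the last owner (app, in-flight work) lets go.
    fn unbind(&mut self, slot: usize) {
        self.slots[slot] = None;
    }
}
```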
The idea is to use Rust ownership to manage most of the interlocking. If you let the application mess with the descriptor array, you need more checking in the lower layers. More machinery to implement. Bindless mode exists to improve performance by spending less time doing binding. It's only useful if it provides a big reduction in binding overhead. Comments?
Nothing needs to be done on a per-draw basis (unless you count the usage of our indirection buffers). We designed it in a way that, if the user uses the API properly (that is, marking resources as read-only when they are going to be basically read-only by the application), there's very little cost to the validation. Assuming we don't find a flaw in our plan, that is :) For read-write bindless resources, there's some validation required that needs to be done at set_bind_group time, and some barriers might need to get emitted at submit time, but we expect the number of read-write resources to be relatively small.
However, we discussed several ideas in the WebGPU WG, and currently believe that leaving bind group indices to the user hits a nice balance between user flexibility and performance.
Your library sounds like a great utility library that we might want to recommend to users of wgpu in Rust! Note that the bindless proposal we have provided above is perfectly safe and requires no synchronization with the GPU; shadow copies are made under the hood when updating bind groups. That said, if we think the cost of mutating bind groups through the shadow copy remains too high, we might go the other way and handle slot allocation for the user. Whatever happens, we want to make sure that wgpu remains aligned with the WebGPU WG and specification.
Where is the new bindless API documented? Even if it's not working, I'd like to see how the API is supposed to be used.
The two proposals (https://hackmd.io/PCwnjLyVSqmLfTRSqH0viA and https://hackmd.io/@cwfitzgerald/wgpu-bindless) should paint a somewhat complete picture of the new API, though you might have to read between the lines a bit. They're meant for implementers.
> Bind groups are updated on the CPU timeline and update "immediately". ... What this means is that all previous uses of the bind group continue to use the old contents and any new usages use the updated contents.

So the descriptor table is double-buffered? Reasonable. I was thinking in terms of an update queue applied at end of frame, but that's functionally equivalent. The "CPU timeline" concept needs to be clarified for multi-threaded programs; there are potential locking bottlenecks, and as noted, this is weird for multi-threaded programs.

> Note that this means that every update_bindings call will require us to make a shadow copy of all descriptors in the bind group, and associated tracking data. While we don't expect update_bindings to require a lot of memory compared to buffers and textures, it is still not a cheap operation.

Right. That's why I was thinking in terms of an update queue. The number of changes per frame is probably < 100; the number of descriptors is on the order of 100,000 for a complex scene. But either way will work.

> By allowing a resource to be shifted to a read-only state, we let the tracking systems only worry about the resource being alive, not their state. This allows bindless arrays to be bound with very low costs.

Right. Most content is read-only. Read-write content is mostly rendering intermediates.

What prevents freeing a bound buffer? Something has to interlock against that. Will that be an error, or an operation that is deferred until it is safe?

Is a shader accessing an unused descriptor slot a problem? It's tempting to initialize all the unused descriptor slots to point to a purple error texture. Something has to check for out-of-range indices in shaders, but if unused slots are harmless, there's no need to check for a valid descriptor.

This can work. Thanks.
There's still a single timeline that all threads experience - that is, you will either see before or after the update. This is the same for all multi-threaded methods in wgpu.
It's hard to avoid shadow copying something - this could be optimized to as little as 4 bytes per descriptor, though, depending on implementation strategy. This will need to be driven by profiling.
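(One way to picture those update semantics, where earlier uses keep the old contents while the update pays for a shadow copy, is a copy-on-write snapshot. A hypothetical sketch, not wgpu's actual implementation:)

```rust
use std::sync::Arc;

/// Illustrative copy-on-write bind group contents.
struct Contents {
    descriptors: Vec<u64>, // stand-in for per-slot descriptor data
}

struct BindGroup {
    current: Arc<Contents>,
}

impl BindGroup {
    /// Update on the CPU timeline: build a shadow copy, swap it in.
    /// Anything still holding the old Arc keeps seeing the old contents.
    fn update_binding(&mut self, slot: usize, descriptor: u64) {
        let mut copy = self.current.descriptors.clone(); // the shadow-copy cost
        copy[slot] = descriptor;
        self.current = Arc::new(Contents { descriptors: copy });
    }

    /// At set_bind_group time, a command encoder snapshots the current state.
    fn snapshot(&self) -> Arc<Contents> {
        Arc::clone(&self.current)
    }
}
```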
The same infrastructure that prevents it today. The bind group owns the texture (as in, it actually just has an Arc), so as long as we keep the command encoder alive until the GPU is done using it, the texture will stay alive. This is how it happens today.
It will return unspecified (NOT undefined) results, but is valid.
Yes, we will check against a metadata buffer; see the implementation notes in the spec.
Right. All binding updates commit at that copy, which allows for more optimization vs. bind-per-draw. Sounds good.
I'd suggest mapping unused texture descriptor slots to some built-in error texture, such as the purple often used for areas where nothing was drawn. Then errors become obvious. Mapping to a null handle means nothing is drawn, which is harder to debug.
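(A sketch of creating such a fallback texture with wgpu; type names are from wgpu 0.19/22-era releases, and newer versions rename some of the copy types.)

```rust
/// Sketch: a 1x1 magenta "error texture" to back unused descriptor
/// slots, so a shader reading an unused slot draws something obvious.
fn make_error_texture(device: &wgpu::Device, queue: &wgpu::Queue) -> wgpu::Texture {
    let size = wgpu::Extent3d { width: 1, height: 1, depth_or_array_layers: 1 };
    let texture = device.create_texture(&wgpu::TextureDescriptor {
        label: Some("bindless error texture"),
        size,
        mip_level_count: 1,
        sample_count: 1,
        dimension: wgpu::TextureDimension::D2,
        format: wgpu::TextureFormat::Rgba8UnormSrgb,
        usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::COPY_DST,
        view_formats: &[],
    });
    queue.write_texture(
        texture.as_image_copy(),
        &[0xFF, 0x00, 0xFF, 0xFF], // one magenta texel
        wgpu::ImageDataLayout { offset: 0, bytes_per_row: None, rows_per_image: None },
        size,
    );
    texture
}
```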
Overview
This issue tracks enabling "bindless" functionality across the various native backends.
For a high-level guide on what we believe the bindless API should look like, see https://hackmd.io/@cwfitzgerald/wgpu-bindless
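As a concrete reference point for the binding-array items below, the existing API expresses a fixed-size binding array via BindGroupLayoutEntry::count. A minimal sketch (the 1024-entry size is an arbitrary choice for illustration):

```rust
use std::num::NonZeroU32;

/// Layout entry for an array of 1024 sampled 2D textures.
/// Requires Features::TEXTURE_BINDING_ARRAY at device creation.
fn texture_array_entry() -> wgpu::BindGroupLayoutEntry {
    wgpu::BindGroupLayoutEntry {
        binding: 0,
        visibility: wgpu::ShaderStages::FRAGMENT,
        ty: wgpu::BindingType::Texture {
            sample_type: wgpu::TextureSampleType::Float { filterable: true },
            view_dimension: wgpu::TextureViewDimension::D2,
            multisampled: false,
        },
        // Some(count) is what makes this binding a binding_array.
        count: NonZeroU32::new(1024),
    }
}
```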
Binding Array Support

- binding_array on Metal #3334
- binding_array of Storage Buffers on Metal #6741
- binding_array of Uniform Buffers on Metal #6742
- binding_array of Storage Textures on Metal #6744
- binding_array of Storage Buffers on DX12 #6739
- binding_array of Uniform Buffers on DX12 #6740
- binding_array of Storage Textures on DX12 #6743
- Block #6733

Partially bound descriptors

Validation

- BindGroupLayoutEntry::count Validation Does Not Check That Shader Uses Binding Arrays #3648
- binding_array Limits into Separate Limit #6738
- GPU Validation

Sparse Bind Groups

Mutable Bind Groups

- BindGroup::update_bindings Without Holes

Read Only Resources

- Texture::set_usages
- Buffer::set_usages
- Temporary Removal

Driver Bugs(?)

- binding_array of Storage Buffer is Incorrectly Validated #6745