This is a particular problem for bindless workflows. Compute passes typically have to emit many barriers between dispatch calls to make sure that reads and writes from one dispatch don't affect the next whenever the same resource may be used.
Timings for issuing 1000 dispatches with 6000 resources bound once:
Computepass: Bindless/1000 dispatch
time: [139.48 ms 140.17 ms 141.25 ms]
thrpt: [7.0795 Kelem/s 7.1343 Kelem/s 7.1692 Kelem/s]
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
For comparison, on the same machine, issuing 10x as many dispatches, each using 6 resources, with a bind before each dispatch:
Computepass: Single Threaded/1 computepasses x 10000 dispatches (Computepass Time)
time: [19.353 ms 19.565 ms 19.792 ms]
thrpt: [505.25 Kelem/s 511.11 Kelem/s 516.72 Kelem/s]
The bindless version is necessarily slower since it has to speculatively emit many more barriers, and there's really no way around that. But it would be surprising if we couldn't do a lot better.
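To make the cost of speculative barriers concrete, here is a minimal toy model in Python. It is not the actual tracker implementation: the `count_barriers` and `worst_case_barriers` helpers are hypothetical, and the sketch assumes that every bound resource has to be treated as potentially written by every dispatch, which is the worst case for a bindless setup.

```python
def count_barriers(dispatches):
    """Count hazards in a toy tracker (hypothetical, for illustration only).

    `dispatches` is a list of dicts mapping resource id -> "read" or "write".
    A barrier is counted whenever a resource is accessed again and either
    the previous or the current access is a write.
    """
    last_access = {}
    barriers = 0
    for bound in dispatches:
        for resource, access in bound.items():
            previous = last_access.get(resource)
            if previous is not None and "write" in (previous, access):
                barriers += 1
            last_access[resource] = access
    return barriers


def worst_case_barriers(dispatch_count, bound_resources):
    """Closed form for the case where every resource is written every time."""
    return max(dispatch_count - 1, 0) * bound_resources


# Scaled-down bindless case: 10 dispatches, 60 resources bound once,
# all assumed to be written by every dispatch.
small_bindless = [{r: "write" for r in range(60)} for _ in range(10)]
small_count = count_barriers(small_bindless)

# Same assumption at the scale of the benchmark above
# (1000 dispatches, 6000 resources).
full_scale = worst_case_barriers(1000, 6000)
```

Under this assumption the work grows with dispatches times bound resources, so the benchmark's bindless case lands in the millions of potential barriers, while a read-only resource in the same model would need none.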
On the same machine, the corresponding renderpass test runs with more than 10x the resources and draw calls, yet finishes 10x faster (resulting in roughly 100x the throughput of draw calls):
Renderpass: Bindless/10000 draws
time: [11.720 ms 11.820 ms 11.921 ms]
thrpt: [838.88 Kelem/s 846.04 Kelem/s 853.28 Kelem/s]
(There are only write-only resources involved here, so the comparison isn't quite accurate.)
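As a sanity check on the "10x faster" and "100x throughput" figures, the ratios can be recomputed from the median times reported above. This is only a small sketch; the `throughput` helper is a hypothetical name, and the element counts are taken from the benchmark titles.

```python
def throughput(elements, time_ms):
    """Elements (dispatches or draws) per second for a given wall time."""
    return elements / (time_ms / 1000.0)


# Median times from the three benchmark runs quoted above.
compute_bindless = throughput(1000, 140.17)     # 1000 dispatches
compute_rebinding = throughput(10_000, 19.565)  # 10000 dispatches
render_bindless = throughput(10_000, 11.820)    # 10000 draws

# How much faster the renderpass finishes, and how many more
# calls per second it gets through.
time_ratio = 140.17 / 11.820
throughput_ratio = render_bindless / compute_bindless

print(f"renderpass finishes {time_ratio:.1f}x faster")
print(f"renderpass throughput is {throughput_ratio:.0f}x higher")
```

The time ratio comes out a little under 12x and the throughput ratio a little under 120x, consistent with the rounded 10x and 100x figures.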