Compute pass recording with a large number of (read/write) resources is very slow due to barrier emitting between dispatch calls #5766

Open
Wumpf opened this issue Jun 2, 2024 · 0 comments
Labels
area: performance How fast things go

Comments

@Wumpf
Member

Wumpf commented Jun 2, 2024

This is a problem for bindless workflows in particular. Compute passes typically have to emit a lot of barriers between dispatch calls to make sure that reads/writes from one dispatch don't affect the next if the same resource may be used.

Timings for issuing 1000 dispatches with 6000 resources bound once:

Computepass: Bindless/1000 dispatch
                        time:   [139.48 ms 140.17 ms 141.25 ms]
                        thrpt:  [7.0795 Kelem/s 7.1343 Kelem/s 7.1692 Kelem/s]
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe

For comparison, on the same machine, issuing 10x as many dispatches, each using 6 resources that are bound before each dispatch:

Computepass: Single Threaded/1 computepasses x 10000 dispatches (Computepass Time)
                        time:   [19.353 ms 19.565 ms 19.792 ms]
                        thrpt:  [505.25 Kelem/s 511.11 Kelem/s 516.72 Kelem/s]

The bindless version is necessarily slower since it has to emit a lot more barriers speculatively and there's no way around it really. But it would be surprising if we couldn't do a lot better.
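To give a feel for why the bindless case scales so badly, here is a toy model in Rust. It is only a sketch under a loud assumption: that a tracker re-checks every bound resource before each dispatch and emits one barrier per read/write resource a previous dispatch may have touched. `Access`, `Stats`, and `simulate` are hypothetical names made up for illustration; this is not wgpu's actual tracking code.

```rust
/// Hypothetical access mode of a bound resource (illustration only).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Access {
    ReadOnly,
    ReadWrite,
}

/// Work counters for the toy tracker.
#[derive(Default, Debug, PartialEq)]
struct Stats {
    checks: u64,
    barriers: u64,
}

/// ASSUMPTION: every bound resource is re-checked before each dispatch, and
/// every read/write resource needs a barrier once an earlier dispatch exists.
fn simulate(dispatches: u64, bound: &[Access]) -> Stats {
    let mut stats = Stats::default();
    for dispatch in 0..dispatches {
        for access in bound {
            stats.checks += 1;
            // No earlier dispatch can have written anything before the first one.
            if dispatch > 0 && *access == Access::ReadWrite {
                stats.barriers += 1;
            }
        }
    }
    stats
}

fn main() {
    // Bindless case from the issue: 1000 dispatches, 6000 resources bound once.
    let bindless = simulate(1_000, &vec![Access::ReadWrite; 6_000]);
    // Comparison case: 10000 dispatches, 6 resources each.
    let rebinding = simulate(10_000, &vec![Access::ReadWrite; 6]);
    println!("bindless:  {:?}", bindless);
    println!("rebinding: {:?}", rebinding);
}
```

Under this model the bindless case does 100x the per-resource checks of the rebinding case (6,000,000 vs 60,000) despite issuing a tenth of the dispatches. The measured gap is smaller than that, so the model only shows the scaling trend; it does not predict timings.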

On the same machine, the corresponding renderpass test runs with more than 10x the resources & draw calls, and does so 10x faster (resulting in roughly 100x the throughput of draw calls):

Renderpass: Bindless/10000 draws
                        time:   [11.720 ms 11.820 ms 11.921 ms]
                        thrpt:  [838.88 Kelem/s 846.04 Kelem/s 853.28 Kelem/s]

(There are only write-only resources involved here, so the comparison isn't quite accurate.)
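The three timings are easier to compare as a cost per call. The snippet below is plain arithmetic on the median values copied from the benchmark output above; `micros_per_call` is a helper name invented here and is not part of the benchmark suite.

```rust
/// Average time per call in microseconds, given a total time in milliseconds.
fn micros_per_call(total_ms: f64, calls: u32) -> f64 {
    total_ms * 1000.0 / f64::from(calls)
}

fn main() {
    // Median timings and call counts taken from the Criterion output above.
    let bindless_compute = micros_per_call(140.17, 1_000);
    let rebinding_compute = micros_per_call(19.565, 10_000);
    let bindless_render = micros_per_call(11.820, 10_000);

    println!("compute, bindless:  {:.2} us per dispatch", bindless_compute);
    println!("compute, rebinding: {:.2} us per dispatch", rebinding_compute);
    println!("render, bindless:   {:.2} us per draw", bindless_render);
    println!(
        "bindless compute is {:.0}x the rebinding cost and {:.0}x the draw cost",
        bindless_compute / rebinding_compute,
        bindless_compute / bindless_render
    );
}
```

That works out to roughly 140 µs per bindless dispatch, against about 2 µs per dispatch when rebinding 6 resources and about 1.2 µs per bindless draw.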

@Wumpf Wumpf added the area: performance How fast things go label Jun 2, 2024
@Wumpf Wumpf mentioned this issue Jun 2, 2024
6 tasks