
MultiDrawIndirectCount and MultiDrawIndexedIndirectCount appear to have bugs. #429

Closed
fyellin opened this issue Sep 23, 2024 · 3 comments


fyellin commented Sep 23, 2024

I'm attempting to implement these two draw commands for pygfx/wgpu-py, and there appear to be bugs in their implementation. I have tried my code on both a Vulkan emulator and a Linux box with an actual GPU chip, and I get similar problems on each.

The arguments to both of these calls are:
renderPassEncoder, buffer, offset, count_buffer, count_buffer_offset, max_count

The two issues I have discovered are:

  1. Both opcodes ignore the contents of count_buffer (but they do check that its length is correct). The actual value used for determining the "count" is the u32 at count_buffer_offset of buffer.

  2. For the non-indexed draw command, the buffer is supposed to be parsed into groups of 4 u32s with a stride of 16 bytes, but it appears to actually be parsed into groups of 5, with a stride of 20 bytes. When I run the code on an emulator, each quintuple [a, b, c, d, e] is interpreted as draw arguments [a, b, c, d]. When I run the code on an actual GPU, it is interpreted as [1, b, d, e].

My test code is simple, though like all GPU code, it's a bit long. I have a shader that does nothing but record every (vertex_index, instance_index) pair it sees in a buffer. I then collect the contents of the buffer.

The documentation states that the following are all equivalent for reasonable values. In particular, each is defined by the vertex/instance indices that it uses, so my code should give the same result for all of them.

  • `pass.draw(a, b, c, d); pass.draw(e, f, g, h)`
    
  • `pass.drawIndirect(buffer, 0); pass.drawIndirect(buffer, 16)` where `buffer` has `(a, b, c, d, e, f, g, h)` as u32s
    
  • `pass.drawIndirect(buffer, 8); pass.drawIndirect(buffer, 24)`  where `buffer` has `(?, ?, a, b, c, d, e, f, g, h)` as u32s
    
  • `pass.multiDrawIndirect(buffer, 8, 2)` where `buffer` is as in the previous item.
    
  • `pass.multiDrawIndirectCount(buffer, 8, count_buffer, 0, 2)` where `buffer` is as in the previous item and `count_buffer` contains a u32 larger than 2.
    

In particular, the values (a, b, c, d) should yield a*b (vertex_index, instance_index) pairs, namely range(c, c + a) X range(d, d + b), where X is the Cartesian product.
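As a rough illustration of that rule, here is a minimal Python sketch. It assumes the draw arguments are ordered (vertex_count, instance_count, first_vertex, first_instance); `expected_pairs` is a hypothetical helper written for this comment and is not part of runner.py.

```python
from itertools import product


def expected_pairs(vertex_count, instance_count, first_vertex, first_instance):
    """Return the (vertex_index, instance_index) pairs a single draw should visit."""
    vertices = range(first_vertex, first_vertex + vertex_count)
    instances = range(first_instance, first_instance + instance_count)
    # Cartesian product: every vertex index paired with every instance index.
    return [list(pair) for pair in product(vertices, instances)]


print(expected_pairs(1, 2, 3, 4))  # [[3, 4], [3, 5]]
```

Any of the equivalent call sequences listed above should produce the concatenation of these lists for its draw arguments.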

The first four always gave the same result. It was by trial and error that I discovered that the count_buffer was being ignored and that the count was coming from the buffer. Since these u32s were originally set to 0, my vertex shader wasn't being called! But things would work perfectly if I set the first word of the buffer to the count, and just set the count in count_buffer to 0. Likewise, by looking at the vertex/instance index pairs, I could figure out which draw arguments were actually being used.
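The count behaviour can be modelled in a few lines of Python. This is only a sketch of what I observed, not the actual wgpu code: it assumes little-endian u32s and assumes the intended draw count is min(count, max_count); the function names are hypothetical.

```python
import struct


def read_u32(raw, byte_offset):
    """Read one little-endian u32 from a bytes object."""
    return struct.unpack_from("<I", raw, byte_offset)[0]


def intended_count(buffer, count_buffer, count_buffer_offset, max_count):
    # Intended: the count is read from count_buffer and capped by max_count.
    return min(read_u32(count_buffer, count_buffer_offset), max_count)


def observed_count(buffer, count_buffer, count_buffer_offset, max_count):
    # Observed: same offset, but read from the indirect buffer instead.
    return min(read_u32(buffer, count_buffer_offset), max_count)


buffer = struct.pack("<10I", 0, 0, 1, 2, 3, 4, 5, 6, 7, 8)
count_buffer = struct.pack("<I", 1)
print(intended_count(buffer, count_buffer, 0, 1))  # 1
print(observed_count(buffer, count_buffer, 0, 1))  # 0
```

With the first word of the buffer set to 0, the observed model issues no draws at all, which is why the vertex shader was never called.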

The code is attached.

Here is the output. In each case, DrawIndirect is being called with max_count as its count. I have included the results both of running this code on an actual GPU and of running this code on GitHub with LavaPipe.

data=[0, 0, 1, 2, 3, 4, 5, 6, 7, 8], offset=8, count_data=[1], count_buffer_offset=0, max_count=1
DrawIndirect:      [[3, 4], [3, 5]]
DrawIndirectCount: [] # real GPU
DrawIndirectCount: [] # github LavaPipe emulator

Since buffer[0] == 0, DrawIndirectCount isn't doing anything, though it should. When I run this on the emulator, I also get

data=[1, 0, 1, 2, 3, 4, 5, 6, 7, 8], offset=8, count_data=[0], count_buffer_offset=0, max_count=1
DrawIndirect:      [[3, 4], [3, 5]]
DrawIndirectCount: [[4, 5], [4, 6]] # real GPU
DrawIndirectCount: [[3, 4], [3, 5]] # github LavaPipe emulator

Now both are producing results, even though the count is in the wrong place. On the real GPU, DrawIndirectCount is seeing the arguments [1, 2, 4, 5] rather than [1, 2, 3, 4]. The emulator is giving the correct result.

data=[2, 0, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6], offset=8, count_data=[0], count_buffer_offset=0, max_count=2
DrawIndirect:      [[3, 4], [3, 5], [4, 5], [4, 6], [4, 7], [5, 5], [5, 6], [5, 7]]
DrawIndirectCount: [[3, 4], [3, 5], [3, 6], [3, 7], [4, 2], [4, 3]] # gpu
DrawIndirectCount: [[3, 4], [3, 5], [5, 3], [5, 4], [5, 5], [5, 6], [6, 3], [6, 4], [6, 5], [6, 6], [7, 3], [7, 4], [7, 5], [7, 6]] #emulator

DrawIndirect is giving the correct results. (1, 2, 3, 4) yields the pairs (3, 4) and (3, 5). (2, 3, 4, 5) yields the remaining pairs.

DrawIndirectCount on the GPU is parsing the arguments as (1, 2, 3, 4, 2), (3, 4, 5, 3, 4) and turning that into (1, 2, 4, 2) [the last two pairs shown] and (1, 4, 3, 4) [the first four pairs shown].

DrawIndirectCount on the emulator is parsing the arguments similarly, and turning that into (1, 2, 3, 4) and (3, 4, 5, 3).
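The parsing hypothesis can be checked against the third data set with a short Python sketch. It is a model of the observed behaviour only; `records` is a hypothetical helper, and the GPU and emulator remappings are the ones inferred above, not anything taken from the driver source.

```python
def records(data, byte_offset, stride_bytes, count):
    """Slice a list of u32 words into `count` records of `stride_bytes` each."""
    start = byte_offset // 4
    words = stride_bytes // 4
    return [
        tuple(data[start + i * words : start + (i + 1) * words])
        for i in range(count)
    ]


data = [2, 0, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6]

intended = records(data, 8, 16, 2)   # groups of 4 u32s, stride 16
observed = records(data, 8, 20, 2)   # groups of 5 u32s, stride 20

# Emulator: each quintuple [a, b, c, d, e] is used as [a, b, c, d].
emulator_args = [r[:4] for r in observed]
# Real GPU: each quintuple [a, b, c, d, e] is used as [1, b, d, e].
gpu_args = [(1, r[1], r[3], r[4]) for r in observed]

print(intended)       # [(1, 2, 3, 4), (2, 3, 4, 5)]
print(observed)       # [(1, 2, 3, 4, 2), (3, 4, 5, 3, 4)]
print(emulator_args)  # [(1, 2, 3, 4), (3, 4, 5, 3)]
print(gpu_args)       # [(1, 2, 4, 2), (1, 4, 3, 4)]
```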

I've run lots of other examples, all of which seem to justify my hypothesis.

The test code can be run by just:

pip install wgpu
python runner.py

For reasons I don't understand, GitHub doesn't allow me to upload a file named runner.py, so it had to be put into a zip file.

runner.zip


fyellin commented Sep 28, 2024

The bugs have been found. The count_buffer issue is in wgpu-core. The other problem is here.

In wgpu-native/src/lib.rs, the implementation of multiDrawIndirectCount was calling multiDrawIndexedIndirectCount. I will fix this.


fyellin commented Sep 28, 2024

The count-buffer issue has already been fixed.
gfx-rs/wgpu#6194

There is an open PR for this bug.


fyellin commented Sep 30, 2024

Bugs fixed. All issues are in other workspaces. We just need to wait until the fixes work their way here.

@fyellin fyellin closed this as completed Sep 30, 2024