I'm attempting to implement the two indirect-count draw commands (non-indexed and indexed) for pygfx/wgpu-py, and it seems that there are bugs in their implementation. I have tried my code both on a Vulkan emulator and on a Linux box with an actual GPU, and I get similar problems.
The arguments to both of these calls are: `renderPassEncoder`, `buffer`, `offset`, `count_buffer`, `count_buffer_offset`, `max_count`.
The two issues I have discovered are:
1. Both opcodes ignore the contents of `count_buffer` (although they do check that its length is correct). The value actually used as the "count" is the `u32` at `count_buffer_offset` in `buffer`.
2. For the non-indexed draw command, `buffer` is supposed to be parsed as groups of 4 `u32`s with a stride of 16 bytes, but it actually appears to be parsed as groups of 5 with a stride of 20 bytes. When I run the code on an emulator, each quintuple [a, b, c, d, e] is interpreted as the draw arguments [a, b, c, d]. When I run it on an actual GPU, it is interpreted as [1, b, d, e].
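The two observations can be written down as a pure-Python model (no GPU involved) that contrasts the intended reading of the inputs with the one observed on the emulator. `expected_draws`, `observed_draws` and `read_words` are hypothetical helpers; `observed_draws` only encodes the hypothesis from this report, not anything taken from the wgpu-native source, and both assume little-endian `u32` words. For reference, an indexed indirect record (index count, instance count, first index, base vertex, first instance) is five words, i.e. 20 bytes, the same stride observed here for the non-indexed command.

```python
import struct


def read_words(data: bytes, byte_offset: int, n: int) -> tuple:
    """Read n consecutive little-endian u32 words starting at byte_offset."""
    return struct.unpack_from(f"<{n}I", data, byte_offset)


def expected_draws(buffer, offset, count_buffer, count_offset, max_count):
    """Intended behavior: the count comes from count_buffer, is clamped to
    max_count, and each record is 4 u32s (16 bytes)."""
    (count,) = struct.unpack_from("<I", count_buffer, count_offset)
    return [read_words(buffer, offset + 16 * i, 4) for i in range(min(count, max_count))]


def observed_draws(buffer, offset, count_buffer, count_offset, max_count):
    """Hypothesis from this report (emulator case): count_buffer's contents are
    ignored, the count is read from *buffer* at count_offset, and records are
    5 u32s (20 bytes) of which only the first 4 are used."""
    (count,) = struct.unpack_from("<I", buffer, count_offset)
    return [read_words(buffer, offset + 20 * i, 5)[:4] for i in range(min(count, max_count))]


if __name__ == "__main__":
    # (?, ?, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4) with both "?" words set to 0.
    words = [0, 0, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4]
    buffer = struct.pack(f"<{len(words)}I", *words)
    count_buffer = struct.pack("<I", 2)
    print(expected_draws(buffer, 8, count_buffer, 0, 2))  # [(1, 2, 3, 4), (2, 3, 4, 5)]
    print(observed_draws(buffer, 8, count_buffer, 0, 2))  # [] because buffer[0] == 0
```

With the leading word set to 2 instead of 0, `observed_draws` returns `[(1, 2, 3, 4), (3, 4, 5, 3)]`, i.e. the 20-byte stride makes the second draw start one word late.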
My test code is simple, though like all CPU code, it's a bit long. I have a shader that does nothing except record every (`vertex_index`, `instance_index`) pair it sees into a buffer. I then read back the contents of that buffer.
The documentation states that the following are equivalent for reasonable values. Since each draw is fully characterized by the vertex/instance indices it uses, my code should give the same result for all of them:

1. `pass.draw(a, b, c, d); pass.draw(e, f, g, h)`
2. `pass.drawIndirect(buffer, 0); pass.drawIndirect(buffer, 16)`, where `buffer` has `(a, b, c, d, e, f, g, h)` as u32s
3. `pass.drawIndirect(buffer, 8); pass.drawIndirect(buffer, 24)`, where `buffer` has `(?, ?, a, b, c, d, e, f, g, h)` as u32s
4. `pass.multiDrawIndirect(buffer, 8, 2)`, where `buffer` is as in item 3
5. `pass.multiDrawIndirectCount(buffer, 8, count_buffer, 0, 2)`, where `buffer` is as in item 3 and `count_buffer` contains a u32 larger than 2
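For reference, here is a small pure-Python model of what the five forms above should do, assuming the standard non-indexed indirect record of four little-endian u32s (vertex count, instance count, first vertex, first instance; 16 bytes). It makes no GPU calls: `draw_indirect`, `multi_draw_indirect`, `multi_draw_indirect_count` and `all_forms` are hypothetical stand-ins, not wgpu-py functions, and the two `?` words are simply set to 0.

```python
import struct

# One non-indexed indirect draw record: vertex_count, instance_count,
# first_vertex, first_instance (4 little-endian u32s = 16 bytes).
RECORD = struct.Struct("<4I")


def pack_words(*words: int) -> bytes:
    return struct.pack(f"<{len(words)}I", *words)


def draw_indirect(buffer: bytes, offset: int) -> list:
    return [RECORD.unpack_from(buffer, offset)]


def multi_draw_indirect(buffer: bytes, offset: int, count: int) -> list:
    return [RECORD.unpack_from(buffer, offset + i * RECORD.size) for i in range(count)]


def multi_draw_indirect_count(buffer, offset, count_buffer, count_offset, max_count):
    # The draw count is read from count_buffer and clamped to max_count.
    (count,) = struct.unpack_from("<I", count_buffer, count_offset)
    return multi_draw_indirect(buffer, offset, min(count, max_count))


def all_forms(a, b, c, d, e, f, g, h):
    """Draw records issued by each of the five forms, in list order."""
    plain = pack_words(a, b, c, d, e, f, g, h)
    padded = pack_words(0, 0, a, b, c, d, e, f, g, h)  # the two "?" words = 0
    count_buffer = pack_words(3)  # any u32 larger than 2
    return [
        [(a, b, c, d), (e, f, g, h)],                              # 1. draw() twice
        draw_indirect(plain, 0) + draw_indirect(plain, 16),         # 2.
        draw_indirect(padded, 8) + draw_indirect(padded, 24),       # 3.
        multi_draw_indirect(padded, 8, 2),                          # 4.
        multi_draw_indirect_count(padded, 8, count_buffer, 0, 2),   # 5.
    ]


forms = all_forms(1, 2, 3, 4, 2, 3, 4, 5)
assert all(form == forms[0] for form in forms)
```

Under the behavior reported here, form 5 would instead take its count from the first `?` word rather than from `count_buffer`, which is why nothing was drawn while those words were 0.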
Specifically, the draw arguments (a, b, c, d) should yield a*b (`vertex_index`, `instance_index`) pairs, namely `range(c, c + a) × range(d, d + b)`, where × is the Cartesian product.
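The expected pairs can be computed on the CPU and compared with what the shader records. Here is a sketch with hypothetical helper names (`expected_pairs`, `pairs_for_draws`); sorting makes the comparison independent of the order in which the GPU happens to run the invocations.

```python
from itertools import product


def expected_pairs(vertex_count, instance_count, first_vertex, first_instance):
    """(vertex_index, instance_index) pairs one non-indexed draw should produce:
    range(first_vertex, first_vertex + vertex_count) x
    range(first_instance, first_instance + instance_count)."""
    return sorted(product(range(first_vertex, first_vertex + vertex_count),
                          range(first_instance, first_instance + instance_count)))


def pairs_for_draws(draws):
    """Sorted multiset of pairs for a sequence of (a, b, c, d) draw arguments."""
    return sorted(pair for draw in draws for pair in expected_pairs(*draw))


print(expected_pairs(1, 2, 3, 4))  # [(3, 4), (3, 5)]
```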
The first four always gave the same result. It was by trial and error that I discovered that `count_buffer` was being ignored and that the count was coming from `buffer`. Since the leading `?` words were originally set to 0, my vertex shader wasn't being called! But things would work perfectly if I set the first word of `buffer` to the count and just set `count` to 0. Likewise, by looking at the vertex/instance index pairs, I could work out which arguments were actually being used.
The code is attached.
Here is the output. In each case, `DrawIndirect` is being called with `max_count` as its count. I have included the results of running this code both on an actual GPU and on GitHub with LavaPipe. Since `buffer[0] == 0`, `DrawIndirectCount` isn't doing anything, though it should. The same happens on the emulator.

Now both are giving results, even though the count is in the wrong place. On the GPU, `DrawIndirectCount` is seeing the arguments [1, 2, 4, 5] rather than [1, 2, 3, 4]. The emulator is giving the correct result.
DrawIndirect is giving the correct results. (1, 2, 3, 4) yields the pairs (3, 4) and (3, 5). (2, 3, 4, 5) yields the remaining pairs.
DrawIndirectCount on the GPU is parsing the arguments as (1, 2, 3, 4, 2) and (3, 4, 5, 3, 4) and turning them into (1, 2, 4, 2) [the last two values shown] and (1, 4, 3, 4) [the first four arguments shown].
DrawIndirectCount on the emulator is parsing the arguments similarly, and turning that into (1, 2, 3, 4) and (3, 4, 5, 3).
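As a consistency check, the two per-record interpretations stated at the start of this report can be written as functions and applied to the quintuples above; they reproduce the argument tuples reported for each platform. The function names are hypothetical, and the mappings are only the empirical ones observed here, not anything taken from the driver or wgpu-native source.

```python
def emulator_reading(record):
    """Observed on the emulator: a 5-word record [a, b, c, d, e] is used as [a, b, c, d]."""
    return tuple(record[:4])


def gpu_reading(record):
    """Observed on the hardware GPU: [a, b, c, d, e] is used as [1, b, d, e]."""
    _, b, _, d, e = record
    return (1, b, d, e)


records = [(1, 2, 3, 4, 2), (3, 4, 5, 3, 4)]
print([gpu_reading(r) for r in records])       # [(1, 2, 4, 2), (1, 4, 3, 4)]
print([emulator_reading(r) for r in records])  # [(1, 2, 3, 4), (3, 4, 5, 3)]
```

The same GPU mapping also explains the earlier observation: a record starting (1, 2, 3, 4, 5) is read as [1, 2, 4, 5].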
I've run lots of other examples, all of which seem to justify my hypothesis.
The test code can be run with:

```shell
pip install wgpu
python runner.py
```
For reasons I don't understand, GitHub doesn't allow me to upload `runner.py` directly, so it is attached as a zip file.
runner.zip