Use instancing for sprites #9597
Need to look through all the UI stuff and see if that needs improving too.
I'm applying the same changes to UI rendering. I just have to figure out the clipping.
crates/bevy_sprite/src/render/mod.rs
```rust
sprite_meta.sprite_index_buffer.push(base + 2);
sprite_meta.sprite_index_buffer.push(base);
sprite_meta.sprite_index_buffer.push(base + 1);
sprite_meta.sprite_index_buffer.push(base + 1);
sprite_meta.sprite_index_buffer.push(base + 3);
sprite_meta.sprite_index_buffer.push(base + 2);
```
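The six pushes above emit two triangles per quad that share the corners `base + 1` and `base + 2`. A minimal sketch of the same pattern for `n` quads, assuming `base = quad * 4` and reading the corner diagram from the PR description as binary numbers (bit 0 = x, bit 1 = y). `build_quad_indices` and `decode_index` are hypothetical helpers, not Bevy functions:

```rust
/// Hypothetical helper: repeats the 6-index pattern from the snippet above
/// for `quad_count` quads, assuming each quad owns 4 consecutive vertex slots.
fn build_quad_indices(quad_count: u32) -> Vec<u32> {
    let mut indices = Vec::with_capacity(quad_count as usize * 6);
    for quad in 0..quad_count {
        let base = quad * 4;
        // First triangle: top-left (2), bottom-left (0), bottom-right (1).
        indices.extend_from_slice(&[base + 2, base, base + 1]);
        // Second triangle: bottom-right (1), top-right (3), top-left (2).
        indices.extend_from_slice(&[base + 1, base + 3, base + 2]);
    }
    indices
}

/// Hypothetical decode of an index into (quad, x, y): the low 2 bits pick
/// the corner and the remaining upper bits are the quad (instance) index.
fn decode_index(index: u32) -> (u32, u32, u32) {
    let x = index & 0b01;
    let y = (index & 0b10) >> 1;
    (index >> 2, x, y)
}

fn main() {
    let indices = build_quad_indices(2);
    println!("{:?}", indices);
    for &i in &indices[..6] {
        let (quad, x, y) = decode_index(i);
        println!("index {i}: quad {quad}, corner ({x}, {y})");
    }
}
```

Only 4 distinct indices appear per quad, which is what lets the post-transform cache discussed in this thread reuse vertex shader results.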
I don't have much context on how the sprite rendering works, but is the index buffer needed if it just contains the same offsets over and over again?
That is, would it be possible to base the indices on `@builtin(vertex_index)` inside the vertex shader and read the coordinates from a buffer (or generate the index buffer on the GPU if one is strictly needed)?
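To make this alternative concrete: with a non-indexed draw of 6 vertices per quad, a shader could recover the quad and its corner from `@builtin(vertex_index)` alone. Below is a CPU-side Rust sketch of that arithmetic, assuming the same corner order as the index buffer snippet in this thread; `QUAD_CORNERS` and `corner_from_vertex_index` are illustrative names, not from the PR:

```rust
/// Corner order for one quad as the low 2 bits (bit 0 = x, bit 1 = y),
/// assumed to match the 2, 0, 1, 1, 3, 2 pattern of the index buffer.
const QUAD_CORNERS: [u32; 6] = [2, 0, 1, 1, 3, 2];

/// Hypothetical "vertex pulling" decode: maps a flat vertex_index from a
/// non-indexed draw to (quad, x, y) without any index buffer.
fn corner_from_vertex_index(vertex_index: u32) -> (u32, u32, u32) {
    let quad = vertex_index / 6;
    let corner = QUAD_CORNERS[(vertex_index % 6) as usize];
    (quad, corner & 1, corner >> 1)
}

fn main() {
    for v in 0..12 {
        let (quad, x, y) = corner_from_vertex_index(v);
        println!("vertex_index {v}: quad {quad}, corner ({x}, {y})");
    }
}
```

Every `vertex_index` here is distinct, so the two corners shared by a quad's triangles are shaded twice; that is the cost raised in the next reply.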
There is something called a 'post-transform cache': when indexed vertices reuse the same index, their vertex shader results are cached and reused. I think that if you used unique built-in indices, just specifying a range in the draw command, you would invoke the vertex shader 6 times per quad instead of 4.
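A back-of-the-envelope count of vertex shader invocations for both approaches, assuming an ideal post-transform cache that reuses every repeated index within a quad (real hit rates depend on the GPU and its cache size). The quad count is illustrative:

```rust
/// Invocations for an indexed draw, assuming every repeated index within
/// a quad hits the post-transform cache (best case: 4 per quad).
fn indexed_invocations(quads: u64) -> u64 {
    quads * 4
}

/// Invocations for a non-indexed draw: all 6 vertices per quad are shaded,
/// because no index is ever repeated.
fn non_indexed_invocations(quads: u64) -> u64 {
    quads * 6
}

fn main() {
    let quads = 10_000; // illustrative sprite count
    let indexed = indexed_invocations(quads);
    let non_indexed = non_indexed_invocations(quads);
    println!("indexed: {indexed}, non-indexed: {non_indexed}");
    // Extra vertex shader work of the non-indexed draw, in percent.
    println!("extra work: {}%", (non_indexed - indexed) * 100 / indexed);
}
```

Under these assumptions the non-indexed draw does 50% more vertex shader work, independent of the sprite count.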
Good point, thanks for elaborating. I had something along the lines of vertex pulling in mind, but I'm fairly certain the caching benefits are lost in that case as you mentioned.
The UI stuff is independent and was a bit too complicated for me to figure out quickly, so I'm going to leave it for another time. This is up for review.
Co-authored-by: Giacomo Stevanato <giaco.stevanato@gmail.com>
crates/bevy_sprite/src/render/mod.rs
```rust
array_stride: 80,
step_mode: VertexStepMode::Instance,
attributes: vec![
    // @location(0) i_model_x: vec4<f32>,
```
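`array_stride: 80` corresponds to five `vec4<f32>` attributes of 16 bytes each. A minimal sketch of a matching `#[repr(C)]` instance struct, assuming the five attributes are three rows of the transposed affine transform, the color, and the UV offset and scale packed into one `vec4` (the PR description lists these as per-instance data). The field names and the offset/scale packing are illustrative assumptions and need not match sprite.wgsl:

```rust
use std::mem::size_of;

/// Illustrative per-instance data; each field is assumed to map to one
/// vec4<f32> vertex attribute, in declaration order.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct SpriteInstance {
    model_transpose_row0: [f32; 4], // attribute 0
    model_transpose_row1: [f32; 4], // attribute 1
    model_transpose_row2: [f32; 4], // attribute 2
    color: [f32; 4],                // attribute 3
    uv_offset_scale: [f32; 4],      // attribute 4 (assumed: xy offset, zw scale)
}

fn main() {
    // The stride given to the instance-rate vertex buffer layout must equal
    // the size of one instance.
    println!("stride = {}", size_of::<SpriteInstance>());
    // With repr(C) and only [f32; 4] fields there is no padding, so
    // attribute i starts at i * 16 bytes.
    let vec4_size = size_of::<[f32; 4]>();
    let offsets: Vec<usize> = (0..5).map(|i| i * vec4_size).collect();
    println!("attribute offsets = {:?}", offsets);
}
```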
```rust
// @location(0) i_model_x: vec4<f32>,
// @location(1) i_col0_tx: vec4<f32>,
```
Just for consistency.
These were actually meant to match their names in sprite.wgsl. Fixed.
# Objective

- Supersedes bevyengine#8872
- Improve sprite rendering performance after the regression in bevyengine#9236

## Solution

- Use an instance-rate vertex buffer to store per-instance data.
- Store color, UV offset and scale, and a transform per instance.
- Convert Sprite rect, custom_size, anchor, and flip_x/_y to an affine 3x4 matrix and store the transpose of that in the per-instance data. This is similar to how MeshUniform uses transpose affine matrices.
- Use a special index buffer that has batches of 6 indices referencing 4 vertices. The lower 2 bits indicate the x and y of a quad such that the corners are:
  ```
  10    11

  00    01
  ```
  UVs are implicit but get modified by UV offset and scale. The remaining upper bits contain the instance index.

## Benchmarks

I will compare versus `main` before bevyengine#9236 because the results should be as good as or faster than that. Running `bevymark -- 10000 16` on an M1 Max with `main` at `e8b38925` in yellow, this PR in red:

![Screenshot 2023-08-27 at 18 44 10](https://github.com/bevyengine/bevy/assets/302146/bdc5c929-d547-44bb-b519-20dce676a316)

Looking at the median frame times, that's a 37% reduction from before.

---

## Changelog

- Changed: Improved sprite rendering performance by leveraging an instance-rate vertex buffer.

---------

Co-authored-by: Giacomo Stevanato <giaco.stevanato@gmail.com>
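The "transpose of an affine 3x4 matrix" packing can be sketched as follows, assuming a column-major 4x4 affine matrix whose bottom row is the constant `[0, 0, 0, 1]`. Dropping that row leaves the 3x4 affine form; storing its three rows (the transpose, laid out column-major) fits exactly in three `vec4<f32>` instance attributes, and a shader can then transform a point with three dot products. The helper names are hypothetical and this is not Bevy's code:

```rust
/// Column-major 4x4 matrix: m[c][r] is the element at row r, column c.
type Mat4 = [[f32; 4]; 4];

/// Packs the affine part of `m` as the three rows of its 3x4 form, i.e. the
/// transpose stored as three vec4s; the constant bottom row is dropped.
fn pack_affine_transposed(m: &Mat4) -> [[f32; 4]; 3] {
    let mut rows = [[0.0; 4]; 3];
    for r in 0..3 {
        for c in 0..4 {
            rows[r][c] = m[c][r];
        }
    }
    rows
}

fn dot4(a: [f32; 4], b: [f32; 4]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

/// Transforms a point (w = 1) using only the packed rows, as a shader could.
fn transform_point(rows: &[[f32; 4]; 3], p: [f32; 3]) -> [f32; 3] {
    let p4 = [p[0], p[1], p[2], 1.0];
    [dot4(rows[0], p4), dot4(rows[1], p4), dot4(rows[2], p4)]
}

fn main() {
    // Scale x by 2 and y by 3, then translate by (10, 20, 0); column-major.
    let m: Mat4 = [
        [2.0, 0.0, 0.0, 0.0],
        [0.0, 3.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [10.0, 20.0, 0.0, 1.0],
    ];
    let rows = pack_affine_transposed(&m);
    // Corner (1, 1) of a unit quad lands at (12, 23, 0).
    println!("{:?}", transform_point(&rows, [1.0, 1.0, 0.0]));
}
```

Storing three rows instead of four columns saves one `vec4` per instance, and each output component needs only a single dot product.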
# Objective

Remove color specialization from `SpritePipeline` after it became useless in #9597.

## Solution

Removed the `COLORED` flag from the pipeline key and stopped specializing the pipeline over it.

---

## Changelog

### Removed

- `SpritePipelineKey` no longer contains the `COLORED` flag. The flag has had no effect on how the pipeline operates for a while.

## Migration Guide

- The raw values for the `HDR`, `TONEMAP_IN_SHADER` and `DEBAND_DITHER` flags have changed, so if you were constructing the pipeline key from raw `u32`s you'll have to account for that.
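Why raw `u32` keys break: removing a flag shifts the bit positions of the flags that follow it. A minimal sketch with hypothetical bit assignments (the actual `SpritePipelineKey` positions are not given here), showing that a hard-coded raw value changes meaning while keys built from the named constants stay correct:

```rust
// Hypothetical bit layouts before and after removing COLORED; the real
// SpritePipelineKey bit positions may differ.
#[allow(dead_code)]
mod before {
    pub const COLORED: u32 = 1 << 0;
    pub const HDR: u32 = 1 << 1;
    pub const TONEMAP_IN_SHADER: u32 = 1 << 2;
    pub const DEBAND_DITHER: u32 = 1 << 3;
}

#[allow(dead_code)]
mod after {
    pub const HDR: u32 = 1 << 0;
    pub const TONEMAP_IN_SHADER: u32 = 1 << 1;
    pub const DEBAND_DITHER: u32 = 1 << 2;
}

fn main() {
    // A key hard-coded as a raw number under the old layout...
    let raw_hdr_tonemap: u32 = 0b0110; // HDR | TONEMAP_IN_SHADER (old bits)
    assert_eq!(raw_hdr_tonemap, before::HDR | before::TONEMAP_IN_SHADER);
    // ...no longer matches the same flags under the new layout.
    let named_after = after::HDR | after::TONEMAP_IN_SHADER;
    assert_ne!(raw_hdr_tonemap, named_after);
    println!("old raw: {raw_hdr_tonemap:#06b}, new named: {named_after:#06b}");
}
```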