Poor performance on basic rectangle benchmark #8100
Comments
Bevy is a lot faster at drawing rectangles with But a single If I were trying to game this benchmark, I would try a version with two
Like this? SUPERCILEX/bevy-vs-pixi@e94e9a6 Sadly no dice. Now the trace item that seems to be causing stuttering is
Precisely. That's a ~2x speedup on my machine, so "some dice," perhaps.
It seems like we're seeing poor batching for... some reason though.
Haha, Bevy is batching poorly because one of the sprites is Make your rectangles GRAY for another ~2x speedup.
With that change, this is now (on my machine) 2x faster than pixi (native, no lto), but 4x slower than pixi (web). But we are at a disadvantage due to drawing two things per rectangle.
Wat. Lol, this also fixes it.
Yeah, once instancing is a thing I'll go back to lyon. Tweak comparisons:
All of the "Layered sprites, 0 Z" categories are about the same +- some noise, and same goes for Zs with GRAY. Fixing the WHITE bug will bring uniformity to the Zs, but I do find the two tests I highlighted pretty odd. Why would offsetting the base sprite lead to such a significant drop in performance? Is there a better way to glue these sprites together?
Yeah, I think that forcing fewer layers just masked the "white color" problem.
Does that explain the (less severe) perf drop with gray though?
I think there may be an additional interesting thing happening with regard to the "colored" vs "non-colored" sprites breaking up batches and that not being factored into the pre-batch sorting. I know that sorting itself also usually comes up when profiling this, and that may have very different characteristics when the set of z values is mostly random vs. mostly the same.
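To make the batching point above concrete, here is a toy model in Rust. It is not Bevy's batching code: `Item`, `layered_rects`, and `count_batches` are hypothetical names, and the `colored` flag stands in for whatever render state differs between the two sprite kinds. The only assumption is that items are sorted by `z` and that a new batch starts whenever adjacent items differ in that state.

```rust
/// Hypothetical stand-in for a queued sprite; only the fields that matter here.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Item {
    z: f32,
    /// Assumption: "colored" and "non-colored" sprites cannot share a batch.
    colored: bool,
}

/// Build `n` rectangles, each made of a base sprite and a top sprite
/// slightly in front of it (the 0.5 offset is arbitrary).
fn layered_rects(n: usize, base_colored: bool, top_colored: bool) -> Vec<Item> {
    (0..n)
        .flat_map(|i| {
            let z = i as f32;
            [
                Item { z, colored: base_colored },
                Item { z: z + 0.5, colored: top_colored },
            ]
        })
        .collect()
}

/// Sort back to front by `z`, then count runs of adjacent items
/// that share the same state. Each run is one batch in this model.
fn count_batches(items: &mut [Item]) -> usize {
    items.sort_by(|a, b| a.z.total_cmp(&b.z));
    if items.is_empty() {
        return 0;
    }
    1 + items
        .windows(2)
        .filter(|pair| pair[0].colored != pair[1].colored)
        .count()
}
```

In this model, 4 rectangles whose base and top sprites differ in state produce 8 batches, because the z-sort interleaves the two kinds. When both sprites share the same state, the same 4 rectangles collapse into 1 batch.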
Checking in on this in light of recent rendering changes. Situation doesn't seem great. M1 Max (native), LTO disabled, 64k rects
The good news is that the "white minus epsilon hack" is no longer needed. (same fps with white)
I think I’m going to have to have a look at what this example does because bevymark ended up faster than main pre-#9236 from my previous testing.
I think that the difference can be attributed to sorting, although I don't understand how sorting behavior would have actually changed in #9236. But if I modify bevymark to use a random z value instead of an incremental one, I see the same dive in performance after #9236.
You mean you see in traces that it is due to sorting?
Ok. #9236 did two things:
# Objective

Fix a performance regression in the "[bevy vs pixi](https://github.com/SUPERCILEX/bevy-vs-pixi)" benchmark.

This benchmark seems to have a slightly pathological distribution of `z` values -- Sprites are spawned with a random `z` value with a child sprite at `f32::EPSILON` relative to the parent. See discussion here: #8100 (comment)

## Solution

Use `radsort` for sorting `Transparent2d` `PhaseItem`s.

Use random `z` values in bevymark to stress the phase sort. Add an `--ordered-z` option to `bevymark` that uses the old behavior.

## Benchmarks

mac m1 max

| benchmark | fps before | fps after | diff |
| - | - | - | - |
| bevymark --waves 120 --per-wave 1000 --random-z | 42.16 | 47.06 | 🟩 +11.6% |
| bevymark --waves 120 --per-wave 1000 | 52.50 | 52.29 | 🟥 -0.4% |
| bevymark --waves 120 --per-wave 1000 --mode mesh2d --random-z | 9.64 | 10.24 | 🟩 +6.2% |
| bevymark --waves 120 --per-wave 1000 --mode mesh2d | 15.83 | 15.59 | 🟥 -1.5% |
| bevy-vs-pixi | 39.71 | 59.88 | 🟩 +50.1% |

## Discussion

It's possible that `TransparentUi` should also change. We could probably use `slice::sort_unstable_by_key` with the current sort key though, as its items are always sorted and unique. I'd prefer to follow up later to look into that.

Here's a survey of sorts used by other `PhaseItem`s

#### slice::sort_by_key

`Transparent2d`, `TransparentUi`

#### radsort

`Opaque3d`, `AlphaMask3d`, `Transparent3d`, `Opaque3dPrepass`, `AlphaMask3dPrepass`, `Shadow`

I also tried `slice::sort_unstable_by_key` with a compound sort key including `Entity`, but it didn't seem as promising and I didn't test it as thoroughly.

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
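For readers unfamiliar with radix sorting on float keys, the following is a minimal hand-written sketch of the general technique, using only the standard library. It is not the `radsort` crate's implementation, and `f32_key` and `radix_sort_by_z` are hypothetical names. The point it illustrates is that a radix sort does the same fixed number of passes whether the `z` values arrive ordered or random, while Rust's comparison-based `slice::sort_by_key` is adaptive and benefits most from already-ordered input.

```rust
/// Map an f32 to a u32 whose unsigned ordering matches the float ordering.
/// Negative values have all bits flipped; non-negative values get the sign bit set.
fn f32_key(x: f32) -> u32 {
    let bits = x.to_bits();
    if bits & 0x8000_0000 != 0 {
        !bits
    } else {
        bits | 0x8000_0000
    }
}

/// Stable LSD radix sort by an f32 key: four counting passes, one per key byte.
/// The amount of work does not depend on the initial order of `items`.
fn radix_sort_by_z<T: Clone>(items: &mut Vec<T>, z: impl Fn(&T) -> f32) {
    let mut scratch = items.clone();
    for pass in 0..4u32 {
        let shift = pass * 8;
        let bucket = |item: &T| ((f32_key(z(item)) >> shift) & 0xFF) as usize;

        // Count how many items fall into each of the 256 buckets.
        let mut counts = [0usize; 256];
        for item in items.iter() {
            counts[bucket(item)] += 1;
        }

        // Turn counts into starting offsets (exclusive prefix sum).
        let mut offsets = [0usize; 256];
        let mut total = 0usize;
        for (offset, count) in offsets.iter_mut().zip(counts.iter()) {
            *offset = total;
            total += *count;
        }

        // Scatter items into the scratch buffer, preserving relative order.
        for item in items.iter() {
            let b = bucket(item);
            scratch[offsets[b]] = item.clone();
            offsets[b] += 1;
        }
        std::mem::swap(items, &mut scratch);
    }
}
```

Because each pass is stable, items with equal `z` keep their original relative order, which is the same guarantee `slice::sort_by_key` gives.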
Checking in again, bevy 0.12-dev is looking pretty solid! 64k rects, mac m1 max, chrome 118, no lto, no strip, no opt-s. note: bevy is actually drawing 128k sprites here.
*crashed
Unfortunately, it doesn't seem like automatic batching has helped out much with mesh2d for this particular benchmark. For reference in my mesh2d tests I'm just building meshes on demand with this function and spawning a
It seems that this strategy was never going to work -- every entity having a unique mesh is a dealbreaker for batching. Maybe a custom material with a simple vertex shader would help, (or something more involved to do instancing?) but that's not really "stock bevy" anymore so perhaps two sprites per box is as good as it gets.
I think this is fixed?
Bevy version
0.10
[Optional] Relevant system information
What you did
https://github.com/SUPERCILEX/bevy-vs-pixi
Web is broken right now, so just use native
cargo r --release
(and probably remove lto so the build is faster).
What went wrong
Performance is unacceptably bad (I can't even hit 30fps with 2000 rectangles even though Pixi can handle 8000 rectangles at ~50 fps) and I'm not sure why unfortunately. It seems like a lot of time in a trace is spent during extraction?
bevy/crates/bevy_render/src/extract_component.rs
Line 123 in b6b549e
I'd really appreciate if someone could investigate this. Otherwise, any pointers on what could be the root cause would be great.