Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Slight perf improvements and tidy for bevymark #3765

Closed
wants to merge 15 commits into from
Closed

Conversation

SUPERCILEX
Copy link
Contributor

@SUPERCILEX SUPERCILEX commented Jan 25, 2022

No description provided.

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@github-actions github-actions bot added the S-Needs-Triage This issue needs to be labelled label Jan 25, 2022
@alice-i-cecile alice-i-cecile added C-Code-Quality A section of code that is hard to understand or change C-Examples An addition or correction to our examples and removed S-Needs-Triage This issue needs to be labelled labels Jan 25, 2022
@@ -72,67 +75,74 @@ fn scheduled_spawner(
scheduled.per_wave,
bird_texture.0.clone_weak(),
);
counter.color = Color::rgb_linear(random(), random(), random());

let mut random = thread_rng();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is example code, could you please comment to explain why this is faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean here or as a comment in the code? thread_rng uses an Rc under the hood so theoretically we save some instructions by not cloning it a bunch of times.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

// cache the rng

Might be enough enough here.

@@ -209,19 +222,19 @@ fn spawn_birds(
}

fn movement_system(time: Res<Time>, mut bird_query: Query<(&mut Bird, &mut Transform)>) {
for (mut bird, mut transform) in bird_query.iter_mut() {
bird_query.for_each_mut(|(mut bird, mut transform)| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that using the for_each pattern here is worth it, but we should explain to beginners that this is faster (and ideally link to an explanation why).

Copy link
Contributor Author

@SUPERCILEX SUPERCILEX Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'd actually love to know why this is faster. This PR added it and said it was faster but not why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note, though I have no idea if it's correct. 😅

Copy link
Contributor

@rparrett rparrett Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment on the correctness of that note, but I feel like something like

// `.for_each_mut` is faster than `.iter`, but can't be chained like a normal iterator.

would be enough. (based on text here https://docs.rs/bevy/latest/bevy/ecs/system/struct.Query.html#method.for_each_mut)

Copy link
Member

@DJMcNab DJMcNab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the changes to use for_each, thread_rng, .single and a marker component. I'm conflicted on the change to use write, but I don't want to make the call for that.

The whitespace changes seem bizarre; rustfmt is 'borked' if it thinks the new version is better.

examples/tools/bevymark.rs Outdated Show resolved Hide resolved
examples/tools/bevymark.rs Outdated Show resolved Hide resolved
examples/tools/bevymark.rs Show resolved Hide resolved
examples/tools/bevymark.rs Outdated Show resolved Hide resolved
examples/tools/bevymark.rs Outdated Show resolved Hide resolved
Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@SUPERCILEX
Copy link
Contributor Author

@rparrett Would mind taking a look at this PR too? You reviewed #3764

Copy link
Contributor

@IceSentry IceSentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are modifying this example, I would also add a short description of what it is at the top of the file or at least above main(). I would also add the explanation for the .for_each use there.

@rparrett
Copy link
Contributor

There are likely to be some differing opinions about this. Examples should probably be optimized for clarity, but a "benchmark example" like bevymark seems like a good place to implement / document these sorts of optimizations. So I think this sort of thing is a better fit here than in contributors.

I agree that comments in place where a less-obvious thing is being done for performance reasons would be a good idea.

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. One very small nit + approval from another team member, then I'm happy to merge this.

@mockersf
Copy link
Member

mockersf commented Feb 25, 2022

could you mention the kind of perf improvements you get from those changes?

@rparrett
Copy link
Contributor

could you mention the kind of perf improvements you get from those changes?

I don't think it's even really measurable...

@alice-i-cecile
Copy link
Member

could you mention the kind of perf improvements you get from those changes?

I don't think it's even really measurable...

Testing locally, these changes are maybe a frame or two faster for me at reasonable entity counts. Very hard to measure precisely for this example though.

SUPERCILEX and others added 2 commits February 25, 2022 09:54
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
@rparrett
Copy link
Contributor

rparrett commented Feb 25, 2022

Testing locally, these changes are maybe a frame or two faster for me at reasonable entity counts. Very hard to measure precisely for this example though.

On my end, with cargo run --release --example bevymark -- 1000 200, there's a point where fps dips below vsync but bird spawning is still actively happening. In that phase, I see what might be a ~0.5-1 fps improvement. But once spawning stops, it seems too noisy to measure without more effort.

@rparrett
Copy link
Contributor

rparrett commented Feb 25, 2022

Sidenote: bevymark vsync got re-enabled (accidentally?) with #3812. Will PR separately later if I remember.

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@SUPERCILEX
Copy link
Contributor Author

@rparrett Fixed

@mockersf
Copy link
Member

I don't think some of the changes in this PR are really useful as they make the example harder to read if they don't bring the perf improvements from the pr title

@SUPERCILEX
Copy link
Contributor Author

@mockersf Can you point to specifics? The only thing that got more complicated is counter_system and that's not the primary focus of the benchmark.

@SUPERCILEX
Copy link
Contributor Author

@rparrett is there anything you're still hesitant about or is this good for approval?

@cart
Copy link
Member

cart commented Feb 27, 2022

I do think the "string allocation optimizations" are on a scale we don't need to worry about here. I agree that the extra complexity isn't worth it. Other than that, I think this is good to go!

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@SUPERCILEX
Copy link
Contributor Author

Sounds good, done!

@mockersf
Copy link
Member

mockersf commented Feb 28, 2022

also, I wasn't a fan of using for_each_mut, with a comment saying it's faster, when it's not really faster actually

(sorry I took so long replying, I was busy with the jam and missed your reply)

@cart
Copy link
Member

cart commented Feb 28, 2022

I was under the impression that for_each_mut() was actually still slightly faster than normal iterators (something like 1.2x), so I'm assuming you mean that it didn't meaningfully impact bevymark perf?

@cart
Copy link
Member

cart commented Feb 28, 2022

The gap has definitely closed now that derive(Component) is merged.

@SUPERCILEX
Copy link
Contributor Author

SUPERCILEX commented Feb 28, 2022

@mockersf Since it's part of the public API, I think it's actually good to showcase different ways of doing things. Also, nitpicking on this level is getting a little excessive IMO: you can't say X is faster in the docs and then complain when people use X. 🤷‍♂️

@cart
Copy link
Member

cart commented Feb 28, 2022

@mockersf Since it's part of the public API, I think it's actually good to showcase different ways of doing things. Also, nitpicking on this level is getting a little excessive IMO: you can't say X is faster in the docs and then complain when people use X. man_shrugging

I think its important for examples to encourage "best practices" and "recommend coding styles". I have intentionally pushed for normal query iterators in examples because they are about as fast, more "idiomatic rust", more flexible (because you don't need to refactor to do iterator chaining), and more ergonomic. for_each iterators exist for the (very few) cases where the small perf win matters. If it turns out that bevymark doesn't benefit from it (while iterating hundreds of thousands of entities), thats a pretty good indicator that we really shouldn't be encouraging its use. I appreciate that @mockersf brought this up and this attention to detail is important to me for examples. Unless we can demonstrate real measurable wins for these for_each iterators, they should be swapped out.

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@SUPERCILEX
Copy link
Contributor Author

That makes sense, thank you for the explanation! I checked with 200K entities and there doesn't seem to be any real difference in FPS, so reverted. Maybe it's worth considering getting rid of those APIs after more careful benchmarking?

image
image

@cart
Copy link
Member

cart commented Feb 28, 2022

Yeah now that derive(Component) is a thing, its worth having the conversation.

@cart
Copy link
Member

cart commented Feb 28, 2022

bors r+

bors bot pushed a commit that referenced this pull request Feb 28, 2022
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@bors bors bot changed the title Slight perf improvements and tidy for bevymark [Merged by Bors] - Slight perf improvements and tidy for bevymark Feb 28, 2022
@bors bors bot closed this Feb 28, 2022
kurtkuehnert pushed a commit to kurtkuehnert/bevy that referenced this pull request Mar 6, 2022
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@SUPERCILEX SUPERCILEX deleted the tmp branch May 17, 2022 04:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-Code-Quality A section of code that is hard to understand or change C-Examples An addition or correction to our examples
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants