Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Allow unbatched render phases to use unstable sorts #5049

Closed
wants to merge 20 commits into from

Conversation

james7132
Copy link
Member

@james7132 james7132 commented Jun 19, 2022

Objective

Partially addresses #4291.

Speed up the sort phase for unbatched render phases.

Solution

Split out one of the optimizations in #4899 and allow implementers of PhaseItem to change what kind of sort is used when sorting the items in the phase. Given a &mut [Self], the implementers must sort the slice in a way that is consistent with the downstream requirements. A default implementation that uses slice::sort_unstable_by_key is provided.

Performance

This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On many_cubes, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

image

Future Work

There were prior discussions to add support for faster radix sorts in #4291, which in theory should be a O(n) instead of a O(nlog(n)) time. voracious has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems.

Another optimization included in #4899 is to reduce the size of a few of the IDs commonly used in PhaseItem implementations to shrink the types to make swapping/sorting faster. Both CachedPipelineId and DrawFunctionId could be reduced to u32 instead of usize.

Ideally, this should automatically change to use stable sorts when BatchedPhaseItem is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.


Changelog

Added: PhaseItem::sort

Migration Guide

RenderPhases now default to a unstable sort (via slice::sort_unstable_by_key). This can typically improve sort phase performance, but may produce incorrect batching results when implementing BatchedPhaseItem. To revert to the older stable sort, manually implement PhaseItem::sort to implement a stable sort (i.e. via slice::sort_by_key).

@james7132 james7132 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Jun 19, 2022
@james7132 james7132 requested a review from superdump June 19, 2022 08:03
@james7132 james7132 requested a review from mockersf June 21, 2022 06:01
Copy link
Contributor

@superdump superdump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some information in a comment but then lgtm.

crates/bevy_render/src/render_phase/draw.rs Outdated Show resolved Hide resolved
@james7132 james7132 requested a review from superdump June 21, 2022 07:56
Copy link
Contributor

@superdump superdump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded a bit and then it is good to go.

crates/bevy_render/src/render_phase/draw.rs Outdated Show resolved Hide resolved
Co-authored-by: Robert Swain <robert.swain@gmail.com>
@james7132 james7132 requested a review from superdump June 21, 2022 11:16
Copy link
Contributor

@tim-blackbird tim-blackbird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement, one nit though.

crates/bevy_render/src/render_phase/draw.rs Outdated Show resolved Hide resolved
@james7132 james7132 added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Jun 21, 2022
@james7132
Copy link
Member Author

On further thought, perhaps PhaseItem::sort(&mut [Self]) might be a more apt approach to this than just hard-coding a solution. this would allow more flexibility for implementors to use their own sort implementations if they choose to do so.

Nilirad and others added 5 commits June 21, 2022 18:30
# Objective

The descriptions included in the API docs of `entity` module, `Entity` struct, and `Component` trait have some issues:
1. the concept of entity is not clearly defined,
2. descriptions are a little bit out of place,
3. in a case the description leak too many details about the implementation,
4. some descriptions are not exhaustive,
5. there are not enough examples,
6. the content can be formatted in a much better way.

## Solution

1. ~~Stress the fact that entity is an abstract and elementary concept. Abstract because the concept of entity is not hardcoded into the library but emerges from the interaction of `Entity` with every other part of `bevy_ecs`, like components and world methods. Elementary because it is a fundamental concept that cannot be defined with other terms (like point in euclidean geometry, or time in classical physics).~~ We decided to omit the definition of entity in the API docs ([see why]). It is only described in its relationship with components.
2. Information has been moved to relevant places and links are used instead in the other places.
3. Implementation details about `Entity` have been reduced.
4. Descriptions have been made more exhaustive by stating how to obtain and use items. Entity operations are enriched with `World` methods.
5. Examples have been added or enriched.
6. Sections have been added to organize content. Entity operations are now laid out in a table.

### Todo list

- [x] Break lines at sentence-level.

## For reviewers

- ~~I added a TODO over `Component` docs, make sure to check it out and discuss it if necessary.~~ ([Resolved])
- You can easily check the rendered documentation by doing `cargo doc -p bevy_ecs --no-deps --open`.

[see why]: bevyengine#4767 (comment)
[Resolved]: bevyengine#4767 (comment)
…el (bevyengine#4663)

# Objective
Further speed up visibility checking by removing the main sources of contention for the system.

## Solution
 - ~~Make `ComputedVisibility` a resource wrapping a `FixedBitset`.~~
 - ~~Remove `ComputedVisibility` as a component.~~

~~This adds a one-bit overhead to every entity in the app world. For a game with 100,000 entities, this is 12.5KB of memory. This is still small enough to fit entirely in most L1 caches. Also removes the need for a per-Entity change detection tick. This reduces the memory footprint of ComputedVisibility 72x.~~

~~The decreased memory usage and less fragmented memory locality should provide significant performance benefits.~~

~~Clearing visible entities should be significantly faster than before:~~
 
 - ~~Setting one `u32` to 0 clears 32 entities per cycle.~~
 - ~~No archetype fragmentation to contend with.~~
 - ~~Change detection is applied to the resource, so there is no per-Entity update tick requirement.~~

~~The side benefit of this design is that it removes one more "computed component" from userspace.  Though accessing the values within it are now less ergonomic.~~

This PR changes `crossbeam_channel` in `check_visibility` to use a `Local<ThreadLocal<Cell<Vec<Entity>>>` to mark down visible entities instead.

Co-Authored-By: TheRawMeatball <therawmeatball@gmail.com>
Co-Authored-By: Aevyrie <aevyrie@gmail.com>
# Objective
Closes bevyengine#1557. Partially addresses bevyengine#3362.

Cleanup the public facing API for storage types. Most of these APIs are difficult to use safely when directly interfacing with these types, and is also currently impossible to interact with in normal ECS use as there is no `World::storages_mut`. The majority of these types should be easy enough to read, and perhaps mutate the contents, but never structurally altered without the same checks in the rest of bevy_ecs code. This both cleans up the public facing types and helps use unused code detection to remove a few of the APIs we're not using internally. 

## Solution

 - Mark all APIs that take `&mut T` under `bevy_ecs::storage` as `pub(crate)` or `pub(super)`
 - Cleanup after it all.

Entire type visibility changes:

 - `BlobVec` is `pub(super)`, only storage code should be directly interacting with it.
 - `SparseArray` is now `pub(crate)` for the entire type. It's an implementation detail for `Table` and `(Component)SparseSet`.
 - `TableMoveResult` is now `pub(crate)

---

## Changelog
TODO

## Migration Guide
Dear God, I hope not.
# Objective

- Builds on top of bevyengine#4938 
- Make clustered-forward PBR lighting/shadows functionality callable
- See bevyengine#3969 for details

## Solution

- Add `PbrInput` struct type containing a `StandardMaterial`, occlusion, world_position, world_normal, and frag_coord
- Split functionality to calculate the unit view vector, and normal-mapped normal into `bevy_pbr::pbr_functions`
- Split high-level shading flow into `pbr(in: PbrInput, N: vec3<f32>, V: vec3<f32>, is_orthographic: bool)` function in `bevy_pbr::pbr_functions`
- Rework `pbr.wgsl` fragment stage entry point to make use of the new functions
- This has been benchmarked on an M1 Max using `many_cubes -- sphere`. `main` had a median frame time of 15.88ms, this PR 15.99ms, which is a 0.69% frame time increase, which is within noise in my opinion.

---

## Changelog

- Added: PBR shading code is now callable. Import `bevy_pbr::pbr_functions` and its dependencies, create a `PbrInput`, calculate the unit view and normal-mapped normal vectors and whether the projection is orthographic, and call `pbr()`!
…gine#4716)

# Objective

DioxusLabs and Bevy have taken over maintaining what was our abandoned ui layout dependency [stretch](https://github.com/vislyhq/stretch). Dioxus' fork has had a lot of work done on it by @alice-i-cecile, @Weibye , @jkelleyrtp, @mockersf, @HackerFoo, @TimJentzsch and a dozen other contributors and now is in much better shape than stretch was. The updated crate is called taffy and is available on github [here](https://github.com/DioxusLabs/taffy) ([taffy](https://crates.io/crates/taffy) on crates.io). The goal of this PR is to replace stretch v0.3.2 with taffy v0.1.0.

## Solution

I changed the bevy_ui Cargo.toml to depend on taffy instead of stretch and fixed all the errors rustc complained about.

---

## Changelog

Changed bevy_ui layout dependency from stretch to taffy (the maintained fork of stretch).

fixes bevyengine#677

## Migration Guide

The public api of taffy is different from that of stretch so please advise me on what to do here @alice-i-cecile.
@james7132
Copy link
Member Author

james7132 commented Jun 22, 2022

Not sure what's going on with that merge.

On further thought, perhaps PhaseItem::sort(&mut [Self]) might be a more apt approach to this than just hard-coding a solution. this would allow more flexibility for implementors to use their own sort implementations if they choose to do so.

Went through with this. It's allows us to provide custom per-phase sorting behavior. Looking at #4291 and @superdump's test results, it seems like for unstable/unbatched phases, radsort produced some of the better results here, and it produces similar results on my machine.

With the same benchmark many_cubes, this further reduces the time spent to sort Opaque3d phase items from 846us to 144.18us, an 86.96% reduction in the system.

image

Copy link
Contributor

@superdump superdump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the structure and flexibility but I'd like to either benchmark and choose the best radix sort crate based on criteria discussed in #4291, or I think if we want to do that separately, it makes sense to not introduce a new dependency if we are probably going to replace it almost straight away.

crates/bevy_core_pipeline/Cargo.toml Show resolved Hide resolved
crates/bevy_render/src/render_phase/draw.rs Outdated Show resolved Hide resolved
Co-authored-by: Robert Swain <robert.swain@gmail.com>
@james7132
Copy link
Member Author

james7132 commented Jun 23, 2022

Tested the four aforementioned sorts on many_cubes -- sphere.

sort sort_phase_system mean timings (Opaque3d)
slice::sort_unstable_by 477.62us
radsort 144.18us
voracious (unstable) 163.52us
voracious (stable) 169.72us
rdst 413.77us
rdst (single threaded) 173.29us

With these results, I think we should stick with radsort. These results mirror that of those seen in #4291 as well.

@james7132 james7132 requested a review from superdump June 23, 2022 10:38
@superdump
Copy link
Contributor

Yup, ok. I think voracious/rdst may be faster than radsort for sprites, but ok. I'm on board with taking that separately.

@superdump
Copy link
Contributor

bors r+

bors bot pushed a commit that referenced this pull request Jun 23, 2022
# Objective

Partially addresses #4291.

Speed up the sort phase for unbatched render phases.

## Solution
Split out one of the optimizations in #4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass.

## Performance
This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png)

## Future Work
There were prior discussions to add support for faster radix sorts in #4291, which in theory should be a `O(n)` instead of a `O(nlog(n))` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems.

Another optimization included in #4899 is to reduce the size of a few of the IDs commonly used in `PhaseItem` implementations to shrink the types to make swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`.

Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.

---

## Changelog
Added: `PhaseItem::sort`

## Migration Guide
RenderPhases now default to a unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` to implement a stable sort (i.e. via `slice::sort_by_key`).

Co-authored-by: Federico Rinaldi <gisquerin@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: colepoirier <colepoirier@gmail.com>
@bors bors bot changed the title Allow unbatched render phases to use unstable sorts [Merged by Bors] - Allow unbatched render phases to use unstable sorts Jun 23, 2022
@bors bors bot closed this Jun 23, 2022
@james7132 james7132 deleted the split-sort-phase branch June 23, 2022 11:12
inodentry pushed a commit to IyesGames/bevy that referenced this pull request Aug 8, 2022
# Objective

Partially addresses bevyengine#4291.

Speed up the sort phase for unbatched render phases.

## Solution
Split out one of the optimizations in bevyengine#4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass.

## Performance
This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png)

## Future Work
There were prior discussions to add support for faster radix sorts in bevyengine#4291, which in theory should be a `O(n)` instead of a `O(nlog(n))` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems.

Another optimization included in bevyengine#4899 is to reduce the size of a few of the IDs commonly used in `PhaseItem` implementations to shrink the types to make swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`.

Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.

---

## Changelog
Added: `PhaseItem::sort`

## Migration Guide
RenderPhases now default to a unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` to implement a stable sort (i.e. via `slice::sort_by_key`).

Co-authored-by: Federico Rinaldi <gisquerin@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: colepoirier <colepoirier@gmail.com>
james7132 added a commit to james7132/bevy that referenced this pull request Oct 28, 2022
# Objective

Partially addresses bevyengine#4291.

Speed up the sort phase for unbatched render phases.

## Solution
Split out one of the optimizations in bevyengine#4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass.

## Performance
This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png)

## Future Work
There were prior discussions to add support for faster radix sorts in bevyengine#4291, which in theory should be a `O(n)` instead of a `O(nlog(n))` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems.

Another optimization included in bevyengine#4899 is to reduce the size of a few of the IDs commonly used in `PhaseItem` implementations to shrink the types to make swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`.

Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.

---

## Changelog
Added: `PhaseItem::sort`

## Migration Guide
RenderPhases now default to a unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` to implement a stable sort (i.e. via `slice::sort_by_key`).

Co-authored-by: Federico Rinaldi <gisquerin@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: colepoirier <colepoirier@gmail.com>
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
# Objective

Partially addresses bevyengine#4291.

Speed up the sort phase for unbatched render phases.

## Solution
Split out one of the optimizations in bevyengine#4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass.

## Performance
This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png)

## Future Work
There were prior discussions to add support for faster radix sorts in bevyengine#4291, which in theory should be a `O(n)` instead of a `O(nlog(n))` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems.

Another optimization included in bevyengine#4899 is to reduce the size of a few of the IDs commonly used in `PhaseItem` implementations to shrink the types to make swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`.

Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.

---

## Changelog
Added: `PhaseItem::sort`

## Migration Guide
RenderPhases now default to a unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` to implement a stable sort (i.e. via `slice::sort_by_key`).

Co-authored-by: Federico Rinaldi <gisquerin@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: colepoirier <colepoirier@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants