Micro-optimize `queue_material_meshes`, primarily to remove bit manipulation. #12791

pcwalton · 2024-03-29T20:03:16Z

This commit makes the following optimizations:

`MeshPipelineKey`/`BaseMeshPipelineKey` split

`MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives in `bevy_render`, and `MeshPipelineKey`, which lives in `bevy_pbr`. Conceptually, `BaseMeshPipelineKey` is a superclass of `MeshPipelineKey`. For `BaseMeshPipelineKey`, the bits start at the highest (most significant) bit and grow downward toward the lowest bit; for `MeshPipelineKey`, the bits start at the lowest bit and grow upward toward the highest bit. This prevents them from colliding.

The goal is to avoid reassembling the bits of the pipeline key for every mesh every frame. Instead, the pieces that make up a `MeshPipelineKey` can be combined with a single bitwise OR.
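To make the layout concrete, here is a minimal sketch of two bit ranges that grow toward each other and are merged with OR. The flag names, field widths, and the plain `u64` representation are assumptions chosen for illustration; they are not the actual definitions in `bevy_render` or `bevy_pbr`.

```rust
// Hypothetical stand-in for the key type; the real keys are not plain u64 aliases.
type KeyBits = u64;

// "Base" bits are allocated from the most significant bit downward.
const BASE_MORPH_TARGETS: KeyBits = 1 << 63;
const BASE_TOPOLOGY_SHIFT: u32 = 60; // 3-bit field occupying bits 60..=62
const BASE_TOPOLOGY_MASK: KeyBits = 0b111 << BASE_TOPOLOGY_SHIFT;
const BASE_ALL: KeyBits = BASE_MORPH_TARGETS | BASE_TOPOLOGY_MASK;

// "PBR" bits are allocated from the least significant bit upward.
const PBR_HDR: KeyBits = 1 << 0;
const PBR_DEPTH_PREPASS: KeyBits = 1 << 1;
const PBR_ALL: KeyBits = PBR_HDR | PBR_DEPTH_PREPASS;

// Compile-time check that the two ranges never overlap.
const _: () = assert!(BASE_ALL & PBR_ALL == 0);

/// Computed once per mesh, not once per frame.
fn base_key(topology_index: u64, has_morph_targets: bool) -> KeyBits {
    let mut bits = (topology_index << BASE_TOPOLOGY_SHIFT) & BASE_TOPOLOGY_MASK;
    if has_morph_targets {
        bits |= BASE_MORPH_TARGETS;
    }
    bits
}

/// Per mesh, per frame: the precomputed pieces are merged with OR only.
fn final_key(view_bits: KeyBits, material_bits: KeyBits, mesh_base_bits: KeyBits) -> KeyBits {
    view_bits | material_bits | mesh_base_bits
}
```

Because the ranges are disjoint, each piece can be extracted from the combined key by masking, and no per-frame shifting is needed to build the key.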

`specialize_slow`

Previously, all of `specialize()` was marked as `#[inline]`. This bloated `queue_material_meshes` unnecessarily, as a large chunk of it ended up being a slow path that was rarely hit. This commit refactors the function to move the slow path to `specialize_slow()`.
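The general shape of such a split can be sketched as follows. The `PipelineCache` type, its fields, and the use of `#[cold]` and `#[inline(never)]` are assumptions for illustration; the PR text only states that the slow path moved into `specialize_slow()`.

```rust
use std::collections::HashMap;

// Hypothetical stand-in: maps a pipeline key to a cached pipeline id.
#[derive(Default)]
struct PipelineCache {
    cache: HashMap<u64, u32>,
    next_id: u32,
}

impl PipelineCache {
    /// Fast path: a cache lookup small enough to inline into a hot loop.
    #[inline]
    fn specialize(&mut self, key: u64) -> u32 {
        if let Some(&id) = self.cache.get(&key) {
            return id;
        }
        self.specialize_slow(key)
    }

    /// Slow path: rarely hit, so it is kept out of line to avoid
    /// bloating every caller of `specialize`.
    #[cold]
    #[inline(never)]
    fn specialize_slow(&mut self, key: u64) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.cache.insert(key, id);
        id
    }
}
```

Only the lookup is duplicated at each call site; the pipeline-creation work stays in one out-of-line function.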

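Together, these two changes shave about 5% off `queue_material_meshes`:

![Screenshot 2024-03-29 130002](https://github.com/bevyengine/bevy/assets/157897/a7e5a994-a807-4328-b314-9003429dcdd2)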

Migration Guide

- The `primitive_topology` field on `GpuMesh` is now an accessor method: `GpuMesh::primitive_topology()`.
- For performance reasons, `MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives in `bevy_render`, and `MeshPipelineKey`, which lives in `bevy_pbr`. These two should be combined with bitwise-or to produce the final `MeshPipelineKey`.
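As a rough illustration of the first migration item, the sketch below shows how a field can become an accessor once the topology is stored inside precomputed key bits. Every type here is a simplified, hypothetical stand-in (including the shift and mask values); only the names `key_bits`, `from_primitive_topology`, and `primitive_topology()` are taken from this thread.

```rust
// Hypothetical, simplified stand-ins for illustration; these are not the Bevy types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PrimitiveTopology {
    PointList = 0,
    LineList = 1,
    LineStrip = 2,
    TriangleList = 3,
    TriangleStrip = 4,
}

// Assumed position of the topology field inside the base key.
const TOPOLOGY_SHIFT: u32 = 61;
const TOPOLOGY_MASK: u64 = 0b111;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BaseKey(u64);

impl BaseKey {
    /// Encodes the topology into the high bits of the key.
    fn from_primitive_topology(topology: PrimitiveTopology) -> Self {
        BaseKey(((topology as u64) & TOPOLOGY_MASK) << TOPOLOGY_SHIFT)
    }

    /// Decodes the topology back out of the key bits.
    fn primitive_topology(self) -> PrimitiveTopology {
        match (self.0 >> TOPOLOGY_SHIFT) & TOPOLOGY_MASK {
            0 => PrimitiveTopology::PointList,
            1 => PrimitiveTopology::LineList,
            2 => PrimitiveTopology::LineStrip,
            3 => PrimitiveTopology::TriangleList,
            _ => PrimitiveTopology::TriangleStrip,
        }
    }
}

struct GpuMesh {
    key_bits: BaseKey,
}

impl GpuMesh {
    /// Replaces the old `primitive_topology` field with a method call.
    fn primitive_topology(&self) -> PrimitiveTopology {
        self.key_bits.primitive_topology()
    }
}
```

Under these assumptions, call sites change from reading `mesh.primitive_topology` to calling `mesh.primitive_topology()`.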


james7132 · 2024-03-30T02:44:29Z

crates/bevy_pbr/src/material.rs

+    ///
+    /// These are precalculated so that we can just "or" them together in
+    /// [`queue_material_meshes`].
+    pub mesh_pipeline_key_bits: MeshPipelineKey,


Does this hard-tie the material system to meshes? Or should we generalize MeshPipelineKey?

There would be a lot of untangling that would need to happen if we wanted to have meshes that didn't have materials, or materials that didn't have meshes. This doesn't really make things worse.

james7132

LGTM other than the aforementioned comments.



# Objective

- #12791 broke example `irradiance_volumes`
- Fixes #12876

```
wgpu error: Validation Error

Caused by:
    In Device::create_render_pipeline
      note: label = `pbr_opaque_mesh_pipeline`
    Color state [0] is invalid
    Sample count 8 is not supported by format Rgba8UnormSrgb on this device. The WebGPU spec guarentees [1, 4] samples are supported by this format. With the TEXTURE_ADAPTER_SPECIFIC_FORMAT_FEATURES feature your device supports [1, 2, 4].
```

## Solution

- Shift bits a bit more
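The follow-up does not spell out the exact collision, so the sketch below only illustrates the general failure mode: a multi-bit field whose shift overlaps a neighboring flag decodes to the wrong value. The field width, flag position, and function names are hypothetical and are not taken from the Bevy source.

```rust
// Hypothetical layout, for illustration only.
const MSAA_MASK: u32 = 0b111; // 3-bit field holding log2(sample count)
const OTHER_FLAG: u32 = 1 << 4; // some unrelated feature flag

/// Stores log2(samples) in a 3-bit field starting at `shift`.
fn encode_msaa(samples: u32, shift: u32) -> u32 {
    (samples.trailing_zeros() & MSAA_MASK) << shift
}

/// Reads the 3-bit field back and converts it to a sample count.
fn decode_msaa(key: u32, shift: u32) -> u32 {
    1 << ((key >> shift) & MSAA_MASK)
}

/// With `shift == 4` the field's lowest bit is the same bit as `OTHER_FLAG`,
/// so setting the flag corrupts the stored sample count.
fn key_with_flag(samples: u32, shift: u32) -> u32 {
    encode_msaa(samples, shift) | OTHER_FLAG
}
```

In this toy layout, 4x MSAA with the flag set decodes as 8 samples when the shift is 4, and decodes correctly as 4 once the field is shifted one bit further to 5.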


* Update to 0.14.0-rc.2
* [12997](bevyengine/bevy#12997): rename `multi-threaded` to `multi_threaded`
* RenderAssets<Image> is now RenderAssets<GpuImage>

  Implemented in [12827](bevyengine/bevy#12827)
* FloatOrd is now in bevy_math

  Implemented in [12732](bevyengine/bevy#12732)
* convert Transparent2d::dynamic_offset to extra_index

  [12889](bevyengine/bevy#12889) Gpu Frustum Culling removed the dynamic_offset of Transparent2d and it became `extra_index` with the special value `PhaseItemExtraIndex::NONE`, which indicates the `None` that was here previously
* RenderPhase<Transparent2d> -> ViewSortedRenderPhases<Transparent2d>

  [12453](https://github.com/StarArawn/bevy_ecs_tilemap/pull/bevyengine/bevy#12453): Render phases are now binned or sorted. Following the changes in the `mesh2d_manual` [example](https://github.com/bevyengine/bevy/blob/ecdd1624f302c5f71aaed95b0984cbbecf8880b7/examples/2d/mesh2d_manual.rs#L357-L358): use the `ViewSortedRenderPhases` resource.
* get_sub_app_mut is now an Option

  In [9202](https://github.com/StarArawn/bevy_ecs_tilemap/pull/bevyengine/bevy/pull/9202) SubApp access has changed
* GpuImage::size f32 -> u32 via UVec2

  [11698](bevyengine/bevy#11698) changed `GpuImage::size` to `UVec2`. Right above this, `Extent3d` does the same thing, so I'm taking a small leap and assuming we can cast with `as`.
* GpuMesh::primitive_topology -> key_bits/BaseMeshPipeline

  [12791](bevyengine/bevy#12791) the `primitive_topology` field on `GpuMesh` was removed in favor of `key_bits` which can be constructed using `BaseMeshPipeline::from_primitive_topology`
* RenderChunk2d::prepare requires &mut MeshVertexBufferLayouts now

  [12216](bevyengine/bevy#12216) introduced an argument `&mut MeshVertexBufferLayouts` to `get_mesh_vertex_buffer_layout`, which bevy_ecs_tilemap calls in `RenderChunk2d::prepare`
* into_linear_f32 -> color.0.linear().to_f32_array()

  [12163](bevyengine/bevy#12163) bevy_color was created and Color handling has changed. Specifically Color::as_linear_rgba_f32 has been removed. LinearRgba is now its own type that can be accessed via [`linear()`](https://docs.rs/bevy/0.14.0-rc.2/bevy/color/enum.Color.html#method.linear) and then converted.
* Must specify type of VisibleEntities when accessing

  [12582](bevyengine/bevy#12582) divided `VisibleEntities` into separate lists. So now we have to specify which kind of entity we want. I think we want the Mesh here, and I think we can get rid of the `.index` calls on Entity since Entity [already compares bits](https://docs.rs/bevy_ecs/0.14.0-rc.2/src/bevy_ecs/entity/mod.rs.html#173) for optimized codegen purposes. Waiting to do that until the other changes are in though so as to not change functionality until post-upgrade.
* app.world access is functions now
  - [9202](bevyengine/bevy#9202) changed world access to functions. [relevant line](https://github.com/bevyengine/bevy/pull/9202/files#diff-b2fba3a0c86e496085ce7f0e3f1de5960cb754c7d215ed0f087aa556e529f97fR640)
  - This also surfaced [12655](bevyengine/bevy#12655) which removed `Into<AssetId<T>>` for `Handle<T>`. Using a reference or `.id()` is the solution here.
* We don't need `World::cell`, and it doesn't exist anymore

  In [12551](bevyengine/bevy#12551) `WorldCell` was removed. ...but it turns out we don't need it or its replacement anyway.
* examples error out unless this bevy bug is addressed with these features being added

  bevyengine/bevy#13728
* check_visibility is required for the entity that is renderable

  As a result of [12582](bevyengine/bevy#12582) `check_visibility` must be implemented for the "renderable" tilemap entities. Doing this is trivial by taking advantage of the existing `check_visibility` type arguments, which accept a [`QF: QueryFilter + 'static`](https://docs.rs/bevy/0.14.0-rc.2/bevy/render/view/fn.check_visibility.html). The same `QueryFilter` is used when checking `VisibleEntities`.

  I've chosen `With<TilemapRenderSettings>` because presumably if the entity doesn't have a `TilemapRenderSettings` then it will not be rendering, but this could be as sophisticated or simple as we want. For example `WithLight` is currently implemented as

  ```rust
  pub type WithLight = Or<(With<PointLight>, With<SpotLight>, With<DirectionalLight>)>;
  ```
* view.view_proj -> view.clip_from_world

  [13489](bevyengine/bevy#13489) introduced matrix naming changes, including `view_proj` which becomes `clip_from_world`
* color changes to make tests runnable
* clippy fix
* Update Cargo.toml

  Co-authored-by: Rob Parrett <robparrett@gmail.com>
* Update Cargo.toml

  Co-authored-by: Rob Parrett <robparrett@gmail.com>
* final clippy fixes
* Update Cargo.toml

  Co-authored-by: Rob Parrett <robparrett@gmail.com>
* Simplify async loading in ldtk/tiled helpers

  See Bevy #12550
* remove second allow lint
* rc.3 bump
* bump version for major release
* remove unused features

---------

Co-authored-by: Rob Parrett <robparrett@gmail.com>

pcwalton requested review from superdump and james7132 March 29, 2024 20:03

ItsDoot added the A-Rendering (Drawing game state to the screen) and C-Performance (A change motivated by improving speed, memory usage or compile times) labels Mar 29, 2024

james7132 reviewed Mar 30, 2024


james7132 approved these changes Mar 30, 2024


crates/bevy_render/src/render_resource/pipeline_specializer.rs (outdated, resolved)

Merge remote-tracking branch 'origin/main' into microoptimize-queue-m…

e6ae951

…aterial-meshes

kristoff3r approved these changes Apr 1, 2024


crates/bevy_render/src/mesh/mesh/mod.rs (outdated, resolved)

pcwalton added 3 commits April 1, 2024 13:42

Merge remote-tracking branch 'origin/main' into microoptimize-queue-m…

d6cf81c

…aterial-meshes

Add an assertion that the bits didn't collide

e44147b

Reword comment

516ac6a

pcwalton added the S-Ready-For-Final-Review (This PR has been approved by the community. It's ready for a maintainer to consider merging it) label Apr 1, 2024

alice-i-cecile added this pull request to the merge queue Apr 1, 2024

Merged via the queue into bevyengine:main with commit 37522fd Apr 1, 2024
32 checks passed

mockersf mentioned this pull request Apr 2, 2024

fix msaa shift with irradiance volumes in mesh pipeline key #12845

Merged

ChristopherBiscardi mentioned this pull request Jun 7, 2024

Update to 0.14.0 StarArawn/bevy_ecs_tilemap#537

Merged
