Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StandardMaterial flat values #3

Merged

Conversation

superdump
Copy link

Added support for 'flat' values (i.e. that do not vary across the surface, so no textures) in StandardMaterial. I just got this working with the 3d_scene_pipelined example. I haven't implemented sorting by material and I'm just naively rebinding on for every mesh. I don't really like the way I had to munge the StandardMaterial's BufferId and Option<BindGroupId> into ExtractedMesh and I'd love to hear suggestions for better ways of managing that. I don't really understand why the ExtractedMesh are added to a Vec<> member in ExtractedMeshes which is inserted as a render world resource, instead of them being inserted as individual entities, as then I would have added a component to those entities. I'm guessing it has something to do with queries not guaranteeing order or something.

@superdump
Copy link
Author

Screenshot 2021-06-17 at 18 23 35

@cart
Copy link
Owner

cart commented Jun 17, 2021

I don't really understand why the ExtractedMesh are added to a Vec<> member in ExtractedMeshes which is inserted as a render world resource,

I made this call for performance reasons. I tried swapping over to per-entity spawn commands for sprites and got a pretty major performance dip.

However ultimately I think spawning as entities is probably the way to go, as it allows people to attach custom data to extracted entities and easily correlate it to the data we extract for entities (like mesh data).

I also think we have a number of options to cut down on overhead:

  1. Theres a "command optimization" PR in the main bevy repo that gets small-ish wins ~20% iirc
  2. We could use a single spawn_batch command, although we don't currently have something like "spawn_at_batch", so we can't currently re-use entity ids. Additionally, spawn_batch won't currently work because by default the Commands entity allocator is tied to the app world (this is very bleh and what necessitated the spawn_and_forget hack from before).
  3. We could directly pass in the RenderWorld as a resource during the extract step (and directly spawn entities there to avoid Command overhead).
  4. We could implement the "Commands -> SubWorlds" proposal, which would allow us to reuse allocations made in subworlds.

I don't really like the way I had to munge the StandardMaterial's BufferId and Option into ExtractedMesh and I'd love to hear suggestions for better ways of managing that

I want to experiment with doing "asset extraction" in the extract step to protect against this kind of thing. I think we can make it perform just as well, and it would be much more clean / consistent that way.

@superdump
Copy link
Author

I don't really understand why the ExtractedMesh are added to a Vec<> member in ExtractedMeshes which is inserted as a render world resource,

I made this call for performance reasons. I tried swapping over to per-entity spawn commands for sprites and got a pretty major performance dip.

However ultimately I think spawning as entities is probably the way to go, as it allows people to attach custom data to extracted entities and easily correlate it to the data we extract for entities (like mesh data).

I also think we have a number of options to cut down on overhead:

  1. Theres a "command optimization" PR in the main bevy repo that gets small-ish wins ~20% iirc
  2. We could use a single spawn_batch command, although we don't currently have something like "spawn_at_batch", so we can't currently re-use entity ids. Additionally, spawn_batch won't currently work because by default the Commands entity allocator is tied to the app world (this is very bleh and what necessitated the spawn_and_forget hack from before).
  3. We could directly pass in the RenderWorld as a resource during the extract step (and directly spawn entities there to avoid Command overhead).
  4. We could implement the "Commands -> SubWorlds" proposal, which would allow us to reuse allocations made in subworlds.

Sounds good. I’m fine with it being a resource for now and making an iterative change later when ECS performance is good enough for this.

It also feels a bit strange to have multiple copies of data - in the app world, extracted in the render world, converted from that to std140, written into a staging buffer, and copied to VRAM. That’s possibly 4 copies in system RAM and one in VRAM. I appreciate that we need to have at least one in the app world, one in the render world, and one in VRAM but is there a way to avoid the 2 additional copies? I was thinking about extracting directly into std140 buffers. And then I wonder if the staging buffer actually makes a copy or just holds a reference. The copying is inefficient, and takes space.

I don't really like the way I had to munge the StandardMaterial's BufferId and Option into ExtractedMesh and I'd love to hear suggestions for better ways of managing that

I want to experiment with doing "asset extraction" in the extract step to protect against this kind of thing. I think we can make it perform just as well, and it would be much more clean / consistent that way.

What do you mean here? I was thinking that the BufferId is only used to insert into the bind group so I could move the bind group creation into the extract step and remove the BufferId member from ExtractedMesh, only storing the BindGroupId. But it sounds like you maybe mean handling the staging and buffer copy or texture creation and upload, similar to how uniforms are handled, rather than running app world systems to do it.

@cart
Copy link
Owner

cart commented Jun 17, 2021

I was thinking about extracting directly into std140 buffers.

This would require direct access to a centralized UniformVec resource (which is doable if we insert the render world as a resource, but would require running such systems serially). The main reason not to do it is if there is expensive work required to make it happen (ex: computing the inverse of a matrix). We want to keep the extract step as short as possible to allow the next frame update to start running as quickly as possible.

I honestly doubt that the extra step costs us much. But its worth benchmarking.

And then I wonder if the staging buffer actually makes a copy or just holds a reference. The copying is inefficient, and takes space.

Afaik this is a direct memory-mapped gpu pointer.

What do you mean here? I was thinking that the BufferId is only used to insert into the bind group so I could move the bind group creation into the extract step and remove the BufferId member from ExtractedMesh, only storing the BindGroupId. But it sounds like you maybe mean handling the staging and buffer copy or texture creation and upload, similar to how uniforms are handled, rather than running app world systems to do it.

Basically:

  • Listen for asset changes in the Extract step and extract them into new Handle->ExtractedAsset maps
  • Instead of extracting BufferIds for things like mesh entities, extract the handle instead
  • Write new/changed assets to the gpu in the Prepare step
  • Lookup the BufferIds for each entity's handle in the Queue step

@cart
Copy link
Owner

cart commented Jun 18, 2021

And then I wonder if the staging buffer actually makes a copy or just holds a reference. The copying is inefficient, and takes space.

To be clear: we can definitely remove the intermediate buffer, it just puts a restriction on when/how we map the staging buffer. Without the intermediate vec it would be hard to queue things up across systems. But thats not really a major loss in this context.

@superdump
Copy link
Author

And then I wonder if the staging buffer actually makes a copy or just holds a reference. The copying is inefficient, and takes space.

To be clear: we can definitely remove the intermediate buffer, it just puts a restriction on when/how we map the staging buffer. Without the intermediate vec it would be hard to queue things up across systems. But thats not really a major loss in this context.

Perhaps it's possible to support both ways of doing things. A sort of deferred staging buffer that holds a copy of the data, and an immediate zero copy version?

@superdump
Copy link
Author

Let me know if there's anything else you'd like me to change before this is possible to merge. :)

@cart
Copy link
Owner

cart commented Jun 18, 2021

Looks good to me!

@cart cart merged commit 334a857 into cart:pipelined-rendering Jun 18, 2021
cart pushed a commit that referenced this pull request Jul 24, 2021
StandardMaterial flat values
cart pushed a commit that referenced this pull request Jul 24, 2021
StandardMaterial flat values
cart pushed a commit that referenced this pull request Apr 12, 2022
# Objective
Load skeletal weights and indices from GLTF files. Animate meshes.

## Solution
 - Load skeletal weights and indices from GLTF files.
 - Added `SkinnedMesh` component and ` SkinnedMeshInverseBindPose` asset
 - Added `extract_skinned_meshes` to extract joint matrices.
 - Added queue phase systems for enqueuing the buffer writes.

Some notes:

 -  This ports part of # bevyengine#2359 to the current main.
 -  This generates new `BufferVec`s and bind groups every frame. The expectation here is that the number of `Query::get` calls during extract is probably going to be the stronger bottleneck, with up to 256 calls per skinned mesh. Until that is optimized, caching buffers and bind groups is probably a non-concern.
 - Unfortunately, due to the uniform size requirements, this means a 16KB buffer is allocated for every skinned mesh every frame. There's probably a few ways to get around this, but most of them require either compute shaders or storage buffers, which are both incompatible with WebGL2.

Co-authored-by: james7132 <contact@jamessliu.com>
Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: James Liu <contact@jamessliu.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants