-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EXT_mesh_gpu_instancing #1691
EXT_mesh_gpu_instancing #1691
Conversation
I'm not sure where the new attributes (TRANSFORM4x3, and ID) should get defined. |
Thanks @ultrafishotoy! We should also state that this extension's goal is to enable GPU instancing1. The definition given in the Unity docs is quite good, I think:
Of course the bit about "each instance can have different parameters" doesn't fully apply here; as we haven't yet defined — and may not define — ways to override arbitrary material parameters per-instance. But otherwise I think those are the rough goals for this extension. Do others agree, or have I missed anything important? I suppose transmission size could also be mentioned. 1 I just want to disambiguate "GPU instancing" from "instancing" because the latter could also mean simple reuse of transmitted data (accessor, texture, mesh, material, etc.) throughout a glTF asset. glTF already allows this, and the feature is widely used. It does not, however, directly enable efficient reduction of draw calls as this extension might. |
And... would you be able to share some example use cases? We've discussed at various points, but just want to add that context in the thread too. If by any chance it would be easy to generate a sample asset using the proposed extension that would also be very neat to see, but no need for anything yet if that's a nontrivial amount of work. |
extensions/2.0/Khronos/KHR_instancing/extensions/2.0/Khronos/KHR_instancing/README.md
Outdated
Show resolved
Hide resolved
extensions/2.0/Khronos/KHR_instancing/extensions/2.0/Khronos/KHR_instancing/README.md
Outdated
Show resolved
Hide resolved
extensions/2.0/Khronos/KHR_instancing/extensions/2.0/Khronos/KHR_instancing/README.md
Outdated
Show resolved
Hide resolved
{ | ||
"nodes": [ | ||
{ | ||
"mesh": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option we should consider is enabling the KHR_instancing extension to (optionally) override the node.mesh
property. For example:
name: "teapot",
mesh: 0,
extensions: {
KHR_instancing {
mesh: 1,
attributes: { ... }
}
}
This simply provides exporters with some flexibility to control how/if fallback happens for viewers that don't recognize the extension:
- When the
mesh
override is omitted, clients that don't recognize the extension will render a single instance of the mesh with the node's transform. - When the
mesh
override is provided, it should reference a single instance of the mesh. The fallbacknode.mesh
value could then point to anything the exporter chooses – a merged mesh of all instances, a single point, or a textured plane saying "404 Not Found". - When the
KHR_instancing
extension is marked as required, clients that don't recognize the extension are expected to fail fast without attempting to proceed.
I chose this design based on what Octane allows; any given mesh can be
referenced by multiple nodes some of which may be instanced. While it
is possible to collapse all references to a single group of instance
transforms, I prefer to keep the octane node structure intact so that it
can be leveraged at run-time for things like hide/show, procedural
animation, culling, etc.
…On Wed, Oct 23, 2019 at 12:55 PM Don McCurdy ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
extensions/2.0/Khronos/KHR_instancing/extensions/2.0/Khronos/KHR_instancing/README.md
<#1691 (comment)>:
> +For example, the following defines some instancing attributes to a mesh node in the graph. Instancing only applies to mesh nodes. Applying to nodes rather than meshes allows the
+same mesh to be attached as normal to nodes while also being instanced.
+
+```json
+{
+ "nodes": [
+ {
+ "mesh": 0,
+ "name": "teapot",
+ "extensions": {
+ "KHR_instancing": {
+ "attributes": {
+ "TRANSFORM4x3": 0,
+ "ID": 1
+ },
+ }
I suspect this is the right design, but just to mention an alternative,
the KHR_instancing object could also be left empty and the attributes put
into the mesh's primitive.attributes instead. The per-instance would have
different counts, compared to the native mesh attributes, and this would
break implementations that don't recognize the extension, if they tried to
use those attributes. Recommend keeping as-is.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1691?email_source=notifications&email_token=ACCZHRBW45QFSQF2336Z7FDQQCT3JA5CNFSM4JEF2PVKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCI7XUUA#pullrequestreview-306149968>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACCZHRG2QOC7TPMX4GYVBATQQCT3JANCNFSM4JEF2PVA>
.
|
extensions/2.0/Khronos/KHR_instancing/extensions/2.0/Khronos/KHR_instancing/README.md
Outdated
Show resolved
Hide resolved
I think the optional mesh is a good idea. Another property I think that would be helpful is a bounding box for the entire transform group. It could be useful for culling and picking and calculating it at run-time would be very time consuming. |
I can definitely provide some sample assets. I've got some teapot scenes specifically for testing. I guess I just need to resolve how to deal with the new attributes. |
"extensions": { | ||
"KHR_instancing": { | ||
"attributes": { | ||
"TRANSFORM4x3": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec should mention in which coordinate space those transforms are defined. Reasonable options are
- local, like a regular mesh (full chain: instanced transforms, local node's TRS, all parents to the scene root);
- scene, like skinned mesh (all local and parent node transforms are ignored).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're only using local space at otoy. Maybe that could be handled by alternate 'TRANSFORM' attributes that include the space? i.e. TRANSFORM_LOCAL, TRANSFORM_SCENE, with maybe TRANSFORM defaulting to LOCAL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TRANSFORM_SCENE
behavior can be achieved with a node positioned at the scene root. Supporting only local (with explicitly described interaction between JSON-stored local TRS and accessor-stored instanced TRS) seems enough to cover both cases from engine perspective.
"KHR_instancing": { | ||
"attributes": { | ||
"TRANSFORM4x3": 0, | ||
"ID": 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these IDs implementation-dependent, with no spec-defined usage? If so, they shouldn't be included here. glTF 2.0 has an established workflow for supporting app-specific attribute semantics - they just start with an underscore.
Otherwise, the spec should be more sound and include at least:
- allowed datatypes (likely scalar uint8/16/32, uint32 is not available on WebGL 1.0);
- uniqueness within a set of instances, within a scene, or within an asset (otherwise they should not be called IDs).
In shaders, GPU instances have built-in IDs. Should this attribute be somehow related to them?
"extensions": { | ||
"KHR_instancing": { | ||
"attributes": { | ||
"TRANSFORM": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking of instancing + key frame animation. Allowed values for animation.channel.target.path
are
- translation
- rotation
- scale
- weights
https://github.com/KhronosGroup/glTF/tree/master/specification/2.0#targetpath-white_check_mark
Do we also need KHR_instancing
extension in animation.channel
to specify TRANSFORM attribute as target.path?
"channels": [
{
....,
"KHR_instancing": {
"sampler": 0,
"target": {
"node": 0,
"path": "transform"
}
}
}
]
(Or instancing + key frame animation is not popular because of non good performance?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Replied on #1691 (comment) for wider visibility.
@takahirox I'm not familiar with best practices for expressing animation of instance batches in a runtime-friendly way. If there are tools, engines, or formats that have set a good precedent on that, please let us know! The closest I can think of would be Point Caches in FBX (definition, example), and I don't know whether that approach would be appropriate for glTF. In particular, and similar to a known limitation of morph target animation, I expect users would want the ability to animate a single instance in the batch, rather than having to bake the transforms of each instance at each keyframe collectively. Because I'm not really confident of the answers on this, I'm inclined to say animation should be left undefined by this extension, but perhaps added later through something like #1301. However, I do think this extension should provide the ability to set application-specific, per-instance attributes on the batch. In addition to the application-specific, per-vertex attributes already allowed on the mesh. For example... meshes: [{
primitives: [{
attributes: {
POSITION: 0,
_BARYCENTRIC: 1
}
}]
}]
nodes: [{
name: "tree_person",
mesh: 0,
extensions: {
KHR_instancing {
mesh: 1,
attributes: {
TRANSFORM: 2,
ID: 3,
_POSITION_KEYFRAME_1: 4,
_POSITION_KEYFRAME_2: 5,
_POSITION_KEYFRAME_3: 6
}
}
}
}] ... in this case, the attributes prefixed with
tl;dr — I'm hoping to allow some flexibility for content authors to apply custom effects to instanced glTF models. Custom attributes and instance IDs enable that without locking in the spec to a particular animation mechanism prematurely. If/when we identify best practices for this sort of animation, which can be implemented in a performant and portable way, suggestions for a formal extension for instancing animation would of course be welcome. |
Thanks for the comment. TBH I'm not familiar with, too. I'm happy if someone shares the best practice.
Agreed.
Oh I didn't know #1301. Yeah I think ok to place animation stuffs out of this extension but somewhere else.
Sounds good to me! |
Is per-instance material parameter out of scope of this extension? (Sorry if I'm missing anything.) For example user may want to set different baseColorFactor, roughnessFactor, and other material parameters for each instance.
|
Trying to understand the rationale behind this - glTF already allows referencing the same mesh from many nodes in the transform graph. glTF loader can thus gather, for each mesh, a set of nodes that instantiate it and create the transform buffer at runtime - either at load time, if animation is not used / supported, or at runtime. Does this extension target scenes with thousands of instances of the same mesh and is thus intending to reduce the file sizes? If so, is 4x3 matrix an appropriate storage or should we use independent streams of T/R/S data instead? |
Yes. It also reduces processing time as the transform buffer can be copied to GPU as is.
Currently, it's based on an existing vendor-specific implementation. More efficient storage options (such as a lossless compression pass, sparse arrays, channels separation, etc) will certainly be considered. |
@zeux While a glTF loader could do this, in practice they don't. At least for three.js, doing so would mean reducing the node hierarchy in an opinionated way that feels unexpected without some hint (either from the asset, or from the application) that the meshes are meant to be batched. From that perspective, I see the rationale here as (1) a way for an asset to identify that meshes can and should be batched, and (2) efficient transmission+parsing of these batches. I do recognize that specifying GPU batches is somewhat unusual territory for a 3D asset format. Based on the goals of glTF for runtime efficiency, I think this could be a reasonable choice, but would be glad to hear if you'd suggest something else or think this is a bad idea altogether. 🙂 |
Thanks @zeux ! I'll have a look at the anim spec and update the schema. |
One more spec clarification (this one isn't really relevant for gltfpack, but it's good to highlight) - is there a usecase for this extension being optional (used but not required), or does it have to be in the required list, since there's no alternate way to specify instancing information? |
My feeling is that it should be required if used; if it was optional it wouldn't confer any of the transmission, loading, and runtime benefits that are specifically what this extension is designed for, and large instancing batches could get unwieldy really quickly. Anyway, lets try to do a quick discussion about this tomorrow. |
See #1691 (comment) for some previous thoughts on optional use. In general it is a good thing if extensions can be designed to enable optional use. In this case, I think that could be accomplished by adding a That said, few (no?) users have used runtime optimization extensions — like Draco — without requiring them, in practice. |
Heya, dropping some drive-by ponderings here, hope it is welcomed! What is the value of having this extension? Isn't having this a bit redundant? https://github.com/KhronosGroup/glTF-Sample-Models/blob/master/2.0/SimpleMeshes/glTF/SimpleMeshes.gltf#L13 already shows how to support instancing by sharing mesh references, unless I am mistaking something? It seems odd to have the file format specify what vertex attributes would be instanced. This is because engines have different feature sets that they can support in their instanced code paths, that are engine/shader/code layout specific, rather than input asset specific? Some engines support instancing on static meshes that have the same texture only (i.e. only instance transforms), other engines might support instancing color data as well (modulate per-vertex colors by a color multiplication modifier, or modulate a texture by a color multiplication modifier). Certain engines might instance also animated meshes, etc. Since typically the feature set of what per-instance attributes an engine is capable to render instanced, and which it can't depends on the shader code paths that are implemented, having a file format that would specify what the structure of the instance buffer should be seems like it would be dictating how a renderer must be implemented? Also different GLTF files will be authored with different instance formats, and an engine that wanted to support all would be expected to create a permutational combination of rendering shader paths to support all of them? Also note that the full possible combination of instance buffer structures is not being specified here? May be that I am missing something here, though this extension seems like it does not need to exist? If an engine is capable of collecting compatible nodes to be submitted instanced, it could do that already without this extension? I would understand that more instancing opportunities would be enabled by having a spec that would help reduce duplicate data, so that e.g. a renderer would not need to identify when two duplicated materials have the identical values (though this is already achieved by a good exporter), but this spec does not seem to improve reducing data duplication/sharing of data either(?) Does this extension help some renderers implement instancing, when before they would not have been able to? If it does, perhaps the spec might be helpful to have an intro paragraph to motivate why it helps some renderers "solve" instancing for them? Great work with GLTF on everyone involved, just scratching my head a bit on this one :) |
Hi @juj, this extension is specifically for really large instancing groups (thousands or even millions of trees, clumps of grass, etc.), with the goal of reducing file size, transmission time, and load time. Have a look at @donmccurdy's excellent analysis here : #1691 (comment). |
Oh, thanks, gotcha. Missed that this was about conserving disk space! The description rationale at https://github.com/KhronosGroup/glTF/pull/1691/files#diff-d53ba0f3c9191f50ab2501ad1edcc6f1R21 does not mention reducing file size, transmission time, and load time, but makes it read as if it'd be to optimize GPU rendering efficiency. |
I'll add some language about that. Thanks! |
For reducing transmission size, would it be possible to incorporate support for normalized 16-bit integers for quaternion storage? glTF animation data currently allows 32-bit floating point, 16-bit normalized integer and 8-bit normalized integer values, but in my experience 8-bit quaternions aren't really adequate in terms of precision so we could either specify both 8-bit/16-bit for consistency with animations, or just 16-bit for pragmatic reasons. GPUs support decoding 16-bit normalized integers when the input is bound as an (instanced) vertex stream, so I'm assuming it's reasonably straightforward to implement and it can significantly reduce the transmission size impact. Noteworthy is that three.js prototype implementation currently doesn't support that IIRC, but because it decodes quaternion data to matrix data I'm assuming it's a simple matter of adding a conversion step in JS code; @donmccurdy could comment on feasibility. |
Well, "this week" arrived 3 weeks later but I did get to this eventually :) zeux/meshoptimizer#142 implements support for this extension in gltfpack. Note that the output uses 16-bit normalized quaternions as per my suggestion above. It was trivial to implement in three.js and it already works in Babylon.JS so I'm assuming it's straightforward to do this; this saves transmission size by using a more efficient and more compressible quaternion encoding. Attaching a few example models:
Using prototype three.js implementation with a few fixes the stadium model is very quick to download and render; it's pretty slow in Babylon.JS, I believe it's because that implementation decodes the node instances into the scene graph which takes a while - I'll file an issue for that. gltfpack right now has a limitation in that it outputs all instances in world space, which leads to meshes that had skew transform (from accumulated relative transforms in the node graph) looking differently. At some point I can rework the implementation to preserve that. |
Awesome! Thanks @zeux! |
@zeux Either is fine with me. I might vote for consistency with the animation spec, since the implementation burden is low. Your suggested changes on the three.js implementation look good, as well.
@juj That is what I hope, anyway, in addition to reducing filesize and parsing cost. I understand your perspective — why don't engines just batch everything already? Nothing prevents them from merging, instancing, or otherwise optimizing a scene loaded at runtime. But in the ~9 years that WebGL has been available, I'm not aware of any engine doing so reliably, for any standard format1. I'm less familiar with native engines like Unity or Unreal, but my understanding is that they do this optimization offline and store data that is easier to batch than their input sources, e.g. FBX. Your concern is fair, and others have been raised, which is partly why this is an 1There may be exceptions to this in special areas like AEC? In geospatial visualization heavy offline processing is also common. |
This is surprising to read, working close to two decades with native APIs that support instancing, all of the engines that I ever worked with that supported instancing, use it dynamically rather than statically from the input data files. When porting UE4 and Unity3D to asm.js/wasm, both of them extend that support to WebGL. This is the first time I read that it would not be possible to do so reliably. What is preventing them from doing so? This is also the first time I read that anyone would be attempting to store instancing information "pre-solved" into an input file. The reason that static instancing is a bad idea is because of view frustum dependence. Even if you stored a forest of 100k trees as 100k instances in a file, it would be inefficient to use a single 100k instanced draw call at runtime to render them, because probably only some 1000 of them might fit in the view frustum at once - and which ones do at a given time, depends on the position/orientation of the frustum. Engines would rather do one of the following: a) first frustum+occlusion-cull all the instances, and then render whatever remain using instancing, or The solution a) would not be able to leverage any instancing data present in an input file. One might argue that b) could, if b) pre-bucketed the instances to such cells, but the cell size will invariably be engine-specific, and then an engine would have a problem of reconstructing the cell boundaries from input instance data, just complicating things.
Both engines employ both instancing and batching techniques. In neither engines does one save "pre-solved" instancing data to the optimized/baked input files. Unity utilizes strategy a) from above, and I believe that so does Unreal. Unity GPU Instancing All this being said, if instancing helps improve code size of glTF scene files, then it might be a worthy addition. However I would expect that renderers would as the first thing have to discard the instancing information after loading, i.e. it is just used as a indirection-based domain-specific compression technique. To me it seems that renderers that would faithfully render the instance information stored in a file like this, would be ones that ignore frustum culling, and behave inefficiently/suboptimally because of that; or ones that would be expecting that the scene serializer would have done a suitable optimization b) above for them. Should glTF serializers be expected to require applying optimization b)? Also, it does worry me that with this kind of instancing extension, implementing a loader becomes more complicated as one will need to be able to recognize both non-instanced and instanced ways of loading input files? |
@juj I fully agree with you, I am not sure if this extensions features are worth the cost of adding more complexity to the loaders. At first I thought this extension would be really useful to implement things like hair or fur on animated characters, but since it's been limited to static instancing, that possibility is gone. Then I thought about trees, forests, and I didn't realize about the issue of frustum culling... so, another possibility gone. So, I would like to ask which are the real, practical use cases of this extension (beyond filling a small volume with thousands of teapots or cutting down file size), because the more I see it, the more it looks like a compression technique and nothing else. Knowing this is important to me because I need to leverage the pros/cons to decide whether implementing this extension is worth or not. |
@juj —
This is a vendor extension — it is prefixed
I find this surprising as well. My understanding was that static batching does not usually work this way, but I understand dynamic batching is more flexible.
It's not impossible, no. But it's quite complex, and the argument that Unity and Unreal have implemented it, so why can't everyone else, is a bit worrying to me. I don't mean that as a criticism — if anyone has advice about how smaller engines like three.js, babylon.js, et al could implement dynamic instancing with no pre-processing of the input files, I'm certainly interested. 🙂
A goal of this extension is (IMO) to make it possible to create these buckets in an application-appropriate way. For a standalone model (e.g. for architecture or retail product preview) perhaps a single batch is fine. For a larger game world you'd certainly want cells. I consider the flexibility to decide the cell structure as a good thing — there is probably not a one-size-fits-all choice here. Reconstructing cell boundaries does not seem overly complex to me, but separate extensions have been proposed to provide world bounds at the node level, and (if we needed that information here) I would prefer to keep the information in those extensions instead of conflating the information with instancing data.
For a three.js we are able to use the instancing information very nearly as it is provided; I can't comment on other renderers.
Could you say more about what you are looking for here? Do you mean animation of individual instances, like Houdini's VAT system?
Authoring a single large batch will defeat culling, in the same way that merging an entire game world into one large mesh would defeat culling. Creating well-chosen batch cells will enable efficient culling. I see that flexibility as a strength in this extension: it would potentially enable tools like gltfpack to take input options (
Cases like trees and forests are, indeed, what I see as the practical uses of this extension. I think use cases like hair and fur may also benefit from this extension's data structure, but the extension is not (alone) sufficient to fully enable those things. In the short term I would expect to do custom processing on the loaded data to do this animation. In the long term perhaps glTF should consider procedural materials and procedural animation, similar to VAT, to more fully enable that use case. Or maybe we should really wait until something like that is designed, but I see that as a very large endeavor. |
I always thought that instancing would be a good opportunity to allow hair/fur on single characters. Think of Monsters INC or Chewbacca. But also for realistic characters where you want to display actual hair over the whole body like in real life. Crowds simulation is out of the question because the previously mentioned frustum culling problem, the same for trees/forests. I believe this instancing extension assumes that a gltf-model is going to be a culling-unit, that is, it's either rendered in full, or not at all. The problem with this approach is that I believe most engines preffer to do a full run per mesh. Some years ago I had the opportunity to use SpeedTree... What it did was, for every kind of tree (a single mesh), ran a culling over the whole scene (many square kilometers) while filling an instancing buffer dinamically, so all the visible instances of a given mesh were rendered in a single call. This is different than culling model "cells" and rendering its contents cell by cell. My point is that, since glTF does not have any concept of culling, this instancing extension is limited to geometries that are assumed to be culled as a whole, or rendered as a whole. And this removes many of the use cases of instancing. As a side note... can a gltf model using the instance extension... to be instanced? I think an engine would have to check if the model uses internal instancing, and if it's the case, render the models in the scene one by one. So maybe, supporting instancing per model prevents an engine supporting instancing per scene. In other words... me feeling is that instancing is a per scene optimization, not a per model optimization, so I think such an extensions must help an engine to do per scene instancing, and not enforcing a per model instancing on the engine. I did find a use case that might be interesting to some; which is molecule visualization, as in, defining simple colored sphere meshes as atoms, and producing models with millions of atoms. It's an useful use case, specially these days, but I am not sure if it's sooo specific that people working in the bio industry will probably have their own formats and viewers. |
@juj & @vpenades - I agree with your concerns regarding this extension. However, as @donmccurdy pointed out this is a vendor extension, ie nothing that I forsee ending up in the coming versions of glTF (as included in the core spec) - as such I guess I'm fine with it since there are a couple of clients/viewers asking for this. My initial proposal was to have this as a named WEB_GL, eg |
@@ -0,0 +1,59 @@ | |||
# EXT\_mesh\_gpu\_instancing | |||
|
|||
## Khronos 3D Formats Working Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change this line to ## Contributors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thanks Ed!
@donmccurdy : it looks like I don't have write access so I can't do the merge. Maybe you can do it when you add your comments? Thanks! |
I'm merging this. If there are further comments, please open a new issue. Thanks @ultrafishotoy! |
Awesome, thanks to you @emackey! |
I've noted quantized quaternions a few times on this thread; since this is merged now I've opened #1818 to specifically address that. |
I've tried to clarify the current answers to questions raised on this PR in #1821 — feedback welcome. |
start instancing discussion