# KHR_image_formation #2128
This extension is intended to allow the producer of a glTF to define all the parameters necessary for a consumer (renderer) to generate a consistent output image of the scene, given consistent lighting and view. Consistent does not imply pixel-identical, as the rendering is still subject to many arbitrary approximations, but simply that the output would be identical if the physics were respected exactly. What this extension defines is a function that maps [0, infinity) input light RGB to the [0, 1] range of the output encoding (most commonly sRGB). In practice, this function has a much larger effect on the appearance of the output than the differences in renderer approximations.
Supporting this extension is only appropriate for rendering a single glTF scene; since it defines a global output mapping it is impossible to respect conflicting mappings for two glTFs in the same scene. Likewise, if a shop wanted a consistent “look” across their portfolio, they might be better served by picking a consistent mapping function and requiring their artists to design with this mapping in mind, rather than displaying different products with unique parameters.
It would seem that this prohibits the use case of mixing models, for instance a 'Metaverse' type of use case where models from a number of different suppliers are displayed within the same scene - is this correct?
Exactly; global configuration like environment lighting, camera, and tone-mapping are necessarily scene-level, so if the app has a scene composed of many glTFs, then it doesn't make sense to support this extension. This extension is more useful for the single-model cases like Commerce where you want to completely specify the sRGB output. Granted, environment lighting also needs to be consistent as stated above.
> Exactly; global configuration like environment lighting, camera, and tone-mapping are necessarily scene-level, so if the app has a scene composed of many glTFs, then it doesn't make sense to support this extension.

So we are in agreement that this extension does not solve the case of viewing multiple models, or of viewing a model under varying light conditions - for instance an AR/Metaverse use case that mixes models, or any use case that alters the light setup?
> This extension is more useful for the single-model cases like Commerce where you want to completely specify the sRGB output.

Is it safe to assume that you are referring to an MPP (main product picture) use case?
Yes, I'm concerned with 3D replacing the main product picture (this is the main use case our partners are working with us on) and not the metaverse or AR, where the lighting and post-processing are app-defined or camera-defined already.
It seems to me that for your use case you see the lighting being declared in the asset - is this the case?
In what scenario do you see use for the 'exposure' parameter?
This mapping function is split into two parts: the linear multiplier, exposure, and the nonlinear look-up tables (LUTs). Exposure represents what the human pupil does to compensate for scenes with different average light levels, but here we will define it in terms from the photography world, as they have already created standard physical units to represent the camera’s version of the pupil.
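As a point of reference for how those photography units typically collapse into a single scalar, here is a minimal sketch assuming the common saturation-based EV100 convention (as popularized in the Frostbite and Filament write-ups); the function names are illustrative and this is not necessarily the exact formulation used in this extension's Annex A.

```python
import math

def ev100(aperture_f: float, shutter_time_s: float, iso: float) -> float:
    # EV at ISO 100 for a given f-number, shutter time (seconds), and ISO.
    return math.log2((aperture_f ** 2) / shutter_time_s * 100.0 / iso)

def exposure_scale(aperture_f: float, shutter_time_s: float, iso: float) -> float:
    # Linear multiplier applied to scene luminance; the 1.2 factor comes from the
    # saturation-based sensor model (lens attenuation folded in), per the
    # Frostbite/Filament convention (an assumption here, not spec text).
    return 1.0 / (1.2 * 2.0 ** ev100(aperture_f, shutter_time_s, iso))

# "Sunny 16"-style settings: f/16, 1/100 s, ISO 100 gives a tiny multiplier,
# appropriate when scene luminance is in the tens of thousands of nits.
print(exposure_scale(16.0, 1.0 / 100.0, 100.0))  # ~3.3e-05
```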
Reading Annex A, it seems to me that this extension is modelling a "physical camera", rather than the human eye.
My suggestion would be to move that behavior (i.e. exposure) out of image formation and into its own extension whose sole focus is to extend the current camera object.
I would contend that the human eye and a physical camera are very similar sensors, with the primary difference being that a physical camera is testable and quantitative instead of qualitative. Hence I prefer to use a physical camera to represent the human eye, as it can be precisely defined.
Annex A only explains how you can compute exposure to model a physical camera. The `exposure` parameter by itself doesn't model a physical camera at all; it's just a scalar value.
> I would contend that the human eye and a physical camera are very similar sensors

Ok, here we disagree; to me a camera is substantially different. I will try to explain what I mean:
A camera has both shutter (speed) and ISO, as well as the option of changing lenses.
Where a camera will typically open/close the shutter (or by some other means let in light for a set time period), the photoreceptive cells in the retina continuously send information that is interpreted by the visual cortex.
The effect of having a light-sensitive medium and a long shutter speed can never be replicated by human vision.
Below roughly 10 lumen, human vision becomes monochromatic (black & white).
Similarly for the opposite: when a large amount of light is captured on the camera medium, colors will often go to white. This will not happen in human vision.
> Below roughly 10 lumen, human vision becomes monochromatic (black & white)
Scotopic vision is even more complicated than this as the rods and cones use the same pathways, which leads to interesting effects in low light that are not just black and white. Here's an attempt at replicating scotopic vision based on a few research papers: google/filament#4559
> Ok, here we disagree; to me a camera is substantially different.
What does the outlined proposal represent if we are dividing approaches into “camera photography-like” versus “human visual system-like”?
> Scotopic vision is even more complicated than this as the rods and cones use the same pathways, which leads to interesting effects in low light that are not just black and white.

Sure, there are light conditions that lead to mesopic vision - a combination of rods and cones.
That does not change my argument that a camera is fundamentally different from the human eye, for instance:
- A camera shutter can stay open for any duration, continuously exposing the film medium to photons until the shutter closes.
- The camera aperture can be set to any size in any light conditions, exposing the film medium to any number of photons. (Whereas the eye's pupil is mostly there to protect the photoreceptor cells from too much energy, i.e. light.)
- In a camera, the medium being exposed to light is not always the same; it may vary greatly depending on the type of film or what type of digital sensor is being used.
| Property   | Description                                                        | Required            |
|:-----------|:-------------------------------------------------------------------|:--------------------|
| `exposure` | Linear multiplier on lighting. Must be a positive value.            | No, Default: `1.0`  |
| `hdr_lut`  | Link to a 1D .cube file, defining the first post-processing step.   | No                  |
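For concreteness (not from the PR text): a minimal sketch of reading and applying the kind of 1D `.cube` file that `hdr_lut` would point to. It assumes the Adobe Cube LUT text format (`LUT_1D_SIZE`, optional `DOMAIN_MIN`/`DOMAIN_MAX`, then one "r g b" row per entry); the exact sampling and domain rules would still be up to the extension spec.

```python
from typing import List, Tuple

Row = Tuple[float, float, float]

def parse_cube_1d(text: str) -> Tuple[List[Row], Row, Row]:
    """Parse a 1D .cube LUT into (rows, domain_min, domain_max)."""
    rows, size = [], None
    dmin, dmax = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
    for line in text.splitlines():
        line = line.split("#")[0].strip()   # strip comments and whitespace
        if not line:
            continue
        key = line.split()[0].upper()
        if key == "LUT_1D_SIZE":
            size = int(line.split()[1])
        elif key == "DOMAIN_MIN":
            dmin = tuple(float(v) for v in line.split()[1:4])
        elif key == "DOMAIN_MAX":
            dmax = tuple(float(v) for v in line.split()[1:4])
        elif key in ("TITLE", "LUT_3D_SIZE"):
            continue
        else:
            rows.append(tuple(float(v) for v in line.split()[:3]))
    assert size is not None and len(rows) == size, "not a 1D cube LUT"
    return rows, dmin, dmax

def apply_lut_1d(rgb: Row, rows: List[Row], dmin: Row, dmax: Row) -> Row:
    """Piecewise-linear lookup, applied per channel."""
    out = []
    for c, x in enumerate(rgb):
        # Normalize into the LUT domain, clamp, and interpolate between samples.
        t = (x - dmin[c]) / (dmax[c] - dmin[c])
        t = min(max(t, 0.0), 1.0) * (len(rows) - 1)
        i = int(t)
        j = min(i + 1, len(rows) - 1)
        f = t - i
        out.append(rows[i][c] * (1.0 - f) + rows[j][c] * f)
    return tuple(out)
```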
I think that adding the option of a LUT to a glTF changes the PBR nature of the specification.
As I see it we are aiming towards a physically correct result with the glTF format.
A LUT may radically change the color (hue) coming out of the glTF BRDF, breaking the physical correctness.
How do you see this 'artistic intent' as opposed to physical correctness?
No, a LUT is part of the output process of a physical camera creating an image (and I would argue a very similar process happens in the eye's perception of light, though that is far more complex). So it is equally a part of physically-based rendering, as long as the goal is output to sRGB pixel values.
This is meant to be flexible, allowing both "neutral" LUTs like product photographers use, as well as artistic ones like Sepia if desired. Again, this is just an extension, and can be easily ignored by a consumer that it doesn't make sense for.
A LUT does not change the physical correctness of the materials definition and lighting model, it does however define image formation. Those are two separate and not incompatible steps.
> No, a LUT is part of the output process of a physical camera creating an image

If that is the case, I would argue that the LUT is applied as a last step when consuming the image, not as part of the asset itself.
Which leads me to thinking that a (physical) camera extension would be a better fit for your use case.
> A LUT does not change the physical correctness of the materials definition and lighting model, it does however define image formation.
In that case I think it is important to state that while the glTF (internal) BRDF calculations are unaffected - the output of this extension may not be physically correct.
Would you agree to that statement?
> In that case I think it is important to state that while the glTF (internal) BRDF calculations are unaffected - the output of this extension may not be physically correct.
What is physically correct at the display? I would argue that it could only be considered physically correct given a perfect, known display at a known sustained peak brightness, and in a known, fixed viewing environment (and this means that the viewer should see only the glTF scene and nothing else to avoid any kind of adaptation). What this extension tries to achieve is in its very name: it's about image formation, and therefore perception rather than stimulus. And chasing after "physical correctness" at the display level is, imo, foolish considering the viewing environments (on and outside the display), display settings, display capabilities, etc.
A very simple and telling example to me is the game I was playing last night, Gran Turismo 7. It outputs HDR and is clearly not "physically correct" at the display (since it shows the sun in the sky and it's certainly not outputting 100,000 lux), but is the perception of the scene realistic/believable? Absolutely.
> What this extension tries to achieve is in its very name: it's about image formation, and therefore perception rather than stimulus.
Maybe this is what makes it not physically correct, in my opinion.
Allowing the perception of a glTF to be monochromatic (black & white) - when the surface properties and light environment are chromatic (colored) - is not what I would call physically correct.
My definition of "physically correct output" does not mean that it must be a 100% replication of light intensity levels 'as they are'.
The contrast between higher-intensity and lower-intensity light shall be as correct as possible when output.
Take the case of a brightly lit summer day, let's say 100 000 lumen/m2.
The amount of light that will enter through the iris and pupil, hitting the photoreceptor cells on the retina, is much, much smaller.
Most of it will be blocked by the constricted pupil to protect the photoreceptor cells.
This means that the light as 'captured' by the observer is in a smaller range.
Let's say that of the sun's 100 000 lumen/m2, only 10 000 lumen/m2 will enter the eye.
The same thing will happen to the light coming from other parts of the environment, making them look dark (since you are looking towards a very bright spot).
It is this contrast, between the bright sky and the shaded areas, that I would like to get to the display as it would be 'captured' by the observer.
(In this case, shaded areas being maybe 500 lumen/m2 and the sun being 10 000 lumen/m2.)
This is what I would call physically correct output.
Your incorrect assumption is that the eye handles over-bright spots using only the pupil (a linear scale factor akin to aperture). This is somewhat accurate if you focus on the bright spot, but not at all if the spot is out of your central vision. On a sunny day the sun can easily be "in your eyes" without your looking straight at it. However, if your pupil clamped down enough to resolve that 100,000 lux range, the rest of your scene would be black by comparison. Instead, your retina is also nonlinear, decreasing the response from those over-stimulated cells while still resolving the rest of the scene at reasonable lux, hence your ability to see the horizon while the sun appears white (without looking straight at it).
You're right that a LUT can be used for artistic intent, but its most common use is actually to make neutral product photography look right in sRGB or HDR (hence the IKEA beauty curve). It is physically correct in the sense that it models part of the eye's physical response to stimulus, and it is necessary in making an image "photo-realistic".
The difficulty with #2083 is that it specifies everything right down to the display without leaving a place for the LUT to be applied. You mention applying it as a post-process, but that's exactly what this extension does. Once you're down to PQ, there isn't anywhere left to apply a post-process, and you've already lost the information you need due to dropping from floating point to 10-bit.
> You're right that a LUT can be used for artistic intent, but its most common use is actually to make neutral product photography look right in sRGB or HDR (hence the IKEA beauty curve).
What you call 'Ikea beauty curve' is applied to photographs as a last step just before publishing and shall not be part of the asset.
This type of color-grading (or tone-mapping) - if it needs to be applied to glTFs - belongs in the engine or viewer in my opinion.
Adding color-grading in this way to glTF assets is a slippery slope that opens up for a multitude of problems.
> The difficulty with #2083 is that it specifies everything right down to the display without leaving a place for the LUT to be applied.

I am reluctant to discuss another extension here - at the same time I do not want misconceptions to gain a foothold.
No, that is clearly not what KHR_displaymapping_pq does.
For an overview of where color-grading could be implemented together with KHR_displaymapping_pq - please look at the section about integration points and motivation.
This should give you the information needed - if not please let me know how I can clarify!
## Extending Scene
The precise method of converting scene linear light into output pixels is defined by adding the `KHR_image_formation` extension to any glTF scene. For example:
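The PR's own JSON example is not reproduced in this excerpt. Purely as a hypothetical illustration (written as a Python dict so it can be printed as JSON), a scene-level declaration might look roughly like this, reusing the property names from the table above; how `hdr_lut` references the .cube data is an assumption on my part.

```python
import json

# Hypothetical values only: the extension name and property names come from the
# discussion above, the shape of the `hdr_lut` reference does not.
gltf = {
    "extensionsUsed": ["KHR_image_formation"],
    "scenes": [{
        "nodes": [0],
        "extensions": {
            "KHR_image_formation": {
                "exposure": 2.0,  # linear multiplier on scene lighting
                "hdr_lut": 0      # some reference to a 1D .cube resource
            }
        }
    }]
}
print(json.dumps(gltf, indent=2))
```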
I am worried that having a scene declaration that allows content creators to provide artistic intent may confuse the format.
With this I mean the expected behavior when viewing a content creator's model.
Imagine I create my model with a fancy tonemapping LUT that is Michael Bay'ish (saturated colors) and then this model is displayed in a scene with a distinct monochromatic LUT. This will happen, even if it's not part of the spec.
My model will now look totally different, and most probably wrong according to my artistic intent.
My worry is that users and content creators will see this as a serious flaw in glTF.
I suggest to remove the LUT's and focus on image formation.
The LUT is a core part of how image formation is done in the photography, film, and CGI industries. It's used because it's important in making scenes look correct to the human eye despite the low range of printed or displayed images.
> The LUT is a core part of how image formation is done in the photography, film, and CGI industries.
Sure, and if the intention is to model the content creation pipeline of those industries I think this extension should be changed to reflect that.
Do you have a suggestion? I'm certainly not going to be picky about names.
My proposal would be to remove the LUTs and focus only on the aperture, shutter speed and ISO parameters, making this into an extension that models a 'physical camera', suitable for the studio photo / main product picture use case.
At the same time, making it compatible with KHR_displaymapping_pq could make it beneficial for broader use in 3D commerce.
Hi @elalish and nice to see your entry in this area :-)
I would say that this extension (khr_image_formation) is not an alternative to khr_displaymapping_pq, as they have quite different approaches. This is very different from khr_displaymapping_pq and "the 3D Commerce use case", where the whole purpose is to provide a 'neutral' mapping of content to the output. If color grading (exposure factor and LUT) is to be applied, that is done as a final step when content is authored to the target. In my opinion color grading (exposure factor and LUT) does not belong in a glTF asset; if needed it should be in the device that consumes the content (i.e. knows about the output characteristics). (For more technical details see my comments.)
A LUT can be used to apply a "neutral" mapping of content to the output. I'm not exactly fond of trying to talk about neutral mapping because image formation extends beyond stimulus (esp. given the limitations of displays and the HVS). One benefit of using LUTs is that they allow authoring of "neutral" scenes (with no tone mapping at all) and of artistically crafted scenes as well (high-contrast black and white, for instance). Exposure in this case should not depend on the output device, but on the scene itself.
I don't see the mixed-asset issue as being a problem. It assumes that mixing assets is going to be the final step in publishing and I can't see a situation where that would happen. I see this extension as something that a viewer or exporter would support but not something that an importer would probably read. For example, I work on Adobe Substance 3D Stager (a staging app used for synthetic photography among other things). We support importing and exporting glTF but our users set up their lighting within the application so there's not much need to support importing this info. However, when exporting or publishing directly to the web, including everything needed to render the scene as expected is very important. This includes tonemapping in addition to lighting. They go hand in hand.
Adding my notes from the call here again and some more thoughts around this, as we've been running into many issues regarding viewers and their default display of models in general:
To summarize:
ACES has many issues. Two of the most glaring ones to me are the hue skews created when going from AP1/AP0 back to sRGB, and the overall drive to yellow-ish.
Right, that's why some kind of tone mapping (which I prefer to refer to as range compression, because that's really what we're trying to do) is needed and can be baked into a LUT. The reason why we chose to use a LUT was to allow authors to pick their compression curve (for instance to match an existing ACES based workflow).
The original proposal @elalish and I crafted was not using a LUT but defined a specific (yet configurable) range compression (tone mapping) curve with an extra step to somewhat control hue skews. Should we share this proposal? Note: that configurable curve can be configured to match ACES's compression.
That's a good point.
Some great discussions here. I'd issue one incredible and loud warning... There's a tremendous conflation here between notions of Stimulus versus Perceptual Appearance. The CIE lists two definitions for "colour" for good reason. One describes the psychophysical stimulus specification of colour, and one describes the perceptual appearance of colour. @elalish has wisely cited that no representation in an image formed can even remotely produce the range of stimulus present "as though we were looking at it".^1 And even if this were the case, this is a seductive misnomer as cited by MacAdam, Jones, Judd, and many other researchers; an image is not ground truthed against an idealized reproduction of stimulus. As a general cautionary rule of thumb, it should be noted that psychophysical stimulus will always be nonuniform with respect to perceptual appearance. That is, chasing the dragon of 1:1 stimulus will always assert that the appearance of the thing in question will be incorrect. Not that it matters, because it is fundamentally impossible to replicate the stimulus from the mediums in question, and the subject of distortions of "hue" and other facets will take front and centre stage without properly considering the act of image formation as a whole.
@sobotka I'm curious, besides the warning, would you have any suggestions on how to resolve this in the context of glTF (and related cases such as e-commerce)? E.g. "viewers should just all auto-expose and try their best to somehow present in a somewhat considered neutral way"? (I value your input a lot, and try to read all your Twitter threads, even when I sometimes need a number of dictionaries to understand them)
@hybridherbst Exposure is the easiest and least interesting part of the problem. The complex steps are about generating the desired perception, and to Troy's point, focusing on the stimulus is not enough. Based on many long conversations with Troy, that's why I was coming back to a LUT-based approach, because it will allow for future improvements in image formation as there are no great known solutions at the moment (although LUTs have the issue of being spatially invariant, which flies a bit in the face of proper perception generation if we'd want to take things like lightness into consideration; also see Bart Wronski's recent article on localized tone mapping).
The original version of the proposal I mentioned earlier was attempting to create a somewhat neutral "tone mapping" based on efforts from Troy and others, while taking into account constraints like applying the result in real-time without using a LUT for certain glTF viewers.
Here are a couple of examples. In each image, from top to bottom: linear (no compression), ACES (SDR/Rec.709 100 nits target, using the reference implementation), the proposal. The proposal is far from perfect. There's no good gamut mapping step so it leaves a few hue skews behind, but it's a noticeable improvement over ACES (blue shifting to purple is greatly improved, for instance).
I tend to think that when we are talking about selling red plastic chairs, we are really firmly anchored in the realm of a perceptual appearance approach; we want to walk into a big box shop and see the brightly orange chair as the brightly orange chair and spend our fifty bucks on it. But let's ignore the appearance side, and focus purely on ratios of tristimulus...
The points that @elalish and @romainguy are raising are hugely significant here; some semblance of an open domain tristimulus compression is required. Suggesting a virtual intermediate encoding capped at 10k nits is going to be in direct opposition to the goal of selling an orange plastic chair. We can't just "scale down" via an exposure, as that pressures the middle grey value down, which means we have to bring it back up again or our orange plastic chair is "too dark". Worse, if we view the albedos of the chair in direct sunlight, now we have a clip skew happening at the 10k nit mark without some tristimulus compression. There's no real in-between here. And if we ignore the whole gargantuan rabbit hole of appearance modelling in image formation, we simply cannot have our "middle-ish value" of the range in the "middle-ish value" of the image and compress down with a generalized "exposure".
The orange plastic fifty dollar chair should appear orange and plastic so that someone can buy it for fifty dollars. That's impossible without some care and attention to the image formation chain such as what @romainguy and @elalish are bringing to the table here.
One key aspect of a good transform is that it does not operate solely on luminance but also affects color, especially saturation. This is because as a colored light becomes very bright, its color ceases to be perceived. This is true of CCDs as well, and can be easily seen by taking an overexposed photo of a bright, colored light and noting how it tends toward white. This is the purpose of the `tone_mapping_lut`.
Regarding color gamuts: since glTF defines its textures in sRGB, as long as the scene's lighting is also represented in Rec 709 (sRGB's color gamut), the output light will naturally also be restricted to the Rec 709 gamut. If a wide gamut like Rec 2020 is used for output, then the values will naturally fall into the Rec 709 subset.
Would scene lighting with an AP1 gamut use the same HDR LUT as scene lighting in Rec 709? Or (more generally) do different LUTs impose restrictions on the scene linear color space?
Unfortunately, different working color spaces would require different LUTs. In my own tests, I found little benefit to rendering in a wider gamut before the image formation step. It does help in specific cases (esp. when doing ray tracing with light bounces) but in glTF's current form, rendering in Rec.709 is sufficient.
> Would scene lighting with an AP1 gamut use the same HDR LUT as scene lighting in Rec 709?
To add to this, “wider gamut” is not “better” without defining a clear notion of what we are comparing.
First Problem: Meaningfulness
RGB is simply tristimulus, and completely unrelated to how actual light transport works. We would be wise to accept that and move on. It’s really a “Good Enough” balance of bandwidth and computing, but ultimately anchored in human stimulus specifications. Remember... PBR has "B" in it... meaning "Based", not "Emulation of Physical Light Transport".
Given that AP1 uses primaries that do not exist, we are faced with an additional “distance” to compress when we attempt to represent the stimulus in a medium. For example, because the AP1 primaries are meaningless with respect to the standard observer model that anchors our entire colourimetric work, even a BT.2020 idealized display with the pure spectral emitters would not represent anything like it. Because it is “beyond” the locus, it holds literally no meaning to the standard observer, and no speculation can be even made.
We can extend that question to ask what the meaningless ratio means in terms of smaller gamuts?
Rendering something that has no meaning is part of the problem here.
Second Problem: Choice of Rendering Tristimulus
A second problem, and one that is likely more dire, is that as we move outward to the spectrally pure values in a stimulus specification, we are implicitly decreasing luminance. Luminance can be considered a portion of the general sense of “brightness” we perceive.
When we perform light transport on tristimulus of low luminance, the result is, unsurprisingly, dull as hell. That is, counter to what people typically assume, the resulting mixtures will appear less colourful. The following examples are shamelessly borrowed from someone who took it upon themselves to demonstrate the impact of the chosen tristimulus space on resulting tristimulus renders and image formation, Chris Brejon! Can you guess which of these were rendered in a wider gamut tristimulus model and which were rendered in BT.709 based tristimulus?
Finally, on this subject, it should be noted that when we conduct "light transport-like math" on RGB tristimulus, we are permanently baking in changes to the tristimulus that vary depending on what working RGB tristimulus model we are in. That means doing indirect bounces of identical tristimulus chromaticity will yield different results depending on what RGB model we are in.
Third Problem: "Gamut" Mapping
Even if we ignore the first two problems, we have a combination of the first problem present in our final output. If we are on a medium that cannot express the tristimulus value, such as going from BT.2020 to sRGB, how the heck do we formulate the result in the destination? A majority of approaches are just a simple clip, and that of course leads down the "hue" skew path of varying degrees, as well as making the result device dependent given that it will render differently on different output mediums.
Even if we aren't faced directly with the more abstract "meaningfulness" of nonsense values, we are again faced with the "meaning within the medium". For example, relative to sRGB, BT.2020 tristimulus mixtures may have meaning within the gamut, and no meaning for values that cannot be expressed. How do we give those values meaning? Do we focus on tonality so the plastic orange chair isn't a big huge flat wash of nasty? Do we try to decrease the perceptual facets of "chroma" or "hue" on our orange chair? What impact does that have when we go to the shop to buy the orange chair and it's... not the orange we thought we saw? These are pretty gnarly complexities that currently have no real answers without folks to engineer them.
Sorry for the massive post, but I just wanted to get these things out into another public domain because there are so many misunderstandings and false assumptions behind much of this rendering stuff that I think many folks could contribute toward solving if greater awareness were out there.
Lots of news to me here, thanks @sobotka and @romainguy! 🙏
One takeaway I think I'm hearing (and please correct me if this is wrong) ... even when viewing the rendered image on a display that supports wider P3, AP1, or Rec 2020 gamuts, scene lighting calculated in a wider gamut working color space is not "better" (in any clear and simple sense) than scene lighting done in Rec 709?
> even when viewing the rendered image on a display that supports wider P3, AP1, or Rec 2020 gamuts, scene lighting calculated in a wider gamut working color space is not "better" (in any clear and simple sense) than scene lighting done in Rec 709?
Until we define what we are comparing on either side of the "better" the answer is yes. That is, in terms of image formation, having additional visual energy will lead to higher "colourfulness". Using pure primaries with exceptionally low luminance will potentially be detrimental here in terms of resulting image formed.
In the end, the analysis of "better" should be tempered against a clear declaration as to what is being compared, and why. It's an analysis of qualia, ultimately.
Regarding the comment from @MiiBond:
To me, this is a strong argument against having this extension (exposure + LUT) inside the glTF asset. Just like content is authored to different targets - movie, print or web.
@rsahlin Agreed we ideally need different LUTs for different destinations. Happy to include that here or in a follow-on extension. As for not belonging in the asset, I agreed in the case of an IBL extension for the same reason, but especially because of the huge over-the-wire cost that many consumers wouldn't want. This is at least compact, but yes. However, I would argue khr_displaymapping_pq has exactly the same problem: it too specifies a LUT, just a very simple one. Again, anyone who wants to post-process differently (including a metaverse/AR app) will have to ignore that extension.
In my opinion they belong to the destination (engine) and not inside the asset.
I don't think a LUT is anywhere near like an environment light, for the following reasons:
I argue that environment light does not have any of the above problems. I strongly believe that the way to provide a studio type of 'look & feel' on a glTF is by using the light setup (punctual and environment lights) - not by adding a filter.
You, again, are completely ignoring the role the image formation process plays in this. There can be no moving forward until you come around to appreciating that one cannot, even if desired, output the “stimulus”. This whole idea is nonsense.
As an idea to find a common ground here:
I propose:
This way, we avoid the LUT discussion (and don't introduce too much power into the asset), and the following combinations would be possible: Current behaviour:
Automatic exposure like many game engines do:
Authored fixed exposure for a daylight scene:
Similar to what KHR_displaymapping_pq would do:
When "exposureBehaviour" is either "automatic" or "maxBrightness", "exposure" serves as an offset, allowing artistic control to nudge into a "high-key" or "low-key" look and feel. These would all be per-camera and/or per-scene "renderer hints". Viewers could aim to match the closest camera's rendering hints if free navigation is allowed. When multiple assets are combined that all have maxBrightness mode, the behaviour is still deterministic and very clear.
(I'm not sure how far along the IBL extension proposals are; optionally this extension here could also have something like "externalLight": true / false to specify if a viewer should apply any kind of IBL or rely on lighting and/or IBL as specified in the file instead - without that, no consistent output between viewers is possible.)
I believe it would be a mistake to reduce the nonlinear portion of the mapping function to a choice of 1-2 presets, while still positioning this as a general-purpose image formation extension. If this were a proposal to define lighting and image formation scenarios specifically relevant to retail products displayed in typical stores (say,
But for an extension meant for use in the wider glTF ecosystem, the intent of
Please don't view the possibility that an artist might create a stylized LUT (rather than a "neutral" one) as a problem to be solved — it is a healthy side effect of providing the proper tools for image formation.
If general consensus is that image formation definitions do not belong in a glTF file, I'm OK with that decision. But imposing more prescriptive choices on the extension will not help.
I had proposed earlier that a better name might be "KHR_rendering_hints" or something similar that makes it very clear that these are hints for viewers on how something should be displayed. I don't have a strong opinion on including or not including user LUTs; the above was an attempt to find a useful middle ground for an extension that allows for both flexibility and the goal of getting closer to determinism.
One could also argue that one extension (e.g. KHR_rendering_hints) would be about the above (exposure, perceived brightness, a good neutral mapping for HDR values) and another extension (e.g. KHR_tonemapping) would be able to override the "neutral" mode and extend it with user LUTs. I'm not sure what the current approach is to either put multiple features into one extension or to split things up (the transmission/volume/IOR triplet seems to suggest the latter).
On optional hints — Thanks @hybridherbst, and yes! I'm comfortable with the idea that the extension should generally be optional (formally: in "extensionsUsed" but not "extensionsRequired"). It would — eventually — be worthwhile to give examples of when we'd encourage or require a client to ignore an optional
On scope — More generally I am a bit concerned about the arc of the
1 Apologies, I know my metaphor seems to imply that this is simple and obvious — these are very complicated topics, and nothing here is obvious. But obvious or not, we need an extension that solves the problem at hand.
I agree with the above, especially the point about examples - but I also do understand Rickard's concerns that this proposal currently doesn't solve the challenge he's facing: deterministic output of light values based on scene content or the combination of scenes. Introducing "exposureBehaviour" and "exposure" as outlined would allow for these cases in my mind (files that just scale exposure based on lights in the scene and have a kind-of-deterministic, view-independent result). What do you think about these?
When we were doing the PBR materials, we looked at the major game engines and evaluated what the common denominator is. The goal was maximum compatibility and acceptance for glTF and the extension. Furthermore, we relied on the research and scientific papers of these companies. I suggest we do the same for the "last pixel mile" - exposure and tonemapping.
So, if you look at Unreal Engine and Unity: both can do manual exposure and have ACES tonemapping. Unity has the option for no (clipped) tonemapping. In Unreal Engine, it is possible to disable tonemapping as well.
So, I suggest for this extension to only have
For the LUT, I recommend to postpone it and move it to a follow-up extension.
Last, but not least, having
Finally, the transfer function relies on the used output color space and render buffer and is already well defined:
Except this loops right back to both cases being garbage options.
When we were doing PBR Next, we split the "beast" into several small extensions. Furthermore, we dropped and/or created new extensions in favour of compatibility and acceptance.
@sobotka I really appreciate the detail and examples you've given above. You're right that this is a very complicated space. I doubt it's feasible to "solve" the perception problem, but the idea is to at least create consistency in output where feasible. Currently this last mile is simply unspecified; the question is can we do better than that and then build on it? Do you have any particular recommendations?
These values are the use case of the mentioned extension. Of course these values need to be adapted for very dark or very bright scenes.
There are use cases, e.g. embedded devices, where another texture unit is a pain.
I think what is being tackled by this exact attempt is wise and prudent; think about what is seen by the audience. In this case, and specifically in the case of E-commerce, which I believe is a sizeable portion of the problem surface, the quibbles over what is being discussed are actually sizable. Should the reddish chair look be distorted to pure yellow? Should the blue toy bunny be represented as a totally distorted purple? These sorts of things likely matter significantly in the context. What I am cautioning is precisely what @donmccurdy more or less stated:
And better-than-broken rendering for games, much less E-Commerce, will only benefit.
@UX3D-nopper is there a canonical definition of ACES Filmic tone mapping somewhere? I assume we are just talking about a tone map, and not other components of ACES.
Most people probably copy & paste from here:
To formalize, one has to dig into the original repository:
Basically, we need to define how to transfer from sRGB (or another color space, see https://nick-shaw.github.io/cinematiccolor/common-rgb-color-spaces.html#x34-1330004.1) to the AP1 color space, as the ACES tone mapping happens in AP1 (the RRT and ODT part). Then "go back" to sRGB (or the color space of the output buffer). One important thing is that AP1 has a different illuminant than sRGB, so a conversion has to happen in many cases as well:
From my perspective, converting between color spaces is already defined in many places, e.g. at Khronos:
The only missing part is the RRT and ODT:
At first sight, it looks like one has to use the whole ACES dev package. But this is not the case.
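For readers following along, here is a sketch (in Python, for readability) of the fit most implementations actually copy and paste: the Stephen Hill / BakingLab approximation of the ACES RRT+ODT for a linear Rec.709 working space. This is emphatically not the reference ACES implementation, just the commonly used fit being discussed here.

```python
# Stephen Hill / BakingLab fit of ACES RRT+ODT (scene-linear Rec.709 in,
# display-linear Rec.709 out). Matrices fold in the sRGB<->AP1-ish conversion
# and the RRT_SAT/ODT_SAT steps; the rational curve approximates RRT+ODT.
ACES_INPUT = [
    (0.59719, 0.35458, 0.04823),
    (0.07600, 0.90834, 0.01566),
    (0.02840, 0.13383, 0.83777),
]
ACES_OUTPUT = [
    ( 1.60475, -0.53108, -0.07367),
    (-0.10208,  1.10813, -0.00605),
    (-0.00327, -0.07276,  1.07602),
]

def _mul(m, v):
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def _rrt_odt_fit(x):
    return (x * (x + 0.0245786) - 0.000090537) / (x * (0.983729 * x + 0.4329510) + 0.238081)

def aces_fitted(rgb):
    """Scene-linear Rec.709 -> display-linear Rec.709, clamped to [0, 1]."""
    v = _mul(ACES_INPUT, rgb)
    v = tuple(_rrt_odt_fit(c) for c in v)
    v = _mul(ACES_OUTPUT, v)
    return tuple(min(max(c, 0.0), 1.0) for c in v)
```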
The definitions you linked to are approximations and not the actual definition. The BakingLab one is a good approximation that's good for real-time renderers. I've extracted a less-approximated* definition in Filament though: https://github.com/google/filament/blob/ebd5f150c16548b83bbb4a4fec9e4430c2fa1309/filament/src/ToneMapper.cpp#L28

The matrices are not approximations; the rest very much is.
Very much no, but a much larger discussion. Folks should be paying attention to what is seen, not an obfuscated chain that ultimately amounts to nonsense.
The source code is here:
@sobotka Please guide me where the code could be improved or where the wrong assumption is.
Phew, I compared it with the Epic Games version:
@romainguy Thx for the hint, I will look for the paper/presentation where Stephen Hill explains his changes. Anyway, this is not good news for consistency without changes, as I assume some are implementing the exact ACES and some the approximated one. Furthermore, there is another one from Narkowicz.
The one from Narkowicz is even more approximated, as it's a simple curve that remains in sRGB and doesn't go through AP0/AP1. Its main drawback is that it loses some of the nice side effects of the ACES RRT/ODT, including some of the path to white. It's super cheap though. I even proposed an even cheaper approximation (that also combines the sRGB transfer function) in a SIGGRAPH talk:
But you can trick Narkowicz into behaving more like ACES by applying it in a wider gamut, like Rec.2020:
(note the hue skews in all cases though… 👎)
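For completeness, a sketch of the Narkowicz fit mentioned above, together with the wider-gamut trick. The Rec.709 to Rec.2020 matrix is the standard primaries conversion (as in BT.2087); the inverse is computed rather than hard-coded. This is illustrative only, not the exact code used in any of the engines discussed here.

```python
import numpy as np

REC709_TO_REC2020 = np.array([
    [0.6274, 0.3293, 0.0433],
    [0.0691, 0.9195, 0.0114],
    [0.0164, 0.0880, 0.8956],
])
REC2020_TO_REC709 = np.linalg.inv(REC709_TO_REC2020)

def narkowicz_aces(x):
    # Per-channel fit: clamp(x*(2.51x+0.03) / (x*(2.43x+0.59)+0.14), 0, 1).
    x = np.asarray(x, dtype=float)
    return np.clip((x * (2.51 * x + 0.03)) / (x * (2.43 * x + 0.59) + 0.14), 0.0, 1.0)

def narkowicz_in_rec2020(rgb709):
    # Apply the same per-channel curve, but in Rec.2020, then convert back;
    # out-of-gamut results are simply clipped here, which is where hue skews creep in.
    wide = REC709_TO_REC2020 @ np.asarray(rgb709, dtype=float)
    return np.clip(REC2020_TO_REC709 @ narkowicz_aces(wide), 0.0, 1.0)
```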
Thanks a lot for the clarification. And I think I am convinced regarding the LUT approach: it is cheap to calculate and easy to implement in the shader. And it has 100% flexibility and is fast on embedded devices. In the end - I think so - it is much easier to convince Epic Games or others to implement the LUT approach rather than tweaking their shaders with specific formulas.
Maybe we can specify that the LUTs are stored in a given color space e.g. Rec.709 (Rec.2020 and so on) as a parameter inside glTF. Then one only has to do the color conversion forth and back.
Last but not least, in this extension we should specify how to generate these LUTs and not just reference e.g. Adobe.
Furthermore, Khronos could provide this LUT generator, where folks can put in their custom tone mapping algorithms.
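As a sketch of the kind of "LUT generator" suggested here: bake an arbitrary per-channel tone-mapping callable into a 1D .cube file. A real tool would also need a shaper (e.g. a log encoding) so HDR input survives the limited sample count, and would have to match whatever domain rules the extension ends up defining; the parameter choices below are assumptions.

```python
def bake_1d_cube(curve, size=1024, domain_max=16.0, title="baked tone map"):
    # Emit an Adobe-style 1D .cube file covering [0, domain_max] with `size`
    # linearly spaced samples of `curve`, applied identically to R, G, and B.
    lines = [f'TITLE "{title}"',
             f"LUT_1D_SIZE {size}",
             "DOMAIN_MIN 0.0 0.0 0.0",
             f"DOMAIN_MAX {domain_max} {domain_max} {domain_max}"]
    for i in range(size):
        x = domain_max * i / (size - 1)
        y = curve(x)
        lines.append(f"{y:.6f} {y:.6f} {y:.6f}")
    return "\n".join(lines) + "\n"

# Example: bake a simple Reinhard curve (x / (1 + x)) as a stand-in tone map.
cube_text = bake_1d_cube(lambda x: x / (1.0 + x))
```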
At that point you are so far down the rabbit hole of complete garbage that maybe it makes sense to look at this proposal. ACES is not a colour management system, and is about the worst idea one could think of adding given that it does not under any circumstances solve any real-world problems. It does not:
Both of those definitions meet the CIE definitions of "colour" if one visits the term list. It's an overhead that will bring nothing to the table, and have a direct impact on E-Commerce, entertainment, etc.
Hmm, I just stepped away from ACES. If someone wants to use a LUT for an ACES tonemapping curve, one can do so. Does the LUT approach block your use case?
@sobotka Hey, really appreciate your knowledge here, but can you try to keep the tone a touch more helpful? Yelling that everything is terrible doesn't really give anyone direction. Can you focus on what would be good and useful instead of what is bad? You can start by telling us what you'd like to accomplish and what about a generic LUT would or would not fulfill that. And maybe contrast it with ACES, which none of us has attachment to other than the fact that it's commonly used.
Apologies, I'm trying to be positive here. I see a reasonable offering here that is more or less rather simplistic, and then a sidetrack. Solutionism without a declared and itemized problem seems challenging. I have looked for what this particular piece of the puzzle attempts to solve, and who the stakeholders and their needs are, and cannot find a document anywhere.
I don't know that we've all agreed on a problem definition, but here's a strawman: We would like to allow authors of glTF 2.0 3D scenes to embed sufficient information for viewers (e.g. 3D engines) to produce a consistent image of the scene. Current tone-mapping implementations across engines are notably inconsistent. Ideally a solution would both support prevailing current practices¹, and allow some flexibility for future improvements. Because glTF aims to be a runtime-friendly format, very large LUTs and dynamic compilation of arbitrary shaders would preferably be avoided in these view transforms.
More concretely, do we think that the proposed parameterization — exposure, 1D shaping LUT, 3D LUT — is enough to compactly define:
I believe this proposal supports (3) and (4) trivially. I am not sure about (1) and (2).
¹ For better or worse, prevailing practice appears to be sRGB OETF or ACES Filmic today.
² Possibly Blender Filmic would be a good evaluation case?
I believe it does cover (1) and (2), since we implement both in Filament with a LUT :) BTW, @sobotka is the author of Blender Filmic, I'm sure he can shed light on whether or not they are worth considering.
The LUT should define the tone mapping curve but not any transfer function, as that depends on the color space of the output buffer:
@donmccurdy I tried out all combinations of what my GPU and display provide:
This is where things will go sideways as the entire subject is vastly deeper than it first seems. It can help to break things down into the tristimulus data, and the resultant formed image. Per-channel mechanics will form an image differently across differing mediums, depending on the working contexts. It is also worthwhile to test the specifics of image formation chains for stability.
I spent some time looking into runtime-friendly compression for .cube LUTs, and am very happy with the results from KTX2. Results below from one of the Filmic Blender LUTs, compressed with KTX2 + ZSTD, and parsed in three.js:
More details in https://www.donmccurdy.com/2022/08/31/compressing-luts-with-ktx2/.
@elalish We should make sure that this extension is a real subset of https://opencolorio.org/
This sounds like it can produce what we want: https://opencolorio.readthedocs.io/en/latest/tutorials/baking_luts.html#shaper-spaces
Agreed we should test this flow as part of an implementation.
The purpose of this extension is to fully define how output pixels should be colored, as the current glTF spec only describes how to calculate the light output for each pixel in physical units. This extension provides a means to specify the transfer function to the limited, unitless range of an sRGB output format, as well as specifying default behavior that matches what most renderers are already using.
The techniques employed lean on popular existing standards: camera-style exposure, default ACES tone mapping, and custom Adobe LUTs. In this way, the renderer can be set up to approximate a photographer's workflow as closely as possible.
This extension is an alternative to #2083, where the clamped, linear output described therein can be achieved by specifying an identity LUT. This extension does not provide special handling for newer HDR output formats, however it could be easily extended to provide different LUTs for different output ranges, as the film industry does today.
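Finally, a sketch of how I read the chain described above: scene-linear light in physical units, multiplied by exposure, compressed by the tone map (the default, or the supplied LUTs), then encoded with the sRGB OETF. The ordering is inferred from this summary rather than quoted from the spec text, and Reinhard stands in here for whichever default tone map or LUT chain the spec settles on.

```python
def srgb_oetf(x: float) -> float:
    # Standard sRGB opto-electronic transfer function on [0, 1].
    x = min(max(x, 0.0), 1.0)
    return 12.92 * x if x <= 0.0031308 else 1.055 * x ** (1.0 / 2.4) - 0.055

def form_pixel(scene_rgb, exposure=1.0, tone_map=lambda c: c / (1.0 + c)):
    # Assumed ordering: exposure multiply -> range compression -> sRGB encode.
    out = []
    for c in scene_rgb:
        c = c * exposure          # linear exposure multiplier
        c = tone_map(c)           # compress [0, inf) into [0, 1]
        out.append(round(srgb_oetf(c) * 255))
    return tuple(out)

# Example: a bright, reddish physical-unit value scaled down by a small exposure.
print(form_pixel((1000.0, 120.0, 30.0), exposure=0.01))
```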