KHR_image_formation #2128

Open · wants to merge 2 commits into base: main
Conversation

@elalish (Contributor, Author) commented Mar 5, 2022

The purpose of this extension is to fully define how output pixels should be colored, as the current glTF spec only describes how to calculate the light output for each pixel in physical units. This extension provides a means to specify the transfer function into the limited, unitless range of an sRGB output format, as well as to specify default behavior that matches what most renderers are already using.

The techniques employed lean on popular existing standards: camera-style exposure, default ACES tone mapping, and custom Adobe LUTs. In this way, the renderer can be set up to approximate a photographer's workflow as closely as possible.

This extension is an alternative to #2083, where the clamped, linear output described therein can be achieved by specifying an identity LUT. This extension does not provide special handling for newer HDR output formats, however it could be easily extended to provide different LUTs for different output ranges, as the film industry does today.


This extension is intended to allow the producer of a glTF to define all the parameters necessary for a consumer (renderer) to generate a consistent output image of the scene, given consistent lighting and view. Consistent does not imply pixel-identical, as the rendering is still subject to many arbitrary approximations, but simply that the output would be identical if the physics were respected exactly. What this extension defines is a function that maps [0, infinity) input light RGB to the [0, 1] output encoding color gamut (most commonly sRGB). In practice, this function has a much larger effect on the appearance of the output than the differences in renderer approximations.

Supporting this extension is only appropriate for rendering a single glTF scene; since it defines a global output mapping it is impossible to respect conflicting mappings for two glTFs in the same scene. Likewise, if a shop wanted a consistent “look” across their portfolio, they might be better served by picking a consistent mapping function and requiring their artists to design with this mapping in mind, rather than displaying different products with unique parameters.
Comment:

It would seem that this prohibits the usecase of mixing models, for instance a 'Metaverse' type of usecase where models from a number of different suppliers are displayed within the same scene - is this correct?

@elalish (Author):

Exactly; global configuration like environment lighting, camera, and tone-mapping are necessarily scene-level, so if the app has a scene composed of many glTFs, then it doesn't make sense to support this extension. This extension is more useful for the single-model cases like Commerce where you want to completely specify the sRGB output. Granted, environment lighting also needs to be consistent as stated above.

Comment:

Exactly; global configuration like environment lighting, camera, and tone-mapping are necessarily scene-level, so if the app has a scene composed of many glTFs, then it doesn't make sense to support this extension.

So we are in agreement that this extension does not solve the case of viewing multiple models or viewing a model under varying light conditions - for instance an AR/Metaverse usecase that mixes models, or any usecase that alters the light setup?

Comment:

This extension is more useful for the single-model cases like Commerce where you want to completely specify the sRGB output.

Is it safe to assume that you are referring to a MPP (main product picture) usecase?

@elalish (Author):

Yes, I'm concerned with 3D replacing the main product picture (this is the main use case our partners are working with us on) and not the metaverse or AR, where the lighting and post-processing are app-defined or camera-defined already.

Comment:

It seems to me that for your usecase you see the lighting being declared in the asset - is this the case?
In what scenario do you see use for the 'exposure' parameter?



This mapping function is split into two parts: the linear multiplier, exposure, and the nonlinear look-up tables (LUT). Exposure represents what the human pupil does to compensate for scenes with different average light levels, but here we will define it in terms from the photography world, as they have already created standard physical units to represent the camera’s version of the pupil.
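A minimal sketch of that two-step mapping as a fragment-shader post-process, assuming the exposure and LUT parameters defined later in this extension; how the unbounded input domain is shaped for the 1D LUT lookup is an assumption here, not something the text above prescribes:

```glsl
// Illustrative only: scene-linear light -> exposure (linear) -> LUTs (nonlinear) -> [0, 1] output.
uniform float exposure;            // linear multiplier, default 1.0
uniform sampler2D hdrLut;          // 1D .cube LUT packed into a texture row (assumed layout)
uniform sampler3D toneMappingLut;  // assumed 3D LUT for the final look

// Assumed shaper mapping [0, inf) into [0, 1) so the 1D LUT can be indexed.
float shape(float x) { return x / (1.0 + x); }

vec3 formImage(vec3 sceneLinear) {
    vec3 exposed = exposure * sceneLinear;   // the linear part of the mapping
    vec3 shaped  = vec3(texture(hdrLut, vec2(shape(exposed.r), 0.5)).r,
                        texture(hdrLut, vec2(shape(exposed.g), 0.5)).g,
                        texture(hdrLut, vec2(shape(exposed.b), 0.5)).b);
    return texture(toneMappingLut, shaped).rgb;  // the nonlinear look / grade
}
```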
Comment:

Reading Annex A, it seems to me that this extension is modelling a "physical camera" rather than the human eye.
My suggestion would be to move that behavior (i.e. exposure) out of image formation into its own extension whose sole focus is to extend the current camera object.

@elalish (Author):

I would contend that the human eye and a physical camera are very similar sensors, with the primary difference being that a physical camera is testable and quantitative instead of qualitative. Hence I prefer to use a physical camera to represent the human eye, as it can be precisely defined.

Comment:

Annex A only explains how you can compute exposure to model a physical camera. The exposure parameter by itself doesn't model a physical camera at all, it's just a scalar value.
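For illustration, a sketch of how such a scalar could be derived from camera settings, using the common photographic convention (the one Filament documents, for example); this is not a quote of Annex A, and the constants are assumptions:

```glsl
// Sketch: scalar exposure from aperture (f-stops), shutter time (seconds) and ISO sensitivity.
// EV100 is the exposure value normalized to ISO 100; 1.2 is the usual lens/vignetting factor.
float ev100(float aperture, float shutterTime, float iso) {
    return log2((aperture * aperture) / shutterTime * 100.0 / iso);
}

float exposureScale(float aperture, float shutterTime, float iso) {
    return 1.0 / (1.2 * exp2(ev100(aperture, shutterTime, iso)));
}
// e.g. a "sunny 16" setting (f/16, 1/100 s, ISO 100) gives EV100 of roughly 14.6 and a very small
// multiplier, which is what lets a 100,000 lux scene land in a displayable range.
```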

Comment:

I would contend that the the human eye and a physical camera are very similar sensors

Ok, here we disagree, to me a camera is substantially different.

I will try to explain what I mean:
A camera has both shutter(speed) and ISO as well as the option of changing lenses.
Where a camera will typically open/close the shutter (or by some other means let in light for a set time period), the photoreceptive cells in the retina will continuously send information that is interpreted by the visual cortex.

The effect of having a light-sensitive medium and a long shutter speed can never be replicated by human vision.
Below roughly 10 lumen human vision becomes monochromatic (black & white)

Similarly for the opposite: when a large amount of light is captured on the camera medium, colors will often go to white.
This will not happen in human vision.

Comment:

Below roughly 10 lumen human vision becomes monochromatic (black & white)

Scotopic vision is even more complicated than this as the rods and cones use the same pathways, which leads to interesting effects in low light that are not just black and white. Here's an attempt at replicating scotopic vision based on a few research papers: google/filament#4559

@sobotka commented Mar 9, 2022:

Ok, here we disagree, to me a camera is substantially different.

What does the outlined proposal represent if we are dividing approaches into “camera photography-like” versus “human visual system-like”?

Comment:

Scotopic vision is even more complicated than this as the rods and cones use the same pathways, which leads to interesting effects in low light that are not just black and white.

Sure, there are light conditions that lead to mesopic vision - a combination of rods and cones.

That does not change my argument that a camera is fundamentally different to the human eye, for instance:

  • A camera shutter can stay open for any duration, continuously exposing the film medium to photons until the shutter closes.

  • The camera aperture can be set to any size in any light conditions, exposing the film medium to any number of photons.
    (Whereas the eye's pupil is mostly there to protect the photoreceptor cells from too much energy, i.e. light.)

  • In a camera the medium being exposed to light is not always the same; it may vary greatly depending on the type of film or digital sensor being used.

| Property | Description | Required |
|:-----------------------|:------------------------------------------| :--------------------------|
| `exposure` | Linear multiplier on lighting. Must be a positive value. | No, Default: `1.0` |
| `hdr_lut` | Link to a 1D .cube file, defining the first post-processing step. | No |
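For reference, a 1D .cube file is a small text file; a hypothetical identity LUT (the case mentioned in the description above as the way to reproduce clamped, linear output) might look like the following, with the domain keywords being optional in the Adobe format:

```
TITLE "identity"
LUT_1D_SIZE 3
DOMAIN_MIN 0.0 0.0 0.0
DOMAIN_MAX 1.0 1.0 1.0
0.0 0.0 0.0
0.5 0.5 0.5
1.0 1.0 1.0
```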
Comment:

I think that adding the option of a LUT to a glTF changes the PBR nature of the specification.
As I see it we are aiming towards a physically correct result with the glTF format.
A LUT may radically change the color (hue) coming out of the glTF BRDF, breaking the physical correctness.
How do you see this 'artistic intent' as opposed to physical correctness?

@elalish (Author):

No, a LUT is part of the output process of a physical camera creating an image (and I would argue a very similar process happens in the eye's perception of light, though that is far more complex). So it is equally a part of physically-based rendering, as long as the goal is output to sRGB pixel values.

@elalish (Author) commented Mar 7, 2022:

This is meant to be flexible, allowing both "neutral" LUTs like product photographers use, as well as artistic ones like Sepia if desired. Again, this is just an extension, and can be easily ignored by a consumer that it doesn't make sense for.

Comment:

A LUT does not change the physical correctness of the materials definition and lighting model, it does however define image formation. Those are two separate and not incompatible steps.

Comment:

No, a LUT is part of the output process of a physical camera creating an image

If that is the case I would argue that the LUT is applied as a last step when consuming the image, not as part of the asset itself.
Which leads me to thinking that a (physical) camera extension would be a better fit for your usecase.

Comment:

A LUT does not change the physical correctness of the materials definition and lighting model, it does however define image formation.

In that case I think it is important to state that while the glTF (internal) BRDF calculations are unaffected - the output of this extension may not be physically correct.
Would you agree to that statement?

Comment:

In that case I think it is important to state that while the glTF (internal) BRDF calculations are unaffected - the output of this extension may not be physically correct.

What is physically correct at the display? I would argue that it could only be considered physically correct given a perfect, known display at a known sustained peak brightness, and in a known, fixed viewing environment (and this means that the viewer should see only the glTF scene and nothing else to avoid any kind of adaptation). What this extension tries to achieve is in its very name: it's about image formation, and therefore perception rather than stimulus. And chasing after "physical correctness" at the display level is, imo, foolish considering the viewing environments (on and outside the display), display settings, display capabilities, etc.

A very simple and telling example to me is the game I was playing last night, Gran Turismo 7. It outputs HDR and is clearly not "physically correct" at the display (since it shows the sun in the sky and it's certainly not outputting 100,000 lux), but is the perception of the scene realistic/believable? Absolutely.

Comment:

What this extension tries to achieve is in its very name: it's about image formation, and therefore perception rather than stimulus.

Maybe this is what makes it not physically correct, in my opinion.
Allowing the perception of a glTF to be monochromatic (black & white) - when the surface properties and light environment are chromatic (colored) - is not what I would call physically correct.

My definition of "physically correct output" does not mean that it must be a 100% replication of light intensity levels 'as they are'.
The contrast between higher intensity and lower intensity light shall be as correct as possible when output.

In the case of a brightly lit summer day, let's say 100 000 lumen / m2.
The amount of light that will enter through the iris and pupil, hitting the photoreceptor cells on the retina is much, much smaller.

Most of it will be blocked by the constricted pupil to protect the photoreceptor cells.
This means that light as 'captured' by the observer is in a smaller range.
Let's say that of the sun's 100 000 lumen/m2, only 10 000 lumen/m2 will enter into the eye.
The same thing will happen to the light coming from other parts of the environment making them look dark (since you are looking towards a very bright spot)

It is this contrast, between the bright sky and the shaded areas, that I would like to get to the display as it would be 'captured' by the observer.
(In this case, shaded areas being maybe 500 lumen / m2 and the sun being 10 000 lumen / m2)

This is what I would call physically correct output.

@elalish (Author):

Your incorrect assumption is that the eye handles over-bright spots using only the pupil (a linear scale factor akin to aperture). This is somewhat accurate if you focus on the bright spot, but not at all if the spot is out of your central vision. On a sunny day the sun can easily be "in your eyes" without your looking straight at it. However, if your pupil clamped down enough to resolve that 100,000 lux range, the rest of your scene would be black by comparison. Instead, your retina is also nonlinear, decreasing the response from those over-stimulated cells while still resolving the rest of the scene at reasonable lux, hence your ability to see the horizon while the sun appears white (without looking straight at it).

You're right that a LUT can be used for artistic intent, but its most common use is actually to make neutral product photography look right in sRGB or HDR (hence the IKEA beauty curve). It is physically correct in the sense that it models part of the eye's physical response to stimulus, and it is necessary in making an image "photo-realistic".

The difficulty with #2083 is that it specifies everything right down to the display without leaving a place for the LUT to be applied. You mention applying it as a post-process, but that's exactly what this extension does. Once you're down to PQ, there isn't anywhere left to apply a post-process, and you've already lost the information you need due to dropping from floating point to 10-bit.

Comment:

You're right that a LUT can be used for artistic intent, but its most common use is actually to make neutral product photography look right in sRGB or HDR (hence the IKEA beauty curve).

What you call 'Ikea beauty curve' is applied to photographs as a last step just before publishing and shall not be part of the asset.
This type of color-grading (or tone-mapping) - if it needs to be applied to glTFs - belongs in the engine or viewer in my opinion.
Adding color-grading in this way to glTF assets is a slippery slope that opens up a multitude of problems.

The difficulty with #2083 is that it specifies everything right down to the display without leaving a place for the LUT to be applied.

Reluctant to discuss another extension here - at the same time I do not want misconceptions to get a foothold.
No, that is clearly not what KHR_displaymapping_pq does.

For an overview of where color-grading could be implemented together with KHR_displaymapping_pq - please look at the section about integration points and motivation.
This should give you the information needed - if not please let me know how I can clarify!


## Extending Scene

The precise method of converting scene linear light into output pixels is defined by adding the `KHR_image_formation` extension to any glTF scene. For example:
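A sketch of such a scene declaration, assuming the property names from the table above and that the LUT files are referenced by URI (both assumptions; the spec's own example may differ):

```json
{
    "scenes": [
        {
            "nodes": [0],
            "extensions": {
                "KHR_image_formation": {
                    "exposure": 1.0,
                    "hdr_lut": "neutral_shoulder.cube",
                    "tone_mapping_lut": "product_beauty.cube"
                }
            }
        }
    ]
}
```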
Comment:

I am worried that having a scene declaration that allows content creators to provide artistic intent may be confusing the format.
With this I mean the expected behavior of viewing a content creators model.
Imagine I create my model with a fancy tonemapping LUT that is Michael Bay'ish (saturated colors) and then this model is displayed in a scene with a distinct monochromatic LUT. This will happen, even if it's not part of the spec.
My model will now look totally different, and most probably wrong according to my artistic intent.

My worry is that users and content creators will see this as a serious flaw in glTF.
I suggest to remove the LUT's and focus on image formation.

@elalish (Author):

The LUT is a core part of how image formation is done in the photography, film, and CGI industries. It's used because it's important in making scenes look correct to the human eye despite the low range of printed or displayed images.

@rsahlin commented Mar 8, 2022:

The LUT is a core part of how image formation is done in the photography, film, and CGI industries

Sure, and if the intention is to model the content creation pipeline of those industries I think this extension should be changed to reflect that.

@elalish (Author):

Do you have a suggestion? I'm certainly not going to be picky about names.

Comment:

My proposal would be to remove the LUTs and focus only on the aperture, shutter speed and ISO parameters.
This would make it an extension that models a 'physical camera', suitable for the studio photo / main product picture usecase.
At the same time it would be compatible with KHR_displaymapping_pq, which could make it beneficial for broader use in 3D Commerce.

@rsahlin commented Mar 7, 2022

Hi @elalish and nice to see your entry in this area :-)

This extension is an alternative to #2083, where the clamped, linear output described therein can be achieved by specifying an identity LUT. This extension does not provide special handling for newer HDR output formats, however it could be easily extended to provide different LUTs for different output ranges, as the film industry does today.

I would say that this extension (khr_image_formation) is not an alternative to khr_displaymapping_pq as they have quite different approaches.
It does not seem to me that they are trying to solve the same usecase - from what I can tell this extension has a focus on a photographer's workflow, with a heavy influence of color grading and artistic intent by means of the LUTs.

This is very different from khr_displaymapping_pq and "the 3D Commerce usecase" where the whole purpose is to provide a 'neutral' mapping of content to the output.

If color grading (exposure factor and LUT) are to be applied, that is done as a final step when content is authored to the target.
The same content will have different grading (exposure factor and LUT) depending on if the destination is a movie theater, DVD, broadcast, computer game or Web-viewer.
Introducing grading into the asset will lead to conflicting intent, for instance when mixing assets and/or when viewing on a destination that has different grading.

In my opinion color grading (exposure factor and LUT) does not belong in a glTF asset; if needed it should be in the device that consumes the content (i.e. the one that knows about the output characteristics).

(For more technical details see my comments)

@romainguy:

A LUT can be used to apply "neutral" mapping of content to the output. I'm not exactly fond of trying to talk about neutral mapping because image formation extends beyond stimulus (esp. given the limitations of displays and the HVS). One benefit of using LUTs is that it allows authoring of "neutral" scenes (with no tone mapping at all) and of artistically crafted scenes as well (high contrast black and white for instance).

Exposure in this case should not depend on the output device, but on the scene itself.

@MiiBond (Contributor) commented Mar 7, 2022

I don't see the mixed-asset issue as being a problem. It assumes that mixing assets is going to be the final step in publishing and I can't see a situation where that would happen. I see this extension as something that a viewer or exporter would support but not something that an importer would probably read.

For example, I work on Adobe Substance 3D Stager (a staging app used for synthetic photography among other things). We support importing and exporting glTF but our users set up their lighting within the application so there's not much need to support importing this info. However, when exporting or publishing directly to the web, including everything needed to render the scene as expected is very important. This includes tonemapping in addition to lighting. They go hand in hand.

@hybridherbst commented Mar 7, 2022

Adding my notes from the call here again and some more thoughts around this, as we've been running into many issues regarding viewers and their default display of models in general:

  • I agree with @rsahlin that this brings viewer properties into the asset level, which is, for many usecases, undesired, but for others it's great.
  • I think that the KHR_displaymapping_pq extension has a lot of the same problems, and also might "break" viewers even more if multiple assets have different pq values, but maybe I'm not understanding it right yet.
  • while I can see that ACES has been adopted in many places it isn't exactly a great/neutral tonemapping (it's hard to get some brand colors to look right with ACES)
  • any kind of maximum value (e.g. the current number of 10,000 lux) is unclear to me - e.g. sunlight can be 100,000 lux, how would I represent a scene with a sun and a "daylight lamp"?
  • I think artistic intent should not be expressed in a glTF file at all, and for that reason also don't think LUTs should be part of glTF assets, unless as "rendering hints that can safely be ignored". From experience with artists I can say that when you start adding the ability to use LUTs embedded in assets the assets start "breaking" - the LUT might be used instead of actually specifying proper color/light/texture values, for example.

.

  • I agree with both of you that an extension is needed to provide better "rendering hints" for viewers displaying assets.
  • I actually think these hints should be per camera, not per file, because different cameras might have totally different requirements for how to show a scene (e.g. a camera that is close to a light source will need to have different exposure than a camera that looks from afar)
  • there are other hints to be considered, for example if the viewer should use their own / a user-provided / developer's randomly chosen IBL and tonemapping, or use "none", or use one from a specific file (I think there are some other extensions dealing with this)

To summarize:

  • I'd rename the extension as something like KHR_rendering_hints to make it very explicit what the idea here is, and also make it more clear in the description that this is by no means something every viewer should/must use (e.g. AR will often be rendered differently and shouldn't care for any rendering_hints)
  • some properties make sense per-camera, not sure if that should be a separate extension then.
  • other properties (such as if the viewer should add its own default lights or not, or if the viewer should add its own IBL or not) might also be desired.

@romainguy commented Mar 7, 2022

  • while I can see that ACES has been adopted in many places it isn't exactly a great/neutral tonemapping (it's hard to get some brand colors to look right with ACES)

ACES has many issues. Two of the most glaring ones to me are the hue skews created when going from AP1/AP0 back to sRGB, and the overall drive to yellow-ish.

  • any kind of maximum value (e.g. the current number of 10,000 lux) is unclear to me - e.g. sunlight can be 100,000 lux, how would I represent a scene with a sun and a "daylight lamp"?

Right, that's why some kind of tone mapping (which I prefer to refer to as range compression, because that's really what we're trying to do) is needed and can be baked into a LUT. The reason why we chose to use a LUT was to allow authors to pick their compression curve (for instance to match an existing ACES based workflow).

  • I think artistic intent should not be expressed in a glTF file at all, and for that reason also don't think LUTs should be part of glTF assets, unless as "rendering hints that can safely be ignored". From experience with artists I can say that when you start adding the ability to use LUTs embedded in assets the assets start "breaking" - the LUT might be used instead of actually specifying proper color/light/texture values, for example.

The original proposal @elalish and I crafted did not use a LUT but defined a specific (yet configurable) range compression (tone mapping) curve with an extra step to somewhat control hue skews. Should we share this proposal? Note: that configurable curve can be configured to match ACES's compression.

  • I actually think these hints should be per camera, not per file, because different cameras might have totally different requirements for how to show a scene (e.g. a camera that is close to a light source will need to have different exposure than a camera that looks from afar)

That's a good point.

@sobotka commented Mar 7, 2022

Some great discussions here. I'd issue one incredible and loud warning...

There's a tremendous conflation here between notions of Stimulus versus Perceptual Appearance. The CIE lists two definitions for "colour" for good reason. One describes the psychophysical stimulus specification of colour, and one describes the perceptual appearance of colour.

@elalish has wisely cited that no representation in an image formed can even remotely produce the range of stimulus present "as though we were looking at it".^1 And even if this were the case, this is a seductive misnomer as cited by MacAdam, Jones, Judd, and many other researchers; an image is not ground truthed against an idealized reproduction of stimulus.

As a general cautionary rule of thumb, it should be noted that psychophysical stimulus will always be nonuniform with respect to perceptual appearance.

That is, chasing the dragon of 1:1 stimulus will always assert that the appearance of the thing in question will be incorrect. Not that it matters, because it is fundamentally impossible to replicate the stimulus from the mediums in question, and the subject of distortions of "hue" and other facets will take front and centre stage without properly considering the act of image formation as a whole.


  1. This covers even a theoretical optimal HDR display. The role of an image is not mere simulacrum of human perception.

@hybridherbst:

@sobotka I'm curious, besides the warning, would you have any suggestions on how to resolve this in the context of glTF (and related cases such as e-commerce)? E.g. "viewers should just all auto-expose and try their best to somehow present in a somewhat considered neutral way"?

(I value your input a lot, and try to read all your Twitter threads, even when I sometimes need a number of dictionaries to understand them)

@romainguy commented Mar 7, 2022

@hybridherbst Exposure is the easiest and least interesting part of the problem. The complex steps are about generating the desired perception, and to Troy's point, focusing on the stimulus is not enough. Based on many long conversations with Troy, that's why I was coming back to a LUT based approach because it will allow for future improvements in image formation as there are no great known solutions at the moment (although LUTs have the issue of being spatially invariant, which flies a bit in the face of proper perception generation if we'd want to take things like lightness into consideration; also see Bart Wronski's recent article on localized tone mapping).

The original version of the proposal I mentioned earlier was attempting to create a somewhat neutral "tone mapping" based on efforts from Troy and others, while taking into account constraints like applying the result in real time without using a LUT for certain glTF viewers. Here are a couple of examples. In each image, from top to bottom: linear (no compression), ACES (SDR/Rec.709 100 nits target, using the reference implementation), the proposal.

[Images: RGB sweep and HSV sweep comparisons (linear, ACES, and the proposal)]

The proposal is far from perfect. There's no good gamut mapping step so it leaves a few hue skews behind but it's a noticeable improvement over ACES (blue shifting to purple is greatly improved for instance).

@sobotka commented Mar 7, 2022

would you have any suggestions on how to resolve this in the context of glTF (and related cases such as e-commerce)? E.g. "viewers should just all auto-expose and try their best to somehow present in a somewhat considered neutral way"?

I tend to think that when we are talking about selling red plastic chairs, we are really firmly anchored in the realm of a perceptual appearance approach; we want to walk into a big box shop and see the brightly orange chair as the brightly orange chair and spend our fifty bucks on it. But let's ignore the appearance side, and focus purely on ratios of tristimulus...

The points that @elalish and @romainguy are raising are hugely significant here; some semblance of an open domain tristimulus compression is required.

Suggesting a virtual intermediate encoding capped at 10k nits is going to be in direct opposition to the goal of selling an orange plastic chair. We can't just "scale down" via an exposure, as that pressures the middle grey value down, which means we have to bring it back up again or our orange plastic chair is "too dark". Worse, if we view the albedos of the chair in direct sunlight, now we have a clip skew happening at the 10k nit mark without some tristimulus compression.

There's no real in-between here, and if we ignore the whole gargantuan rabbit hole of appearance modelling in image formation, we simply cannot have our "middle-ish value" of the range in the "middle-ish value" of the image and compress down with generalized "exposure".

The orange plastic fifty dollar chair should appear orange and plastic so that someone can buy it for fifty dollars. That's impossible without some care and attention to the image formation chain such as what @romainguy and @elalish are bringing to the table here.

@hybridherbst mentioned this pull request Mar 8, 2022

One key aspect of a good transform is that it does not operate solely on luminance but also affects color, especially saturation. This is because as a colored light becomes very bright, its color ceases to be perceived. This is true of CCDs as well, and can be easily seen by taking an overexposed photo of a bright, colored light and noting how it tends toward white. This is the purpose of the tone_mapping_lut.
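A small sketch of why per-channel range compression produces that desaturation while a luminance-only compression does not (Reinhard curves used purely for illustration; the extension expresses this behavior through the tone_mapping_lut rather than a fixed curve):

```glsl
// Per-channel compression: each channel saturates toward 1.0, so very bright colored
// light drifts toward white, mimicking an overexposed photo of a colored light.
vec3 perChannel(vec3 c) { return c / (1.0 + c); }

// Luminance-only compression: hue and saturation are preserved at any intensity,
// so a bright red light stays stubbornly red (and clips) instead of whitening.
vec3 luminanceOnly(vec3 c) {
    float l = dot(c, vec3(0.2126, 0.7152, 0.0722));   // Rec. 709 luminance
    return c * ((l / (1.0 + l)) / max(l, 1e-6));
}
// perChannel(vec3(50.0, 10.0, 10.0))    is roughly (0.98, 0.91, 0.91): nearly white
// luminanceOnly(vec3(50.0, 10.0, 10.0)) is roughly (2.56, 0.51, 0.51): still strongly red once clamped
```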

Regarding color gamuts: since glTF defines its textures in sRGB, as long as the scene's lighting is also represented in Rec 709 (sRGB's color gamut), the output light will naturally be restricted to the Rec 709 gamut as well. If a wide gamut like Rec 2020 is used for output, then the values will naturally fall into its Rec 709 subset.
Comment (Contributor):

Would scene lighting with an AP1 gamut use the same HDR LUT as scene lighting in Rec 709? Or (more generally) do different LUTs impose restrictions on the scene linear color space?

Comment:

Unfortunately, different working color spaces would require different LUTs. In my own tests, I found little benefit to rendering in a wider gamut before the image formation step. It does help in specific cases (esp. when doing ray tracing with light bounces) but in glTF's current form, rendering in Rec.709 is sufficient.

@sobotka commented Mar 8, 2022:

Would scene lighting with an AP1 gamut use the same HDR LUT as scene lighting in Rec 709?

To add to this, “wider gamut” is not “better” without defining a clear notion of what we are comparing.

First Problem: Meaningfulness

RGB is simply tristimulus, and completely unrelated to how actual light transport works. We would be wise to accept that and move on. It’s really a “Good Enough” balance of bandwidth and computing, but ultimately anchored in human stimulus specifications. Remember... PBR has "B" in it... meaning "Based", not "Emulation of Physical Light Transport".

Given that AP1 uses primaries that do not exist, we are faced with an additional “distance” to compress when we attempt to represent the stimulus in a medium. For example, because the AP1 primaries are meaningless with respect to the standard observer model that anchors our entire colourimetric work, even a BT.2020 idealized display with the pure spectral emitters would not represent anything like it. Because it is “beyond” the locus, it holds literally no meaning to the standard observer, and no speculation can be even made.

We can extend that question to ask what the meaningless ratio means in terms of smaller gamuts?

Rendering something that has no meaning is part of the problem here.

Second Problem: Choice of Rendering Tristimulus

A second problem, and one that is likely more dire, is that as we move outward to the spectrally pure values in a stimulus specification, we are implicitly decreasing luminance. Luminance can be considered a portion of the general sense of “brightness” we perceive.

When we perform light transport on tristimulus of low luminance, the result is, unsurprisingly, dull as hell. That is, counter to what people typically assume, the resulting mixtures will appear less colourful. The following examples are shamelessly borrowed from someone who took it upon themselves to demonstrate the impact of the chosen tristimulus space on resulting tristimulus renders and image formation, Chris Brejon! Can you guess which of these were rendered in a wider gamut tristimulus model and which were rendered in BT.709 based tristimulus?

[Eight screenshots from Chris Brejon: paired renders of the same scenes, one of each pair rendered in a wider-gamut tristimulus working space and the other in BT.709]

Finally, on this subject, it should be noted that when we conduct "light transport-like math" on RGB tristimulus, we are permanently baking in changes to the tristimulus that vary depending on what working RGB tristimulus model we are in. That means doing indirect bounces of identical tristimulus chromaticity will yield different results depending on what RGB model we are in.

Third Problem: "Gamut" Mapping

Even if we ignore the first two problems, we have a combination of the first problem present in our final output. If we are on a medium that cannot express the tristimulus value, such as going from BT.2020 to sRGB, how the heck do we formulate the result in the destination? A majority of approaches are just a simple clip, and that of course leads down the "hue" skew path of varying degrees, as well as making the result device dependent given that it will render differently on different output mediums.

Even if we aren't faced directly with the more abstract "meaningfulness" of nonsense values, we are again faced with the "meaning within the medium". For example, relative to sRGB, BT.2020 tristimulus mixtures may have meaning within the gamut, and no meaning for values that cannot be expressed. How do we give those values meaning? Do we focus on tonality so the plastic orange chair isn't a big huge flat wash of nasty? Do we try to decrease the perceptual facets of "chroma" or "hue" on our orange chair? What impact does that have when we go to the shop to buy the orange chair and it's... not the orange we thought we saw? These are pretty gnarly complexities that currently have no real answers without folks like you to engineer them.

Sorry for the massive post, but I just wanted to get these things out into another public domain because there are so many misunderstandings and false assumptions behind much of this rendering stuff that I think many folks could contribute toward solving if greater awareness were out there.

Comment (Contributor):

Lots of news to me here, thanks @sobotka and @romainguy! 🙏

One takeaway I think I'm hearing (and please correct me if this is wrong) ... even when viewing the rendered image on a display that supports wider P3, AP1, or Rec 2020 gamuts, scene lighting calculated in a wider gamut working color space is not "better" (in any clear and simple sense) than scene lighting done in Rec 709?

Comment:

even when viewing the rendered image on a display that supports wider P3, AP1, or Rec 2020 gamuts, scene lighting calculated in a wider gamut working color space is not "better" (in any clear and simple sense) than scene lighting done in Rec 709?

Until we define what we are comparing on either side of the "better", the answer is yes. That is, in terms of image formation, having additional visual energy will lead to higher "colourfulness". Using pure primaries with exceptionally low luminance will potentially be detrimental here in terms of the resulting image formed.

In the end, the analysis of "better" should be tempered against a clear declaration as to what is being compared, and why. It's an analysis of qualia, ultimately.

@rsahlin commented Mar 10, 2022

Regarding comment from @MiiBond

I see this extension as something that a viewer or exporter would support but not something that an importer would probably read.

To me, this is a strong argument against having this extension (exposure + LUT) inside the glTF asset.
I believe it shall be applied as a last step in engines, viewers or renderers, just before 'consumption'.

Just like content is authored to different targets - movie, print or web.

@elalish (Author) commented Mar 11, 2022

This is very different from khr_displaymapping_pq and "the 3D Commerce usecase" where the whole purpose is to provide a 'neutral' mapping of content to the output.

If color grading (exposure factor and LUT) are to be applied, that is done as a final step when content is authored to the target. The same content will have different grading (exposure factor and LUT) depending on if the destination is a movie theater, DVD, broadcast, computer game or Web-viewer. Introducing grading into the asset will lead to conflicting intent, for instance when mixing assets and/or when viewing on a destination that has different grading.

In my opinion color grading (exposure factor and LUT) does not belong in a glTF asset, if needed it should be in the device that consumes the content (ie knows about the output characteristics)

@rsahlin Agreed we ideally need different LUTs for different destinations. Happy to include that here or in a follow-on extension. As for not belonging in the asset, I agreed in the case of an IBL extension for the same reason, but especially because of the huge over-the-wire cost that many consumers wouldn't want. This is at least compact, but yes. However, I would argue khr_displaymapping_pq has exactly the same problem: it too specifies a LUT, just a very simple one. Again, anyone who wants to post-process differently (including a metaverse/AR app) will have to ignore that extension.

@rsahlin commented Mar 11, 2022

Agreed we ideally need different LUTs for different destinations.

In my opinion they belong to the destination (engine) and not inside the asset.
See them as a type of filter that is applied (by the application) just before consumption.
Similar to how the Instagram app adds filters.
Just like those filters are 'in' the Instagram app - the LUTs in this case shall be in the engine/renderer.

As for not belonging in the asset, I agreed in the case of an IBL extension for the same reason

I don't think a LUT is anywhere near like an environment light, for the following reasons:

  • A LUT may be non-physical - changing colors just like a filter would.
  • A LUT will be fixed to one light setup. When the light changes the scene is likely to 'break'.
  • A LUT is not part of what I call a 3D scene. I cannot see a correlation to what we are defining with the glTF data format.
    Namely geometry, surface properties and light sources.
  • A LUT may make viewing of models from different scenes impossible while retaining the creator's intention (via the LUT).
  • A LUT may introduce errors in the 3D content pipeline.
    If the color of a model appears to be wrong, then there is nothing stopping somebody from changing the LUT to get the desired color.
    The appropriate behavior would be to change surface properties (baseColor).
    This will happen when glTFs are handled in large volumes over time.

I argue that environment light does not have any of the above problems.

I strongly believe that the way to provide a studio type of 'look & feel' on a glTF is by using the light setup (punctual and environment lights) - not by adding a filter.

@sobotka commented Mar 17, 2022

I strongly believe that the way to provide a studio type of 'look & feel' on a glTF is by using the light setup (punctual and environment lights) - not by adding a filter.

You, again, are completely ignoring the role the image formation process plays in this.

There can be no moving forward until you come around to appreciating that one cannot, even if desired, output the “stimulus”.

This whole idea is nonsense.

@hybridherbst commented Mar 17, 2022

As an idea to find a common ground here:

  • I think the main point that @rsahlin wants to reach is "reproducible and consistent output"
  • I think the main point that @elalish and @romainguy want to reach is avoiding an implicit incorrect LUT and implicit exposure, and instead specify exact values in the asset.

I propose:

  • instead of only specifying an exposure value, introducing an exposure hint that can have one of three modes:
    • "automatic"
    • "fixed"
    • "maxBrightness" (needs a better name - calculates exposure from scene lights similar to what KHR_displaymapping_pq is aiming to do)
  • dropping the user-specified LUT either entirely or replacing that with a mode:
    • "none" (current behaviour)
    • "ACES" (or another name - "neutral"?, not sure @romainguy how your proposal of a neutral LUT would be named)

This way, we avoid the LUT discussion (and don't introduce too much power into the asset), and the following combinations would be possible:

Current behaviour:

"tonemapping": "none",
"exposureBehaviour": "fixed",
"exposure": "0"

Automatic exposure like many game engines do:

"tonemapping": "ACES",
"exposureBehaviour": "automatic",
"exposure": "0"

Authored fixed exposure for a daylight scene:

"tonemapping": "ACES",
"exposureBehaviour": "fixed",
"exposure": "15.0"

Similar to what KHR_displaymapping_pq would do:

"tonemapping": "ACES",
"exposureBehaviour": "maxBrightness",
"exposure": "0"

When "exposureBehaviour" is either "automatic" or "maxBrightness", "exposure" serves as an offset, allowing artistic control to nudge into a "high-key" or "low-key" look and feel.

These would all be per-camera and/or per-scene "renderer hints". Viewers could aim to match the closest camera's rendering hints if free navigation is allowed. When multiple assets are combined that all have maxBrightness mode, the behaviour is still deterministic and very clear.

(I'm not sure how far along the IBL extension proposals are; optionally this extension here could also have something like "externalLight": true / false to specify if a viewer should apply any kind of IBL or rely on lighting and/or IBL as specified in the file instead - without that, no consistent output between viewers is possible.)

@donmccurdy (Contributor):

I believe it would be a mistake to reduce the nonlinear portion of the mapping function to a choice of 1-2 presets, while still positioning this as a general-purpose image formation extension. If this were a proposal to define lighting and image formation scenarios specifically relevant to retail products displayed in typical stores (say, 3DC_lights_retail and 3DC_image_formation_retail?), then perhaps presets would be fine.

But for an extension meant for use in the wider glTF ecosystem, the intent of KHR_displaymapping_pq is too prescriptive, setting aside the technical objections. I do not think we should pull KHR_image_formation in that direction.

Please don't view the possibility that an artist might create a stylized LUT (rather than a "neutral" one) as a problem to be solved — it is a healthy side effect of providing the proper tools for image formation. If general consensus is that image formation definitions do not belong in a glTF file, I'm OK with that decision. But imposing more prescriptive choices on the extension will not help.

@hybridherbst commented Mar 17, 2022

I had proposed earlier that a better name might be "KHR_rendering_hints" or something similar that makes it very clear that these are hints for viewers on how something should be displayed.

I don't have a strong opinion on including or not including user LUTs; the above was an attempt to find a useful middle ground for an extension that allows for both flexibility and the goal of getting closer to determinism.

One could also argue that one extension (e.g. KHR_rendering_hints) would be about the above (exposure, perceived brightness, a good neutral mapping for HDR values) and another extension (e.g. KHR_tonemapping) would be able to override the "neutral" mode and extend it with user LUTs. I'm not sure what the current approach is to either put multiple features into one extension or to split things up (the transmission/volume/IOR triplet seems to suggest the latter).

@donmccurdy (Contributor):

On optional hints —

Thanks @hybridherbst, and yes! I'm comfortable with the idea that the extension should generally be optional (formally: in "extensionsUsed" but not "extensionsRequired"). It would — eventually — be worthwhile to give examples of when we'd encourage or require a client to ignore an optional KHR_image_formation extension. Similar to how a pathtracer might discard thicknessMap rather than overriding ground truth. I probably wouldn't go as far as renaming the extension to clarify that: optional/required is a shared concern in many extensions, and KHR_materials_volume_hints is pretty verbose.


On scope —

More generally I am a bit concerned about the arc of the KHR_image_formation / KHR_displaymapping_pq discussion. We are arguing over how many legs a tripod should have. KHR_displaymapping_pq offers a tripod with one leg1, not providing the mechanism to achieve its own goal outside narrow contexts. KHR_image_formation offers a tripod with three legs, and does the job with a practical level of simplicity and flexibility. If we're going to build a tripod, we can alter the three legs (parameterize as .cube LUTs, fixed properties, .mtlx nodes, etc...), but let's not compromise on fewer than three legs. 🙏🏻

1 Apologies, I know my metaphor seems to imply that this is simple and obvious — these are very complicated topics, and nothing here is obvious. But obvious or not, we need an extension that solves the problem at hand.

@hybridherbst:

I agree with the above, especially the point about examples - but I also do understand Rickard's concerns that this proposal currently doesn't solve the challenge he's facing - deterministic output of light values based on scene content or the combination of scenes.

Introducing "exposureBehaviour" and "exposure" as outlined would allow for these cases in my mind (files that just scale exposure based on lights in the scene and have a kind-of-deterministic, view-independent result). What do you think about these?

@UX3D-nopper (Contributor) commented May 3, 2022

When we were doing the PBR materials, we looked at the major game engines and evaluated what the common denominator is.
When we were doing the PBR lights, we looked at the major game engines and evaluated what the common denominator is.

The goal was maximum compatibility and acceptance for glTF and the extension. Furthermore, we relied on the research and scientific papers of these companies.

I suggest we do the same for the "last pixel mile" - exposure and tonemapping.

So, if you look at the Unreal Engine and Unity:

Unreal Engine
https://docs.unrealengine.com/5.0/en-US/auto-exposure-in-unreal-engine/
https://docs.unrealengine.com/5.0/en-US/color-grading-and-the-filmic-tonemapper-in-unreal-engine/
Unity
https://docs.unity3d.com/Packages/com.unity.render-pipelines.high-definition@7.1/manual/Override-Exposure.html
https://docs.unity3d.com/Packages/com.unity.render-pipelines.universal@7.1/manual/post-processing-tonemapping.html

Both can do manual exposure and do have the ACES tonemapping. Unity has the option for no (clipped) tonemapping. In the Unreal Engine, it is possible to disable tonemapping as well:
https://soldirix.wordpress.com/2020/05/26/ue4-how-to-disable-the-tone-mapper-once-and-for-all/

So, I suggest this extension only have exposure and two well-defined tonemappings:

  • None
  • ACES

For the LUT, I recommend postponing it and moving it to a follow-up extension.

Last, but not least, having exposure = 0.0001 and tonemapping = None is the KHR_displaymapping_pq use case.

Finally, the transfer function depends on the output color space and render buffer being used, and is already well defined:
https://www.khronos.org/registry/DataFormat/specs/1.3/dataformat.1.3.html#TRANSFER_CONVERSION

@sobotka commented May 3, 2022

Except this loops right back to both cases being garbage options.

@UX3D-nopper (Contributor):

Except this loops right back to both cases being garbage options.

When we were doing PBR next, we split the "beast" into several small extensions. Furthermore, we dropped and/or created new extensions in favour of compatibility and acceptance.

@elalish (Author) commented May 3, 2022

Compatibility and acceptance that defy the whole point of doing something in the first place is the hobgoblin of little minds.

@sobotka I really appreciate the detail and examples you've given above. You're right that this is a very complicated space. I doubt it's feasible to "solve" the perception problem, but the idea is to at least create consistency in output where feasible. Currently this last mile is simply unspecified; the question is can we do better than that and then build on it? Do you have any particular recommendations?

@UX3D-nopper (Contributor) commented May 3, 2022

I think providing exposure = 0.0001 and tonemapping = None among the options (per the goals of KHR_displaymapping_pq) is reasonable, despite drawbacks of those choices for PBR rendering.

These values are the use case of the mentioned extension. Of course these values need to be adapted for very dark or very bright scenes.

I do wonder if we could go further in terms of baking — e.g. reducing the LUTs to 1D and 3D KTX2 textures with appropriate internal formats and zstd compression? But reducing this to just {ACES | None} goes too far in my opinion.

There are use cases, e.g. embedded devices, where another texture unit is a pain.
If you can convince Epic Games to implement these LUTs, then I am 100% with you. Otherwise we have to wait, and one of the major engines out there will not be compatible if we force LUTs today.

@sobotka commented May 3, 2022

Currently this last mile is simply unspecified; the question is can we do better than that and then build on it? Do you have any particular recommendations?

I think what is being tackled by this exact attempt is wise and prudent; think about what is seen by the audience.

In this case, and specifically in the case of E-commerce, which I believe is a sizeable portion of the problem surface, the quibbles over what is being discussed are actually sizable. Should the reddish chair look be distorted to pure yellow? Should the blue toy bunny be represented as a totally distorted purple? These sorts of things likely matter significantly in the context.

What I am cautioning is precisely what @donmccurdy more or less stated:

But reducing this to just {ACES | None} goes too far in my opinion.

And better-than-broken rendering for games, let alone E-Commerce, will only benefit.

@donmccurdy
Copy link
Contributor

@UX3D-nopper is there a canonical definition of ACES Filmic tone mapping somewhere? I assume we are just talking about a tone map, and not other components of ACES.

@UX3D-nopper
Copy link
Contributor

Most people probably copy & paste from here:
https://github.com/selfshadow/ltc_code/blob/master/webgl/shaders/ltc/ltc_blit.fs
or here:
https://github.com/TheRealMJP/BakingLab/blob/master/BakingLab/ACES.hlsl

To formalize, one has to dig into the original repository:
https://github.com/ampas/aces-dev
They use the Color Transformation Language (CTL)
https://github.com/ampas/CTL/

Basically, we need to define how to transfer from sRGB (or another color space, see https://nick-shaw.github.io/cinematiccolor/common-rgb-color-spaces.html#x34-1330004.1) to the AP1 color space, as the ACES tone mapping (the RRT and ODT part) happens in AP1, and then "go back" to sRGB (or the color space of the output buffer). One important point is that AP1 has a different illuminant than sRGB, so a white-point conversion has to happen in many cases as well:

[image: color space conversion diagram]

From my perspective, converting between color spaces is already defined in many places, e.g. at Khronos:
https://www.khronos.org/registry/DataFormat/specs/1.3/dataformat.1.3.html#_bt_709_bt_2020_primary_conversion_example

The only missing parts are the RRT and ODT:
https://github.com/ampas/aces-dev/blob/dev/transforms/ctl/lib/ACESlib.RRT_Common.ctl#L23
https://github.com/ampas/aces-dev/blob/dev/transforms/ctl/lib/ACESlib.ODT_Common.ctl#L32

At first sight, it looks like one has to use the whole ACES dev package, but this is not the case.
The question is how much we need to explain the RRT and ODT, as it is basically "just" another tone mapping curve.

@romainguy
Copy link

The definitions you linked to are approximations, not the actual definition. The BakingLab one is a good approximation that's good for real time renderers. I've extracted a less-approximated* definition in Filament though: https://github.com/google/filament/blob/ebd5f150c16548b83bbb4a4fec9e4430c2fa1309/filament/src/ToneMapper.cpp#L28

  • Less approximated because the RRT+ODT use a curve fit, just like Unity HDRP does I think. But it preserves the rest of the tweaks defined by ACES
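
For concreteness, a GLSL transcription of the widely copied BakingLab / Stephen Hill curve fit might look like the sketch below. The constants are reproduced here from the linked ACES.hlsl and should be verified against that source; the matrices fold the sRGB ↔ AP1 conversion (and white point difference) into the fit:

```glsl
// GLSL sketch of the BakingLab RRT+ODT curve fit linked above.
// Input is linear sRGB/Rec.709; output is still linear, the OETF comes afterwards.

const mat3 ACESInputMat = mat3(   // sRGB -> fit space (GLSL column-major)
    0.59719, 0.07600, 0.02840,
    0.35458, 0.90834, 0.13383,
    0.04823, 0.01566, 0.83777);

const mat3 ACESOutputMat = mat3(  // fit space -> sRGB
     1.60475, -0.10208, -0.00327,
    -0.53108,  1.10813, -0.07276,
    -0.07367, -0.00605,  1.07602);

vec3 RRTAndODTFit(vec3 v) {
    vec3 a = v * (v + 0.0245786) - 0.000090537;
    vec3 b = v * (0.983729 * v + 0.4329510) + 0.238081;
    return a / b;
}

vec3 acesHillTonemap(vec3 srgbLinear) {
    vec3 c = ACESInputMat * srgbLinear;
    c = RRTAndODTFit(c);
    c = ACESOutputMat * c;
    return clamp(c, 0.0, 1.0);
}
```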

@UX3D-nopper
Copy link
Contributor

I could reproduce the matrices in the links using CTL from ACES, and to me it does not look like an approximation.
Do you have a link to the "exact" chain from ACES?
I implemented a subset which looks like this:
[image: diagram of the implemented tone mapping subset]
In this approach, the nits of the output display are later taken into account.

@romainguy
Copy link

The matrices are not approximations, the rest very much is.

@sobotka
Copy link

sobotka commented May 4, 2022

In this approach, the nits of the output display are later taken into account.

Very much no, but a much larger discussion.

Folks should be paying attention to what is seen, not an obfuscated chain that ultimately amounts to nonsense.

@UX3D-nopper
Copy link
Contributor

In this approach, the nits of the output display are later taken into account.

Very much no, but a much larger discussion.

Folks should be paying attention to what is seen, not an obfuscated chain that ultimately amounts to nonsense.

The source code is here:
https://github.com/ux3d/KhronosSandbox/blob/master/Resources/shaders/tm_tf.frag
https://github.com/ux3d/KhronosSandbox/tree/master/Example16/src

@sobotka Please point me to where the code could be improved or where the wrong assumption is.

@UX3D-nopper
Copy link
Contributor

The matrices are not approximations, the rest very much is.

Phew, I compared it with the Epic Games version:
https://github.com/EpicGames/UnrealEngine/blob/release/Engine/Shaders/Private/ACES.ush
Also, this site explains it really nicely:
https://chrisbrejon.com/cg-cinematography/chapter-1-5-academy-color-encoding-system-aces/

@romainguy Thanks for the hint; I will look for the paper/presentation where Stephen Hill explains his changes.

Anyway, this is not good news for consistency, as I assume some are implementing the exact ACES and some the approximated one. Furthermore, there is another one from Narkowicz.

@romainguy
Copy link

romainguy commented May 4, 2022

The one from Narkowicz is even more approximated, as it's a simple curve that remains in sRGB and doesn't go through AP0/AP1. Its main drawback is that it loses some of the nice side effects of the ACES RRT/ODT, including some of the path to white. It's super cheap though. I also proposed a still cheaper approximation (one that folds in the sRGB transfer function) in a SIGGRAPH talk: x / (x + 0.155) * 1.019.

Here's Narkowicz:
[screenshot: Narkowicz curve]

vs ACES:
[screenshot: ACES curve]

But you can trick Narkowicz into behaving more like ACES by applying it in a wider gamut, like Rec.2020:
[screenshot: Narkowicz applied in Rec.2020]

(note the hue skews in all cases though… 👎)
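
As a rough sketch, the two fits quoted above look like this in GLSL (treat the constants as copies of the published fits, to be verified against the originals):

```glsl
// Both operate directly in linear sRGB (no AP0/AP1 round trip),
// which is the source of the hue skews noted above.

// Narkowicz "ACES Filmic" curve fit.
vec3 tonemapNarkowicz(vec3 x) {
    return clamp((x * (2.51 * x + 0.03)) / (x * (2.43 * x + 0.59) + 0.14), 0.0, 1.0);
}

// The still cheaper fit quoted above; it also folds in an approximate sRGB
// transfer function, so its output can go straight to a non-sRGB UNORM target.
vec3 tonemapCheapWithOetf(vec3 x) {
    return clamp(x / (x + 0.155) * 1.019, 0.0, 1.0);
}
```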

@UX3D-nopper
Copy link
Contributor

Thanks a lot for the clarification. And I think I am convinced regarding the LUT approach:

It is cheap to compute and easy to implement in the shader. It is also fully flexible and fast on embedded devices.

In the end, I think it is much easier to convince Epic Games or others to implement the LUT approach than to tweak their shaders with specific formulas.

Maybe we can specify that the LUTs are stored in a given color space, e.g. Rec.709 (or Rec.2020, and so on), as a parameter inside glTF. Then one only has to do the color conversion there and back.

Last but not least, in this extension we should specify how to generate these LUTs and not just reference, e.g., Adobe.
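
A consumer-side sketch of that idea could look like the following GLSL, assuming (hypothetically) that the asset declares the LUT's color space and that exposure has already been applied; all names are illustrative:

```glsl
uniform sampler3D u_lut3d;          // e.g. 32x32x32, trilinear filtering
uniform float     u_lutSize;        // 32.0
uniform mat3      u_toLutSpace;     // working space -> declared LUT space (e.g. Rec.709 -> Rec.2020)
uniform mat3      u_fromLutSpace;   // inverse conversion

vec3 applyLut3d(vec3 c) {
    c = u_toLutSpace * c;            // convert into the declared LUT color space
    c = clamp(c, 0.0, 1.0);          // LUT domain is [0,1]; a 1D shaper would go here for HDR input
    vec3 uvw = c * ((u_lutSize - 1.0) / u_lutSize) + 0.5 / u_lutSize;  // sample texel centers
    c = texture(u_lut3d, uvw).rgb;
    return u_fromLutSpace * c;       // convert back
}
```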

@UX3D-nopper
Copy link
Contributor

UX3D-nopper commented May 4, 2022

Furthermore, Khronos could provide a LUT generator where folks can plug in their custom tone mapping algorithms.

@sobotka
Copy link

sobotka commented May 4, 2022

At that point you are so far down the rabbit hole of complete garbage that maybe it makes sense to look at this proposal.

ACES is not a colour management system, and is about the worst idea one could think of adding given that it does not under any circumstances solve any real-world problems.

It does not:

  1. Manage either the stimulus of the original scene, nor a formed image.
  2. Manage the appearance of the stimulus of the original scene, nor formed imagery.

Both of those definitions meet the CIE definitions of "colour" if one visits the term list.

It's an overhead that will bring nothing to the table, and have a direct impact on E-Commerce, entertainment, etc.

@UX3D-nopper
Copy link
Contributor

At that point you are so far down the rabbit hole of complete garbage that maybe it makes sense to look at this proposal.

ACES is not a colour management system, and is about the worst idea one could think of adding given that it does not under any circumstances solve any real-world problems.

It does not:

  1. Manage either the stimulus of the original scene, nor a formed image.
  2. Manage the appearance of the stimulus of the original scene, nor formed imagery.

Both of those definitions meet the CIE definitions of "colour" if one visits the term list.

It's an overhead that will bring nothing to the table, and have a direct impact on E-Commerce, entertainment, etc.

Hmm, I just stepped away from ACES. If someone wants to use a LUT for an ACES tonemapping curve, one can do so.
If someone wants to have another curve, one can do so.

Does the LUT approach block your use case?

@elalish
Copy link
Contributor Author

elalish commented May 4, 2022

@sobotka Hey, really appreciate your knowledge here, but can you try to keep the tone a touch more helpful? Yelling that everything is terrible doesn't really give anyone direction. Can you focus on what would be good and useful instead of what is bad? You can start by telling us what you'd like to accomplish and what about a generic LUT would or would not fulfill that. And maybe contrast it with ACES, which none of us has attachment to other than the fact that it's commonly used.

@sobotka
Copy link

sobotka commented May 4, 2022

Apologies, I’m trying to be positive. I see a reasonable offering here that is more or less rather simplistic, and then a sidetrack.

Solutionism without a declared and itemized problem seems challenging.

I have looked for what this particular piece of the puzzle attempts to solve and who the stakeholders and their needs are, and cannot find a document anywhere.

@donmccurdy
Copy link
Contributor

donmccurdy commented May 4, 2022

I don't know that we've all agreed on a problem definition, but here's a strawman:

We would like to allow authors of glTF 2.0 3D scenes to embed sufficient information for viewers (e.g. 3D Engines) to produce a consistent image of the scene. Current tone-mapping implementations across engines are notably inconsistent. Ideally a solution would both support prevailing current practices [1], and allow some flexibility for future improvements. Because glTF aims to be a runtime-friendly format, very large LUTs and dynamic compilation of arbitrary shaders would preferably be avoided in these view transforms.

More concretely, do we think that the proposed parameterization — exposure, 1D shaping lut, 3D LUT — is enough to compactly define:

  1. ACES Filmic
  2. Improved filmic view transforms [2]
  3. PQ OETF from KHR_displaymapping_pq
  4. sRGB OETF

I believe this proposal supports (3) and (4) trivially. I am not sure about (1) and (2).


[1] For better or worse, prevailing practice appears to be sRGB OETF or ACES Filmic today.

[2] Possibly Blender Filmic would be a good evaluation case?
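
To make the strawman parameterization concrete, a sketch of the implied chain might look like the GLSL below; the uniform names, the log2 shaper range, and the LUT sizes are assumptions for illustration, not part of any proposal:

```glsl
// Sketch of the full strawman chain: exposure, then a 1D shaping LUT that
// compresses the exposed [0, inf) HDR values into [0, 1], then a 3D LUT.
// The OETF of the output buffer is applied afterwards.

uniform float     u_exposure;
uniform sampler2D u_shaperLut;   // 1D shaping LUT stored as an Nx1 texture
uniform sampler3D u_lut3d;
uniform float     u_lut3dSize;

float shape(float x) {
    // Hypothetical shaper: index the 1D LUT by a log2 allocation of the exposed
    // value over a fixed range (here illustratively 16.5 stops starting at -10),
    // similar to how OCIO shaper spaces are commonly defined.
    float t = clamp((log2(max(x, 1e-6)) + 10.0) / 16.5, 0.0, 1.0);
    return texture(u_shaperLut, vec2(t, 0.5)).r;
}

vec3 formImage(vec3 sceneLinear) {
    vec3 c = sceneLinear * u_exposure;
    c = vec3(shape(c.r), shape(c.g), shape(c.b));                          // 1D shaping LUT
    vec3 uvw = c * ((u_lut3dSize - 1.0) / u_lut3dSize) + 0.5 / u_lut3dSize;
    return texture(u_lut3d, uvw).rgb;                                       // 3D LUT
}
```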

@romainguy
Copy link

I believe it does cover 1 and 2 since we implement both 1 and 2 in Filament with a LUT :)

BTW, @sobotka is the author of Blender Filmic, I'm sure he can shed light on whether or not they are worth considering.

@UX3D-nopper
Copy link
Contributor

I don't know that we've all agreed on a problem definition, but here's a strawman:

We would like to allow authors of glTF 2.0 3D scenes to embed sufficient information for viewers (e.g. 3D Engines) to produce a consistent image of the scene. Current tone-mapping implementations across engines are notably inconsistent. Ideally a solution would both support prevailing current practices [1], and allow some flexibility for future improvements. Because glTF aims to be a runtime-friendly format, very large LUTs and dynamic compilation of arbitrary shaders would preferably be avoided in these view transforms.

More concretely, do we think that the proposed parameterization — exposure, 1D shaping lut, 3D LUT — is enough to compactly define:

  1. ACES Filmic
  2. Improved filmic view transforms [2]
  3. PQ OETF from KHR_displaymapping_pq
  4. sRGB OETF

I believe this proposal supports (3) and (4) trivially. I am not sure about (1) and (2).

[1] For better or worse, prevailing practice appears to be sRGB OETF or ACES Filmic today.
[2] Possibly Blender Filmic would be a good evaluation case?

The LUT should define the tone mapping curve but not any transfer function, as the latter depends on the color space of the output buffer in use:
https://www.khronos.org/registry/DataFormat/specs/1.3/dataformat.1.3.html#TRANSFER_CONVERSION
Basically, the glTF asset has to have the same (or at least similar) visual output regardless of whether, e.g., the expected color space of the output buffer is linear or non-linear. And there are quite a few possibilities out there:
https://vulkan.gpuinfo.org/listsurfaceformats.php
https://github.com/KhronosGroup/Vulkan-Headers/blob/main/include/vulkan/vulkan_core.h#L7294
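
As a sketch of that last step, the same tone-mapped value would be encoded differently depending on which of the surface formats below was negotiated. The structure is illustrative only; only the UNORM + SRGB_NONLINEAR case needs a manual OETF, since _SRGB formats encode in hardware and extended-linear float targets stay linear:

```glsl
uniform int u_outputEncoding;   // 0: manual sRGB OETF, 1: hardware sRGB or linear target

vec3 srgbOetf(vec3 c) {
    // Piecewise sRGB encoding per the Khronos Data Format spec linked above.
    vec3 lo = 12.92 * c;
    vec3 hi = 1.055 * pow(c, vec3(1.0 / 2.4)) - 0.055;
    return mix(lo, hi, step(vec3(0.0031308), c));
}

vec3 encodeForOutput(vec3 tonemappedLinear) {
    vec3 c = clamp(tonemappedLinear, 0.0, 1.0);
    return (u_outputEncoding == 0) ? srgbOetf(c) : c;
}
```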

@UX3D-nopper
Copy link
Contributor

UX3D-nopper commented May 5, 2022

@donmccurdy I tried out all combinations that my GPU and display provide:
[photo of the test display]

	Formats: count = 5
		SurfaceFormat[0]:
			format = FORMAT_B8G8R8A8_UNORM
			colorSpace = COLOR_SPACE_SRGB_NONLINEAR_KHR
		SurfaceFormat[1]:
			format = FORMAT_B8G8R8A8_SRGB
			colorSpace = COLOR_SPACE_SRGB_NONLINEAR_KHR
		SurfaceFormat[2]:
			format = FORMAT_R16G16B16A16_SFLOAT
			colorSpace = COLOR_SPACE_EXTENDED_SRGB_LINEAR_EXT
		SurfaceFormat[3]:
			format = FORMAT_A2B10G10R10_UNORM_PACK32
			colorSpace = COLOR_SPACE_HDR10_ST2084_EXT
		SurfaceFormat[4]:
			format = FORMAT_A2B10G10R10_UNORM_PACK32
			colorSpace = COLOR_SPACE_SRGB_NONLINEAR_KHR

@sobotka
Copy link

sobotka commented May 5, 2022

Basically, the glTF asset has to have the same (or at least similar) visual output independent

This is where things will go sideways as the entire subject is vastly deeper than it first seems.

It can help to break things down into the tristimulus data and the resultant formed image. Per-channel mechanics will form an image differently across differing mediums, depending on the working context.

It is also worthwhile to test the specifics of image formation chains for stability.

@donmccurdy
Copy link
Contributor

donmccurdy commented Sep 1, 2022

I spent some time looking into runtime-friendly compression for .cube LUTs, and am very happy with the results from KTX2. Results below from one of the Filmic Blender LUTs, compressed with KTX2 + ZSTD, and parsed in three.js:

| format | size | parse time |
| --- | --- | --- |
| .cube | 7.414 MB | 90 – 100 ms |
| .ktx2 | 94 KB | 0.1 – 0.2 ms |

More details in https://www.donmccurdy.com/2022/08/31/compressing-luts-with-ktx2/.

@UX3D-nopper
Copy link
Contributor

@elalish We should make sure that this extension is a real subset of https://opencolorio.org/

@elalish
Copy link
Contributor Author

elalish commented Nov 23, 2022

@elalish We should make sure that this extension is a real subset of https://opencolorio.org/

This sounds like it can produce what we want: https://opencolorio.readthedocs.io/en/latest/tutorials/baking_luts.html#shaper-spaces

Agreed we should test this flow as part of an implementation.
