Core Metadata: Language pass for binary table format #493

donmccurdy · 2021-10-06T21:55:59Z

This PR is branched from #490 (which removes the raster format); it may be better to review that PR first.

Changes:

Remove explicit definition of "encodings", referring only to "binary table format" and "JSON table format". I don't have a strong preference between the word "encoding" and "format", but didn't want to include both terms if we aren't defining a raster format here.
Small restructuring of header hierarchy
Language and clarity pass on binary data format, particularly language related to arrays.

Next PR will add language changes for JSON table format.

donmccurdy · 2021-10-06T21:56:19Z

/cc @ptrgags @lilleyse @javagl

donmccurdy · 2021-10-06T23:22:11Z

Two questions questions below, copied because these lines aren't part of the diff —

Implementation Note: Depending on the implementation and the chosen integer type, there may be some loss of precision in the normalized values. For example, if the implementation uses 32-bit floating point variables to represent the normalized value, there are only 23 bits in the mantissa. In this case, if the value to normalize is 32- or 64-bit, a number of lower bits will be truncated when normalized. Therefore, it is recommended that implementations use the highest precision floats available for representing the result.

I'm not sure that I understand the recommendation of "highest precision floats available". For a normalized UINT16 value, float32 (23-bit mantissa) should be sufficient. For a normalized UINT8 value, float16 (11-bit mantissa) should be sufficient. Does this note intend to suggest float64 in both cases, or could we rephrase this? For example:

normalized	denormalized
INT32, UINT32	FLOAT64
INT16, UINT16	FLOAT32
INT8, UINT8	FLOAT16

Floating point numbers must be representable as IEEE floating point.

Similarly this may need to be clarified, but I'd like to check the intended meaning first. JSON allows floating-point numbers of any precision, but obviously programming languages have limitations. Would this be a fair way of rephrasing the line?

Floating point numbers should be rounded to IEEE-754 single precision, or lower precision where acceptable.

Similarly – many (but not all) JSON parsers will round int64 and friends. Do we:

(a) Warn about the risk but allow it
(b) Disallow 64-bit numeric types in JSON
(c) Require string encodings of 64-bit numeric types in JSON

javagl · 2021-10-07T13:21:40Z

I'd have to re-read the "before" and "after" versions in parallel with the diffs (also that of the individual commits) in order to better understand the actual changes (but in doubt, I could focus on reading the resulting state and see whether something pops out).

I'm not sure that I understand the recommendation of "highest precision floats available".

A wild guess: This was probably phrased like this to be "on the safe side", as in ~"you cannot do anything wrong with double". If this is true, then it could make sense to use the table that you created, and say that computations SHOULD be done with at least these numbers of bits. (But... I know few languages that have float16 FWIW...)

Regarding the number encoding in JSON ... you know that there has been an extensive discussion about that at KhronosGroup/glTF#1574 (just as a pointer for ptrgags/lilleyse ). The relevant part of how this was resolved seems to be the statement that was added at https://github.com/KhronosGroup/glTF/pull/1997/files#diff-ca628c11679a8c4c0c4031440c7c1cda27c8896ebc87c3bd1e8640efc8942d99R424

Non-integer numbers SHOULD be written in a way that preserves original values when these numbers are read back, i.e. they SHOULD NOT be altered by JSON serialization / deserialization roundtrip.

_Edit:

Sorry, this referred to the non-integer numbers. For integer numbers, the spec refers to https://datatracker.ietf.org/doc/html/rfc8259#section-6 which points out possible issues with interoperabiltiy, and gives more details about the constraints.

As I said in the PR: It's difficult for a "file format spec (that is built on JSON)" to specify aspects of JSON itself. Explicit warnings about int64 values that cannot losslessly be represented with float64 may be fine. But beyond that, we can hardly dictate any serialization constraint for numbers in JSON, because most implementors will just use their JSON library of choice, and not be able to influence that.

donmccurdy · 2021-10-07T16:43:39Z

Thanks @javagl! I agree about not specifying aspects of JSON itself, and I don't think that strict requirements with regard to serialization or deserialization would improve this PR. Most of the 3D Tiles spec is JSON, so any such requirements should probably go in the base specification's JSON encoding section rather than here.

Proposed:

(1) ... it is recommended that implementations use the highest precision floats available ...

I suggest that we rephrase this in terms of appropriate precision instead of highest precision. There is no reason to use Float64 to store the result of a denormalized uint8.

(2a) Existing: Floating point numbers must be representable as IEEE floating point.
(2b) Proposed: Floating point numbers should be rounded to IEEE-754 single precision, or lower precision where acceptable.

Perhaps neither of these should be included? I don't really know what the first line intends to prevent. And the second is merely a suggestion for efficiency, which (a) might fit better in the base specification, and (b) is incompatible with 64-bit types here.

(3) 64-bit types in JSON encoding

I do think that if we are defining a JSON encoding that purports to support 64-bit types — which neither glTF nor the base 3D Tiles specification do — we must be clear about how that is supported, or that it isn't supported.

At this point, my own suggestion would be to prohibit 64-byte types in the JSON encoding. The only viable implementation on the web today would involve (de)serializing them as strings...

JSON.stringify(
  {value: 12345678912345678912n}, 
  (_, v) => typeof v === 'bigint' ? v.toString() : v
);

...and we probably don't want to add that complexity without need. We can consider loosening the restriction later if there's a need and a feasible way to implement it, but given the JSON encoding is intended for small amounts of human-readable data, it does not seem like 64-bit types are a critical use case for it.

lilleyse · 2021-10-07T17:34:59Z

(1) ... it is recommended that implementations use the highest precision floats available ...

Might need @ptrgags's insight here, but your suggestion seems good.

(2a) Existing: Floating point numbers must be representable as IEEE floating point.

Non-integer numbers SHOULD be written in a way that preserves original values when these numbers are read back, i.e. they SHOULD NOT be altered by JSON serialization / deserialization roundtrip.

I agree that we need language like this in the core 3D Tiles spec. Some examples that come to mind

The noData value in JSON should match the actual no-data values in binary exactly and not get altered by the roundtrip
The property's min/max should match the actual data's min/max exactly (for likely the same reasons as glTF)

That was the intent behind the language Floating point numbers must be representable as IEEE floating point. Any sort of rounding could affect comparisons.

Essentially I'm good with copying the glTF rules for JSON encoding and putting them in the JSON encoding section. That's what we originally did, glTF's rules just got more precise over time and the 3D Tiles spec never got updated 😄

(3) 64-bit types in JSON encoding

Not sure what to do here for 64-bit integers, but it goes beyond this spec (thinking back to noData/min/max in 3DTILES_metadata, exactness is important). I'm kind of ok with strings for values outside the safe range [-(2**53)+1, (2**53)-1] but yeah, that's kind of messy.

ptrgags

@donmccurdy almost there, just a couple small tweaks.

specification/Metadata/README.md

ptrgags · 2021-10-12T19:27:42Z

Oh and make sure to resolve the merge conflict

ptrgags · 2021-10-12T19:38:36Z

oh to answer the questions above:

I suggest that we rephrase this in terms of appropriate precision instead of highest precision. There is no reason to use Float64 to store the result of a denormalized uint8.

Not sure what to do here for 64-bit integers, but it goes beyond this spec (thinking back to noData/min/max in 3DTILES_metadata, exactness is important). I'm kind of ok with strings for values outside the safe range [-(253)+1, (253)-1] but yeah, that's kind of messy.

yeah that's tricky... I would still allow 64-bit integer types anywhere, just maybe put an implementation note that JSON doesn't support values outside the safe range.

e.g. we wouldn't want to rule out a noData: -1 for a 64-bit type just because extremely large values aren't exactly representable.

Yeah "appropriate" precision makes sense here.

(2c) Existing: Floating point numbers must be representable as IEEE-754 floating point.

keep the existing wording by @lilleyse's reasoning. Though do mention that it's IEEE-754 to be more specific.

donmccurdy · 2021-10-12T21:29:09Z

yeah that's tricky... I would still allow 64-bit integer types anywhere, just maybe put an implementation note that JSON doesn't support values outside the safe range.

Didn't quite follow your meaning in "JSON doesn't support values outside the safe range..." ... I don't think the JSON format has any such limitation, even though most implementations probably do. So you're suggesting an implementation note warning that many implementations will not support values outside [define safe range here]?

And if so, do we mean the safe range of IEEE-754 double precision float?

2c) Existing: Floating point numbers must be representable as IEEE-754 floating point.

keep the existing wording by @lilleyse's reasoning. Though do mention that it's IEEE-754 to be more specific.

With "must" we're making a strong requirement, and I don't think this wording is enough – IEEE-754 includes up to octuple-precision float256, additionally. How about:

Floating point numbers SHOULD be representable as double precision IEEE-754 floats when encoded in JSON. When those numbers represent property values (such as noData, min, or max) having lower precision (e.g. single-precision float or 8-bit or 16-bit integer), the values SHOULD be rounded to the same precision in JSON to avoid any potential boundary mismatches.

(SHOULD italicized and capitalized for emphasis in this comment only)

ptrgags · 2021-10-14T15:53:29Z

@donmccurdy ah you're right, it's not the JSON but the implementations (JS in particular) that have issues. But yes, I think we should leave this as an implementation note for now.

And yes safe range for a 64-bit int within a IEEE-754 double-precision number.

The thing with "should" is does that imply that the number could be encoded in some non-IEEE-754 format? e.g. a fixed-point float? we don't want that. If all the numbers fit in a IEEE-754 float64, then they can fit in a float128 or larger with no issue (as you're just adding bits to the mantissa/exponent)

donmccurdy · 2021-10-14T17:55:28Z

But yes, I think we should leave [issues with values outside double precision IEEE-754 safe range] as an implementation note for now.

@ptrgags sounds good to me, OK with something like this?

Implementation note: Developers of authoring tools should be aware that many JSON implementations support only numeric values that can be represented as IEEE-754 double precision floating point numbers. Floating point numbers SHOULD be representable as double precision IEEE-754 floats when encoded in JSON. When those numbers represent property values (such as noData, min, or max) having lower precision (e.g. single-precision float or 8-bit or 16-bit integer), the values SHOULD be rounded to the same precision in JSON to avoid any potential boundary mismatches.

The thing with "should" is does that imply that the number could be encoded in some non-IEEE-754 format?

I think that's the situation, though — "should" might be the best we can do here. We're supporting int64 and uint64, and those values in JSON encoding (or their min/max in schema for binary encoding) could be outside of the IEEE-754 double precision safe range. Similarly, the glTF specification only uses "should", perhaps partly to require JSON encoders to do things that the JSON spec does not itself require.

ptrgags · 2021-10-14T18:24:30Z

@donmccurdy okay, let's go with this new wording

Core Metadata: Hierarchy and table format language pass. Core Metadata: Clean up. Update specification/Metadata/README.md Co-authored-by: Peter Gagliardi <ptrgags@gmail.com>

ptrgags · 2021-10-14T19:32:26Z

The updates look good, thanks @donmccurdy!

ptrgags changed the base branch from 3d-tiles-next-rev to 3d-tiles-next October 11, 2021 18:06

ptrgags reviewed Oct 12, 2021

View reviewed changes

specification/Metadata/README.md Outdated Show resolved Hide resolved

specification/Metadata/README.md Outdated Show resolved Hide resolved

donmccurdy force-pushed the core-metadata-v2.3 branch from 292f869 to 04f1b93 Compare October 14, 2021 19:15

Core Metadata: Remove raster format.

d30ff55

Core Metadata: Hierarchy and table format language pass. Core Metadata: Clean up. Update specification/Metadata/README.md Co-authored-by: Peter Gagliardi <ptrgags@gmail.com>

donmccurdy force-pushed the core-metadata-v2.3 branch from 04f1b93 to d30ff55 Compare October 14, 2021 19:23

ptrgags merged commit aa1e4d9 into CesiumGS:3d-tiles-next Oct 14, 2021

donmccurdy deleted the core-metadata-v2.3 branch October 14, 2021 21:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Core Metadata: Language pass for binary table format #493

Core Metadata: Language pass for binary table format #493

donmccurdy commented Oct 6, 2021

donmccurdy commented Oct 6, 2021

donmccurdy commented Oct 6, 2021 •

edited

Loading

javagl commented Oct 7, 2021 •

edited

Loading

donmccurdy commented Oct 7, 2021 •

edited

Loading

lilleyse commented Oct 7, 2021

ptrgags left a comment

ptrgags commented Oct 12, 2021

ptrgags commented Oct 12, 2021

donmccurdy commented Oct 12, 2021

ptrgags commented Oct 14, 2021

donmccurdy commented Oct 14, 2021 •

edited

Loading

ptrgags commented Oct 14, 2021

ptrgags commented Oct 14, 2021

Core Metadata: Language pass for binary table format #493

Core Metadata: Language pass for binary table format #493

Conversation

donmccurdy commented Oct 6, 2021

donmccurdy commented Oct 6, 2021

donmccurdy commented Oct 6, 2021 • edited Loading

javagl commented Oct 7, 2021 • edited Loading

donmccurdy commented Oct 7, 2021 • edited Loading

lilleyse commented Oct 7, 2021

ptrgags left a comment

Choose a reason for hiding this comment

ptrgags commented Oct 12, 2021

ptrgags commented Oct 12, 2021

donmccurdy commented Oct 12, 2021

ptrgags commented Oct 14, 2021

donmccurdy commented Oct 14, 2021 • edited Loading

ptrgags commented Oct 14, 2021

ptrgags commented Oct 14, 2021

donmccurdy commented Oct 6, 2021 •

edited

Loading

javagl commented Oct 7, 2021 •

edited

Loading

donmccurdy commented Oct 7, 2021 •

edited

Loading

donmccurdy commented Oct 14, 2021 •

edited

Loading