Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify data model regarding binary data #942

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 14 additions & 4 deletions jsonschema-core.xml
Original file line number Diff line number Diff line change
Expand Up @@ -198,13 +198,17 @@

<section title="Instance">
<t>
A JSON document to which a schema is applied is known as an "instance".
A document to which a schema is applied is known as an "instance".
</t>

<section title="Instance Data Model">
<t>
JSON Schema interprets documents according to a data model. A JSON value
interpreted according to this data model is called an "instance".
JSON Schema interprets documents according to a data model. A value
interpreted according to this data model is called an "instance". JSON
documents map trivially into the data model, with a few exceptions as
noted below. Documents of other media types MAY be treated as instances
if a suitable application-defined mapping of the media type into the
data model can be determined.
</t>
<t>
An instance has one of six primitive types, and a range of possible values
Expand All @@ -219,6 +223,12 @@
<t hangText="string:">A string of Unicode code points, from the JSON "string" value</t>
</list>
</t>
<t>
Binary data MAY be treated as an instance, however no data type in the data model
is suitable. Therefore, only schemas such as the empty schema that do not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added paragraph above this one says that it's possible there's some application-defined mapping from binary data to the JSON data model. The wording of this paragraph seems to waffle between "MAY be treated as an instance" and "can't do anything unless it's an empty schema". Maybe add some language that suggests that a schema can be applied if that data model is applied. For example (changes in bold), "Binary data MAY be treated as an instance, however no data type in the data model is directly suitable. Therefore, only schemas such as the empty schema that do not constrain the type can be considered to pass outright."

I'm sure there's some better language, but this paragraph feels like it's fighting with itself if there's no acknowledgement that there exists that "application-defined mapping" in the previous paragraph.

constrain the type can be considered to pass. Rationales for and behavior of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think commas may be required: "Rationales for, and behavior of,"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this instance.

binary data as an instance SHOULD be defined by the consuming application.
</t>
<t>
Whitespace and formatting concerns, including different lexical
representations of numbers that are equal within the data model, are thus
Expand Down Expand Up @@ -3796,7 +3806,7 @@ https://example.com/schemas/common#/$defs/count/minimum
<t></t>
<t></t>
<t></t>
<t></t>
<t>Clarify applicability to non-JSON media types</t>
<t></t>
<t></t>
<t></t>
Expand Down
37 changes: 27 additions & 10 deletions jsonschema-validation.xml
Original file line number Diff line number Diff line change
Expand Up @@ -913,7 +913,9 @@
<section title="Foreword">
<t>
Annotations defined in this section indicate that an instance contains
non-JSON data encoded in a JSON string.
non-JSON data encoded in a JSON string. Additionally, they can be used in
the context of resources of various media types to indicate binary resources
not otherwise describable by JSON Schema.
</t>
<t>
These properties provide additional information required to interpret JSON data
Expand Down Expand Up @@ -945,14 +947,21 @@
consumer than that which processed the containing document.
</t>
<t>
All keywords in this section apply only to strings, and have no
effect on other data types.
All keywords in this section generally apply to strings, and have no
effect on other JSON data types. Additionally, they MAY be used without
type information when describing resources of other media types, subject
to certain restrictions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could one or two examples of restrictions be given here?

</t>
<t>
Implementations MAY offer the ability to decode, parse, and/or validate
the string contents automatically. However, it MUST NOT perform these
the string contents automatically. However, they MUST NOT perform these
operations by default, and MUST provide the validation result of each
string-encoded document separately from the enclosing document. This
string-encoded document separately from the enclosing document. In particular,
these keywords, including "contentSchema", MUST NOT cause the containing schema
to fail validation.
</t>
<t>
The optional automatic decoding, parsing, and validating
process SHOULD be equivalent to fully evaluating the instance against
the original schema, followed by using the annotations to decode, parse,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say instead, "The optional automatic decoding, parsing, and validating process SHOULD be equivalent to fully evaluating the instance against process SHOULD be equivalent to fully evaluating an instance against a schema, followed by using the annotations to decode, parse the original schema". The thing is, this section is about applying schemas or rules to encoded "content". if we use "evaluating the instance against the original schema", then it feels like there's confusion between the schema containing all this stuff vs. the stuff inside "content". Does this make sense?

and/or validate each string-encoded document.
Expand Down Expand Up @@ -1005,7 +1014,14 @@
<t>
If the instance is a string, this property indicates the media type
of the contents of the string. If "contentEncoding" is present,
this property describes the decoded string.
this property describes the decoded string. If the "type" keyword is
absent, this keyword MAY be interpreted as describing an unencoded binary
resource. The exact meaning and behavior of this untyped usage is
application-defined.
Comment on lines +1017 to +1020
Copy link
Member

@awwright awwright Jun 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is effectively permitting values outside the data model, which sort of defeats the point of having a data model, doesn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awwright I like your idea of simply extending the data model. I'd probably recommend that we just add binary rather than throwing it entirely open, but I could be persuaded on that point.

What's going on here is that I'm recognizing a thing that a significant number of people (OpenAPI users) were already doing in the wild, in a way that was more problematic. OAS 3.0 treats binary data as "type": "string" plus a custom format, which is definitely wrong. Unencoded binary data cannot be considered a JSON string in any reasonable universe.

Since OAS 3.1 is picking up the content* keywords, they were willing to drop the custom format and use of "type": "string" as long as there was something that they could recommend for this use case. This is a scenario where nothing's going to be validated, but a JSON Schema is used for descriptive purposes. Which is fine with content* because they don't do validation anyway. They are annotations, so like all annotations, their behavior is application-defined. This is particularly unusual behavior so it is called out. Also, we want it called out so people don't come up with worse "solutions" again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue, let's just call binary for now, and if other use cases present, then open it up later.
I'd like to avoid speculation and stay grounded on use cases right infront of our faces first.
=]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Relequestual @awwright just to clarify, we're now proposing to add binary as a type?

I feel like this should get an issue for visibility so that others can comment on it. It's a major change from what I worked out with OAS. I'm reasonably OK with making the change, but it's kind of a big deal.

There are also alternatives such as a "binary": true approach. This would avoid breaking compatibility with anything that hardcodes the current set of valid type values.

I will file an issue for this.

<cref>
For an example of application-defined untyped usage,
see the forthcoming OpenAPI Specification v3.1
handrews marked this conversation as resolved.
Show resolved Hide resolved
</cref>
</t>
<t>
The value of this property MUST be a string, which MUST be a media type,
Expand All @@ -1015,8 +1031,9 @@

<section title="contentSchema">
<t>
If the instance is a string, and if "contentMediaType" is present, this
property contains a schema which describes the structure of the string.
If the instance is a string or an untyped binary resource,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about: "If the instance is a string or an untyped binary resource having an application-defined mapping to the data model"

and if "contentMediaType" is present, this property contains a schema
which describes the structure of the string.
</t>
<t>
This keyword MAY be used with any media type that can be mapped into
Expand Down Expand Up @@ -1433,8 +1450,8 @@
<list style="symbols">
<t>Correct email format RFC reference to 5321 instead of 5322</t>
<t>Clarified the set and meaning of "contentEncoding" values</t>
<t></t>
<t></t>
<t>Clarified value requirements for "contentSchema"</t>
<t>Expanded "contentMediaType" to unencoded binary media</t>
<t></t>
<t></t>
<t></t>
Expand Down