
Extending representations to parameters and headers #762

Closed
darrelmiller opened this issue Aug 18, 2016 · 22 comments

Comments

@darrelmiller
Member

Assuming the recent PR for representations becomes part of OpenAPI 3.0, I propose that we reuse the concept to allow defining complex types for both parameter values and header values.

However, the additional complexity of 'representations' should only be required for complex types and not primitive types. This means that both header and parameter objects should be defined as having [type | representations] for describing the value.

This proposal rolls back the previous change in 3.0 that made a schema property required in parameters to describe primitive types.

By using representations to define complex types, we are able to identify the media type that is used for the purpose of serialization. This would help with issues like #401 #69 #222 and address #665.

Because representation objects have schemas we address #717 #652 #667

So a parameter can be a primitive value like this,

{
  "name": "token",
  "in": "header",
  "description": "token to be passed as a header",
  "required": true,
  "type": "string"
}

Or a complex value,


{
  "name": "token",
  "in": "header",
  "description": "token to be passed as a header",
  "required": true,
  "representations": {
    "text/csv": {
      "schema": {
        "type": "array",
        "items": {
          "type": "integer",
          "format": "int64"
        }
      }
    }
  }
}

For response headers, we can also optionally use a representations object to describe JSON-based headers.

"headers": {
  "bb-telemetry-data": {
    "description": "Client statistics",
    "representations": {
      "application/json": {
        "schema": { ... },
        "examples": [ ... ]
      }
    }
  }
}

From a tooling implementer's perspective, one implementation of the representations structure can now be reused for describing complex structures in request bodies, response bodies, parameters and headers. There are certain escaping rules that differ, so tooling will need to know where the complex type is being serialized to ensure that only valid characters are used in URLs and headers.

One challenge for implementers is that if an OpenAPI definition defines two potential representations for a URL parameter or header value, then it would be necessary to "sniff" an HTTP message to determine which representation is being used. This might be simple when it comes to differentiating between JSON and XML, but becomes more difficult when media types like text/plain and text/csv are used.
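The kind of sniffing described above could look like this minimal sketch (the `sniff_representation` helper is hypothetical, not part of any tooling, and only handles the two easy cases):

```python
import json

def sniff_representation(raw: str) -> str:
    """Guess which declared representation a raw parameter/header value uses."""
    stripped = raw.strip()
    if stripped.startswith("<"):
        # Markup start tag strongly suggests XML.
        return "application/xml"
    try:
        json.loads(stripped)
        return "application/json"
    except ValueError:
        # text/plain vs text/csv cannot be distinguished reliably:
        # "15,2,56" is valid in both.
        return "ambiguous"

print(sniff_representation('{"a": 1}'))  # application/json
print(sniff_representation('<a>1</a>'))  # application/xml
print(sniff_representation('15,2,56'))   # ambiguous
```

As the comment notes, sniffing breaks down exactly where the issue says it does: plain-text-like media types.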

@DavidBiesack

I can see where an Accept: header can act as a 'representations' selector for a response, but what is the selector for the representation for a request header? How are discriminators and other selectors specified, for example if the type is determined by a query parameter, or a field in the request body, or a URL path parameter, or a .json or .xml or .yaml extension, or....

@DavidBiesack

Would OAS also have a "representations" section (sibling of "definitions") so that items in the representations map (or the entire map) could be $ref'd? (IIRC there is a proposal for a reusable "components" structure/mapping)
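For illustration, such a reusable entry might look like the sketch below. This is not spec'd syntax; the `components`/`representations` section names and the `IntegerList` key are assumptions made for the example:

```json
{
  "components": {
    "representations": {
      "IntegerList": {
        "text/csv": {
          "schema": {
            "type": "array",
            "items": { "type": "integer", "format": "int64" }
          }
        }
      }
    }
  }
}
```

A parameter could then point at it with something like `"representations": { "$ref": "#/components/representations/IntegerList" }`.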

@darrelmiller
Member Author

@DavidBiesack Where multiple representations are defined for inbound parameters, the server is responsible for identifying which representation the client used for a URL parameter or header parameter. This is going to require some kind of sniffing algorithm. It's not ideal, but I don't think there are many scenarios where multiple representations are needed for inbound complex typed parameters.

As far as response headers are concerned, there definitely is an issue on how a server should choose the appropriate representations. It might be reasonable to re-use the Accept header provided by the client, as the Accept header is not specifically a response content-type selector, but simply a declaration by the user-agent of what media types it supports. If a user-agent declares that it understands application/json and a response header has multiple representations, one of which being application/json, then the server should probably use json for the header.

@DavidBiesack

I'm a bit concerned about ambiguities - both the response body and each response header can have multiple representations. application/json is not a real content type; it is more of a format; for APIs like ours, there are many application/vnd.+json representations (the GitHub REST API is another example of using multiple application/vnd.+json types). Each value would need its own selector/discriminator and Accept: would not work for all; I think we need a more explicit selector/discriminator for these "representations" elements (or a default for each; i.e. the Accept request header is the default selector for a response body representation).

@darrelmiller
Member Author

I believe I understand your point about having explicit selectors for different uses of representations, but I think adding multiple of these selectors is going to add too much complexity. If reusing Accept for all the uses of representations is not a solution that people can swallow, then I think I'd rather introduce a singular "representation" object for headers and parameters, so there is no ambiguity.

@darrelmiller
Member Author

@DavidBiesack Regarding the addition of a representations section to components, I think that might be a good idea. I'm not sure how much more value it brings over being able to re-use responses and parameters, but there is not a whole lot of additional complexity in allowing it.

@ePaul
Contributor

ePaul commented Aug 19, 2016

Hmm, so we consider a header/path/url parameter to be a miniature document, and use some content-type (which is not declared beside the document, just in the API definition) to determine which type of document it is, so we can then match it with some schema.
And we still need to refer to that content type's documentation (instead of something in the spec) to actually map the values to a JSON object (or something else).

@DavidBiesack

Certainly in most cases there is only one representation, so no selector is needed, and the majority of cases remain simple.

Regarding components - I favor uniformity/consistency across the spec, so if we can reuse one mechanism for expressing reusable components and apply it everywhere, that also brings simplicity.

@darrelmiller
Member Author

@ePaul The media type here is used simply to provide a "serialization strategy" as discussed in #665. The media type identifier and then optionally the schema should provide sufficient information to a client code generator to know how to deserialize the header/parameter value. I'm not sure I follow your concern about having to refer to the content type's documentation. We currently don't include all the words from RFC 7159 in the OpenAPI spec when we advertise a JSON payload.

And, I think it is important to remember that someone would only use a 'representations object' in a header or parameter if it is a non-primitive type. This means the additional complexity of the representation object should rarely be used.

@darrelmiller
Member Author

@DavidBiesack I hear you with regard to uniformity and consistency, that's one of the reasons I'm suggesting reusing representations for complex header/parameter types. The downside of uniformity is redundancy and the eternal "which mechanism should I use to do X" questions that follow. Having only one way for someone to do something means that we all do it the same way. I honestly don't know what the right answer is.

@fehguy
Contributor

fehguy commented Aug 19, 2016

@OAI/tdc let's get your input on this

@darrelmiller
Member Author

Certain members of the @OAI/tdc have raised the concern that "representations" is a lot of characters to type. So, in the interests of saving fingers, we are considering other proposals for the name of this new object.
Suggestions so far include:

"content": {
  "application/json": { ... },
  "application/jxml": { ... }
}

The downside to this suggestion is that it is not easy to distinguish between the container content object and a specific content object. This is really just a documentation issue though.
An alternative might be,

"contentTypes": {
  "application/json": { ... },
  "application/jxml": { ... }
}

Unfortunately, that only saves us 3 characters.
Another alternative is,

"bodies": {
  "application/json": { ... },
  "application/jxml": { ... }
}

This would work, but we would probably need to rename requestBody to avoid confusion. It's also a tad morbid.

"payloads": {
  "application/json": { ... },
  "application/jxml": { ... }
}

This works but was not liked by those on the call.

@darrelmiller
Member Author

To answer my own concern, maybe we can have a content object that contains content type objects. Does that sound reasonable?

@DavidBiesack

"representational" got boiled down to just "RE" in "REST", so we could just use "re": { ... } to save the most keystrokes 😏

I prefer "content", à la Content-Type. I think that is more accurate than "bodies" and "payloads" (which feel wrong to me for headers and parameters) and concise enough.

@ePaul
Contributor

ePaul commented Aug 19, 2016

@darrelmiller

I'm not sure I follow your concern about having to refer to the content type's documentation.
We currently don't include all the words from RFC 7159 in the OpenAPI spec when we
advertise a JSON payload.

While we don't do that, the OpenAPI schema objects describe (by reference to the JSON schema specification) JSON values (objects/arrays/primitives), which have an obvious mapping from/to JSON payloads (that is specified in the JSON specification).

But there is no obvious mapping of text/csv documents to JSON values. The most generic one I can imagine would be an array of objects (with primitive property values) – but this only works if there is a header line with the column names, because JSON objects don't have any notion of property ordering, so you can't express the column meanings in the schema.
Otherwise maybe an array of arrays (of primitives) would be possible.
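For concreteness, the two generic CSV-to-JSON mappings described above can be sketched with just the standard library (the sample document is invented):

```python
import csv
import io

doc = "a,b\n1,2\n3,4\n"

# With a header line: array of objects, column names from the header.
rows_as_objects = list(csv.DictReader(io.StringIO(doc)))
# [{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}]

# Without assuming a header: array of arrays of (string) primitives.
rows_as_arrays = list(csv.reader(io.StringIO(doc)))
# [['a', 'b'], ['1', '2'], ['3', '4']]
```

Note that everything comes back as strings either way; nothing in text/csv itself says a column holds int64 values, which is exactly the gap the schema would have to paper over.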

I also don't see how most documents matching RFC 4180 would fit into an HTTP header. (Maybe the CRLFs separating records would have to be percent-encoded?)

From your example, I guess you imagine a CSV document with just a single record (= line) and without column headers (i.e. text/csv;header=false), and have this single record represented as a JSON array (of integers, in this case). I don't know how any implementation would be able to guess that from the text/csv specification, though.
(Also, HTTP allows replacing a comma-separated list in a header value with multiple occurrences of the same header, or the other way around. Is the following still text/csv?)

Token: 15
Token: 2
Token: 56

Similar problems appear for application/x-www-form-urlencoded (the media type registry refers to the HTML specification, which in turn refers to a section in the URL specification) – it encodes a list of name-value tuples (where name and value are character strings) into a byte string (or decodes them again). But a list of name-value tuples is not quite the same as a JSON object (for example, the list can contain duplicate names, and a JSON object could contain non-primitive property values).
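The name-value-tuple vs. JSON-object mismatch described above is easy to demonstrate with the standard library:

```python
from urllib.parse import parse_qsl

# application/x-www-form-urlencoded decodes to a LIST of name-value
# tuples, which may contain duplicate names.
pairs = parse_qsl("token=15&token=2&token=56")
# [('token', '15'), ('token', '2'), ('token', '56')]

# Forcing it into a JSON-object-like dict silently drops duplicates.
as_object = dict(pairs)
# {'token': '56'}
```

So a round trip through a JSON object loses information that the media type itself permits.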

multipart/form-data (which would certainly not be used for headers, but maybe for the request body, similar to application/x-www-form-urlencoded; see the start of the discussion in #761) defines how a series of "form field values" (which are either plain text or file contents, possibly with a file name attached), each with a name – where names can again be duplicated (and at least for file uploads, this is a common use case) – is represented in a message (for HTTP or MIME). Again, I don't see a completely obvious way of mapping this to a JSON object.

(This is not meant to be a rant, but to show why I feel there is something missing in the specification as currently proposed.)

@darrelmiller
Member Author

@ePaul I completely agree there is a big chunk of hand waving going on between the media type and the JSON schema for non-JSON scenarios. In the TDC call today we decided that next week we are going to begin to address how to clearly describe non-JSON content. When it comes to headers and parameters, there is another challenge that you illustrated well: how do we map media types into the constraints of HTTP headers and URL parameters? Will it be sufficient to simply say that certain characters are not allowed depending on the location in the message?

In order to fix the mapping between media types and schema, I believe we have three basic directions we could take:

  • Stick with our commitment to JSON Schema as it is the 90% solution (today) and create special-case mappings from JSON Schema to key-value pairs, delimited lists, and whatever other common syntax shows up.
  • Define our own format independent data modelling syntax to be used by the schema property
  • Add support for other schema formats, e.g. XSD, ABNF, Relax-NG, JCR, protobuf-schema, but make it clear that tooling may or may not support all of these schemas.

There are pros and cons to all of these approaches, but I believe we need to make a clear decision on our future direction.
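The first direction, treating the schema as a generic model with a per-media-type serialization strategy, might be sketched like this (the `serialize` helper and its two strategies are hypothetical illustrations, not proposed spec behavior):

```python
import json

def serialize(value, media_type):
    """Serialize a generic list-of-integers model per media type."""
    if media_type == "application/json":
        # JSON strategy: the model maps directly to a JSON array.
        return json.dumps(value)
    if media_type == "text/csv":
        # Delimited-list strategy: one record, comma-separated.
        return ",".join(str(v) for v in value)
    raise ValueError("no serialization strategy for " + media_type)

print(serialize([15, 2, 56], "application/json"))  # [15, 2, 56]
print(serialize([15, 2, 56], "text/csv"))          # 15,2,56
```

The same model, two wire formats: that per-media-type dispatch is the mapping the spec would have to pin down for each supported media type.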

@DavidBiesack

Consider adding examples to content objects as per the Parameter Object samples.

@jharmn
Contributor

jharmn commented Sep 30, 2016

Boiling this down a bit further, there are a few practicalities:

  • schema is only useful for JSON (and perhaps some primitive types/regex's), based on the current spec. This is already solving JSON Schema-based parameters as well as we can.
  • collectionFormat is in items right now, effectively inside schema.

Other than perhaps tweaking collectionFormat, I'm not sure we can address other parameter serialization unless we take on the issues identified in #764.

P.S. 👍 for content if we went this route. Nice and terse, without overlapping too much with request handling terminology.

collectionFormat tweak example:

{
  "name": "tokens",
  "in": "query",
  "description": "tokens to be passed as a query parameter",
  "required": true,
  "collectionFormat": "csv" //schema would not be allowed in this case
}
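For reference, the Swagger 2.0 collectionFormat values (other than multi, which repeats the parameter instead of joining) amount to a separator choice, roughly:

```python
# Separators per Swagger 2.0 collectionFormat value; "multi" is omitted
# because it repeats the parameter rather than joining values.
SEPARATORS = {"csv": ",", "ssv": " ", "tsv": "\t", "pipes": "|"}

def join_collection(values, collection_format="csv"):
    """Serialize a list of values for a query parameter."""
    return SEPARATORS[collection_format].join(str(v) for v in values)

print(join_collection([15, 2, 56], "csv"))    # 15,2,56
print(join_collection([15, 2, 56], "pipes"))  # 15|2|56
```

The `join_collection` helper is an illustration only; actual tooling would also need to percent-encode the result for the URL.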

@darrelmiller
Member Author

darrelmiller commented Sep 30, 2016

@jharmn I think this approach of using JSON Schema to describe things that are not JSON only works if we consider the JSON Schema as describing a generic model of properties, values, lists and maps. From there you need to have a serialization strategy per media type. As you point out, standard JSON Schema tooling that expects JSON as input is not going to work.

If OpenAPI chooses to treat JSON Schema as the generic data modelling language, then it is going to have to describe the mappings to the media types we support.

The alternative evil as described in #764 is we allow other schema languages to be used that already have mappings for serialization.

There is no avoiding pain. It just comes down to what kind of pain we like better.

@jharmn
Contributor

jharmn commented Sep 30, 2016

IMO the inclusion of a mongrel JSON Schema has been a long-standing problem (which I'm glad we are fixing). The reason is that, practically speaking, tooling providers want to use existing schema parsers/validators. That wasn't fully possible before (due to a hacked JSON Schema), but tooling providers did it anyway.
The notion that anyone is going to write an XML parser/validator (or any other format) based on JSON Schema is hard for me to believe, especially in the historical context.
If we supported XSD, for instance, we could simply require that $ref be utilized (or some other linking syntax, maybe $schemaRef), so we don't have to have XSD/protobuf-schema/etc quoted into JSON Schema (which could get pretty gross).
P.S. This probably belongs in #764 as a revival comment.

@fehguy
Contributor

fehguy commented Sep 30, 2016

@darrelmiller and @jharmn to look at overlap between merged templating PR to see how one of these can go away

@darrelmiller
Member Author

Incorporated into V3


5 participants