Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow more complex conditional branching for individual keys #34

Closed
mbroadst opened this issue Jun 3, 2016 · 18 comments · Fixed by annuaire-entreprises-data-gouv-fr/search-api#35

Comments

@mbroadst
Copy link

mbroadst commented Jun 3, 2016

Sorry for the somewhat verbose title there, but this is a proposal to discuss addition of a feature similar to Joi's when.

Joi lets you do something like:

{
  query: Joi.object({
    type: Joi.string().required(),
    value: Joi.string().max(255).when('type', {
      is: 'optionalValue',
      then: Joi.optional(),
      otherwise: Joi.required()
    })
  });
}

This is a bit of a contrived (and simple) example, but I think it shows how easy the mental mapping is versus json-schema. json-schema supports the same functionality but very quickly becomes difficult to define, needing to mix and match dependencies and in the worst case having to specify duplicate schemas with a single key's definition changed and jamming them into an anyOf or oneOf. What's worse is that when you have to use the aforementioned anyOf approach you generally end up with resultant errors which are incredibly difficult to reason about, when all you really need is a single error that a given field doesn't match the criteria (not e.g. 3 errors: one indicating that the field was incorrect according to one schema, another indicating the error is incorrect in the other possible schema, and finally that neither of the schemas provided in anyOf were matched).

Is there any room for discussion on adding these types of features to later versions of json-schema?

@mbroadst
Copy link
Author

mbroadst commented Jun 3, 2016

Just to give a little more clarity, here's a more involved example. The following is a schema for a network interface where interface and dhcp are required, but if dhcp is set to true then we also require ip, mask, and gw:

{
  type: 'object',
  anyOf: [
    {
      properties: {
        interface: { type: 'string' },
        dhcp: { constant: false },
        ip: { type: 'string', format: 'ipv4' },
        mask: { type: 'string', format: 'ipv4' },
        gw: { type: 'string', format: 'ipv4' }
      },
      required: [ 'hostid', 'interface', 'dhcp', 'ip', 'mask', 'gw']
    }, {
      properties: {
        interface: { type: 'string' },
        dhcp: { constant: true },
        ip: { anyOf: [ {type: 'string', format: 'ipv4'}, { type: 'null' } ] },
        mask: { anyOf: [ {type: 'string', format: 'ipv4'}, { type: 'null' } ] },
        gw: { anyOf: [ {type: 'string', format: 'ipv4'}, { type: 'null' } ] }
      },
      required: [ 'hostid', 'interface', 'dhcp' ]
    }
  ]
}

when with joi I could do something like:

{
  interface: Joi.string().required(),
  dhcp: Joi.boolean().required(),
  ip: Joi.string().ip().when('dhcp', { is: false, then: Joi.required() }),
  mask: Joi.string().ip().when('dhcp', { is: false, then: Joi.required() }),
  gw: Joi.string().ip().when('dhcp', { is: false, then: Joi.required() })
}

which is not only more compact, but I feel is also very fluent compared to the json-schema version

@mbroadst
Copy link
Author

mbroadst commented Jun 3, 2016

ah, looks like we might be able to tackle this already with switch (https://github.com/json-schema/json-schema/wiki/switch-(v5-proposal))

@awwright
Copy link
Member

awwright commented Jun 4, 2016

@mbroadst Can you compare and contrast this issue to my earlier #31?

This isn't a terrible example, but these sorts of complex rules tend to be overthought. In particular, I would suggest this may be a bit contrived, and I wouldn't use JSON Schema to specify "if DHCP is false, then IP is required": Consider what happens if there is no DHCP server, it's the exact same failure condition (except for the fact we can prove the failure ahead of time, ignoring link-local and stateless-autoconfiguration addresses.) Generally, JSON Schema is not intended to be the end-all-be-all of what data is acceptable and what isn't, especially things like relational data/foreign key constraints (though there might be a way to do that, but I don't think it would be appropriate for the JSON Schema validation vocabulary, maybe a different one like JSON Hyper-schema).

Further, you can simplify your example:

{
  type: 'object',
  properties: {
        "interface": { type: 'string' },
        "dhcp": { type: 'boolean' },
        "ip": { type: 'string', format: 'ipv4' },
        "mask": { type: 'string', format: 'ipv4' },
        "gw": { type: 'string', format: 'ipv4' }
  },
  required: [ 'hostid', 'interface', 'dhcp' ],
  oneOf: [
    { properties: { "dhcp": { enum: [false] } }, required: ['ip', 'mask', 'gw'] },
    { properties: { "dhcp": { enum: [true] } } }
  ]
}

@mbroadst
Copy link
Author

mbroadst commented Jun 9, 2016

So I implemented this with switch like so:

{
  type: 'object',
  properties: {
    interface: { type: 'string' },
    dhcp: { type: 'boolean' },
    ip: { type: 'string', format: 'ipv4' },
    mask: { type: 'string', format: 'ipv4' },
    gw: { type: 'string', format: 'ipv4' }
  },
  required: [ 'hostid', 'interface', 'dhcp' ],
  switch: [{
    if: { properties: { dhcp: { constant: false } } },
    then: { required: [ 'ip', 'mask', 'gw' ]}
  }]
}

which is similar to your proposed cleanup with oneOf. I also agree that your previous issue essentially addresses the main point of this issue

Having said that, I'm not sure I agree with your assessment over on that issue: how would this be considered a misuse of json-schema? While it does introduce complexity to the validation process, I feel like this is a far more readable format for acceptable data than having to provide multiple almost duplicate schemas (perhaps with a single property changed).

You mention that:

Generally, JSON Schema is not intended to be the end-all-be-all of what data is acceptable and what isn't

so perhaps you can clarify what json-schema is intended to be 😄 Because if I go down the list of proposed features of the spec, I find this is completely in line:

  • "describes your existing data format" ✔️
  • "clear, human- and machine-readable documentation" ✔️ (far more readable than the alternative anyOf above)
  • "complete structural validation" ✔️

@awwright
Copy link
Member

awwright commented Jun 13, 2016

With regard to this example with IP addresses in particular, I'm not quite sure that this is structural validation.

JSON Schema (by which I mean, the default JSON Schema profile, as opposed to JSON Hyper-schema or any future profiles) doesn't intend to do things like integrate errors from external data sources (like "that user ID does not exist" or "that IP address is already in use").

(Maybe with an extension, this might be assertable, but that's outside the scope of structural validation.)

What happens if the DHCP server never issues an IP address, which error does that produce? Why can't failing to provide an IP address if dhcp=false be the same error, instead of a structural error? So then what's the benefit of making this a structural requirement?

Even if this example is totally bogus, switch might still be a good idea. The thing in general I'm worried about is that the more expressive something becomes, the harder it is to reason about.

@mbroadst
Copy link
Author

mbroadst commented Jun 13, 2016

@awwright perhaps I should have been more clear in my initial comment here, but I don't consider this example to be bogus at all. This is a schema I'm considering using for form validation of network interface configuration. I had mentioned it was contrived mostly to indicate that it was reduced from its original form to express the issue I required switch for, as well as not optimizing it at all.

There is no external data integrated here, nobody issues us an IP address other than the user of the form. The reason to require those fields when dhcp is false should be more clear in this case. HTH

@awwright
Copy link
Member

I'm not implying it is; there's one thing I'm getting at though... What kind of error is it if DHCP fails and there's no static IP address listed?

Why should no static address with dhcp=false be any different?

@mbroadst
Copy link
Author

Could you clarify what "DHCP fails" means in this context? Do you mean if validation fails for that property?

@awwright
Copy link
Member

The client, because it's configured to use DHCP, sends out a DHCP request but doesn't get a response, and there's no configured static IP address either.

I'm trying to get to the root of what structural validation means here.

@mbroadst
Copy link
Author

mbroadst commented Jun 23, 2016

I apologize, maybe it's just been a while since we originally discussed this but I'm just not following what you're saying. Also, I'm not sure we're coming to any further truths about the world here 😄 switch and oneOf both satisfy my use case, so I think we can call this one finished

@awwright
Copy link
Member

There's no need to close this, this is a real issue, and I'm trying to track down a realistic use-case.

I do need to understand what you're getting at with your example, though, and how it works in the bigger picture (JSON Schema, after all, doesn't actually perform any tasks, it has to be used as part of something that does).

You've provided a simple JSON Schema for configuring a network interface. Someone configures a network interface with an instance of this schema. Now suppose I configure with {"interface":"eth0", "dhcp":true} but the interface never acquires an IP address from a DHCP server. What kind of error is that?

@mbroadst mbroadst reopened this Jun 23, 2016
@mbroadst
Copy link
Author

mbroadst commented Jun 23, 2016

{"interface":"eth0", "dhcp":true} => error that hostid isn't provided

other examples:
{"interface":"eth0", "hostid": "something", "dhcp":true} => no error
{"interface":"eth0", "hostid": "something", "dhcp":false} => error because no ip information was provided
{"interface":"eth0", "hostid": "something", "dhcp":false, "ip": "192.168.1.10"} => error because gw, mask are omitted
{"interface":"eth0", "hostid": "something", "dhcp":false, "ip": "192.168.1.10", "mask": "255.255.255.0", "gw": "192.168.1.1" } => no error

You've provided a simple JSON Schema for configuring a network interface.

Specifically, I'm providing a schema for a configuration file for a network interface. This schema, in my example, will be used to validate a web form of that data before passing it along to a rest api which will eventually save it to disk. To that end, a discussion about what happens to that interface outside the scope of configuring that interface (it makes DHCP requests, the requests are ignored, etc) seems to me to be confusing the discussion here - or I think I'm unfortunately completely missing what you're driving at.

Could you also please provide your definition of structural validation? A google search results in tons of links to sites on atomic modeling, I don't think that's what you're talking about here 😄

@handrews
Copy link
Contributor

handrews commented Sep 5, 2016

@awwright : It sounds like you're concerned about JSON Schema being used to verify runtime behavior, which I (and I think based on his last comment, @mbroadst ) agree is outside of JSON Schema's scope.

However, using it to validate that, based on the value of one field in a particular JSON representation of a resource, the rest of the resource at least defines a potentially valid configuration is a good use case, and one that we used at Riverbed when I worked there. Specifically, this service definition (a collection of json-schema for resources in a given service) uses it in the schema for "vlan_tag":

Riverbed SteelHead Common Type Definitions Service

The relevant portion (in YAML for readablity) is:

vlan_tag:
    description: 'VLAN tag (all, untagged, tagged - [1-4094])'
    type: object
    additionalProperties: false
    properties:
        state:
            description: 'The state of a VLAN tag.'
            type: string
            enum: [ all, untagged, tagged ]
        tag:
            description: 'The "tag" is valid if "state" == "tagged".'
            type: number
            minimum: 1
            maximum: 4094
    oneOf:
    - type: object
      required: [ state ]
      additionalProperties: false
      properties:
        state:
            type: string
            enum: [ all, untagged ]
    - type: object
      required: [ state, tag ]
      additionalProperties: false
      properties:
        state:
            type: string
            enum: [ tagged ]
        tag:
            type: number

which states that when the "state" property has the value "tagged", then both "state" and "tag" are required. However, if "state" is "all" or "untagged", then "tag" is not required.

@handrews
Copy link
Contributor

@mbroadst , @awwright : Reading back through this I'm pretty sure that:

  1. The original request is addressed by the "switch" proposal, which I'm about to migrate here from the wiki, alongside the "bounding" proposal which is an alternate approach to the same problem. Although I think "switch" has more momentum.
  2. The rest of the back and forth was due to a misunderstanding about what "DHCP is false" meant (it literally meant that the value of the "dhcp" property was false, and not that a runtime check of DHCP failed).

If this is true, then there's nothing left to do for this issue. If you agree, could one of you please close it to keep the list of open issue tidy?

@mbroadst
Copy link
Author

yep! sorry definitely meant to close this sooner.

@awwright
Copy link
Member

This stayed open because I indicated my preference for that. I was going to say let's wait until we get an alternative issue filed, please. I sympathize with reducing the number of open issues, but it's more important to make sure issues that are actually issues are, in fact, being tracked.

Can someone link to such an issue, maybe?

@handrews
Copy link
Contributor

@awwright My point was that there isn't any issue left to resolve here. What issue are you trying to track?

@handrews
Copy link
Contributor

@awwright said:

Can someone link to such an issue, maybe?

Issue #64 collects and compares all of the proposals related to "conditionals" in schemas (whether explicit in the form of switch or by clarifying the intention of the existing oneOf approaches, which I prefer. It's a kind of enormous issue but I pulled together at least three overlapping proposals so there wasn't really a way to make it short.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants