Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keywords for exclusive minimum/maximum #77

Closed
handrews opened this issue Oct 10, 2016 · 37 comments
Closed

Keywords for exclusive minimum/maximum #77

handrews opened this issue Oct 10, 2016 · 37 comments
Labels
clarification Items that need to be clarified in the specification Type: Enhancement
Milestone

Comments

@handrews
Copy link
Contributor

This is somewhat related to the general idea of issue #55 (clearly document validation principles). One such principle is that for the most part, each schema validation keyword can be evaluated against an instance independent of the presence, absence, or value of any other keyword. This is a highly desirable property when refactoring or otherwise reasoning about complex schemas.

More formally, for nearly all validation keywords k1, k2, k3... the following holds true:

{
    "k1": "something",
    "k2": {},
    "k3": ["etc."]
}

is equivalent to

{
    "allOf": [
        {"k1": "something"},
        {"k2": {}},
        {"k3": ["etc."]}
    ]
}

There are only four exceptions:

  • minimum is controlled by exclusiveMinimum's value
  • maximum is controlled by exclusiveMaximum's value
  • additionalItems is controlled by the value of items
  • additionalProperties is controlled by the values of properties and patternProperties

The issues around additionalItems and additionalProperties are outside the scope of this issue, but I mention them as they (especially additionalProperties) are more obvious examples of the problems this sort of dependency causes. In particular, the behavior when working with allOf can be counter-intuitive.

Getting back to numeric validation, here is how minimum and maximum behave:

{
    "type": "number",
    "minimum": 0,
    "exclusiveMinimum": true,
    "maximum": 1,
    "exclusiveMaximum": true
}

is equivalent to

{
    "allOf": [
        {"type": "number"},
        {
            "minimum": 0,
            "exclusiveMinimum": true
        },
        {
            "maximum": 1,
            "exclusiveMaximum": true
        }
    ]
}

but cannot be further separated. This is particularly vexing since exclusiveMinimum and exclusiveMaximum could have been defined to take numbers like this:

{
    "type": "number",
    "exclusiveMinimum": 0,
    "exclusiveMaximum": 1
}

which now behaves nicely, being equivalent to this:

{
    "allOf": [
        {"type": "number"},
        {"exclusiveMinimum": 0},
        {"exclusiveMaximum": 1}
    ]
}

I propose that in Draft 06 we either change the value of exclusiveMinimum/exclusiveMaximum or replace them with differently named keywords that take numeric values. Migrating from Draft 05 to Draft 06 for this would be easily scriptable, as the presence of the old exclusiveMinimum or exclusiveMaximum keywords with boolean value true just means to replace the boolean with the value of minimum or maximum respectively, and remove those properties (and change the name of the "exclusive" keywords if we replace them with new names rather than changing their type).

@jdesrosiers
Copy link
Member

jdesrosiers commented Oct 10, 2016

YES! YES! YES!

This is something that has bugged me for a while. These keywords deviate from the architecture of JSON Schema and there isn't even a reason for it. additionalProperties at least provides something of value (1) but there are always consequences for architectural erosion. I'd be thrilled if we could find some way we could redefine additionalProperties and additionalItems so they fit the architecture as well. minimum and maximum is definitely a good place to start.

(1) I would argue not enough value to be worth using, but that's a whole other discussion.

@awwright
Copy link
Member

awwright commented Oct 11, 2016

I've been thinking about having maxExclusive maxInclusive minExclusive minInclusive, or lt lte gt gte, or something like that.

@epoberezkin
Copy link
Member

epoberezkin commented Oct 11, 2016

I always question the changes that simply change the syntax without adding any additional functionality. Don't forget that having these keywords separate has its uses - easier to change the schema in code, the value for exclusive can be passed as data using $data reference; now you suggest using another keyword entirely.

What is the value in being able to split all keywords in separate schemas?

@awwright
Copy link
Member

@epoberezkin As I just commented in #76, most of the properties are "linear" (for lack of a better term, they don't affect one another; the same way that waves traveling through a medium is said to be linear, or how we have linear algebra).

If we replace the existing keywords, we're not (net) increasing the number of keywords, but we are making more of them linear.

And validators can still simultaneously support the old form at the same time they support the new form, the two versions are distinguishable.

I'm not at all concerned with complexity of implementation, because our job as implementors is to take on that complexity, taking it away from schema authors and people who want to do validation as easily as possible.

@epoberezkin
Copy link
Member

epoberezkin commented Oct 11, 2016

@awwright What I meant is not complexity of implementation - linear keywords are easier to implement. I think they are just less useful and less expressive for the end users. The use cases I was referring to are the ways people use sibling keywords. Imagine you have a schema:

var s = { minimum: 1 };

and now you want to make this minimum exclusive based on some checkbox in the form. With exclusiveMinimum you can:

s.exclusiveMinimum = exclusive;

Without you have to:

if (exclusive && s.minimum) {
    s.exMin = s.minimum;
    delete s.minimum;
} else if (!exclusive && s.exMin) {
    s.minimum = s.exMin;
    delete s.exMin;
}

So what you've achieved by replacing the minimum with exclusiveMinimum qualifier with a pair of different keywords, you've made the life of future validators implementers easier, but you've made life of users more difficult (not even mentioning the effort of updating existing implementations).

Seriously, what's the point?

Also how do you validate the schemas that have both minimum and exMin? I guess whatever is bigger will apply, but still... You kind of achieved keywords independence principle, but you've violated orthogonality principle when two different keywords mean similar things.

It seems to me that allowing keyword "qualifiers" and dependent sibling keywords is a very low price for all the additional value they bring.

@awwright
Copy link
Member

Iirc, I thought of having four different keywords when I was re-writing the definition and trying to figure out how to phrase it so it makes sense.

I think it's more natural for people to write "lt": 5 rather than breaking it into two separate properties. I have to go through a lot of mental gymnastics just to think about how the edge case is handled. (I had to look up the definition of "maximum" just now to be sure I'm using it right!) And when "exclusiveMaximum" can potentially be at the other end of a document, it's feasible to see someone might mess a difficult-to-detect edge case up.

We're already using different keywords/operators for >= and > in programming, it doesn't seem any different to do the same thing in JSON Schema.

Your point sounds a tiny bit contrived to me... first, this isn't going to be true if the implementation is functional. You're going to have something like this instead:

var schema = {};
// etc...
if(exclusive) schema.minExclusive = minimum;
else schema.minInclusive = minimum;

it's going to be the same number of lines either way.

Further, you can rewrite your example like

delete schema.minExclusive;
delete schema.minInclusive;
if(exclusive) schema.minExclusive = minimum;
else schema.minInclusive = minimum;

which is still worse than 1, but better than 6.

Even with all six lines of code, there's no chance someone is going to misunderstand what's going on.

But most of all, I can't think of a time I've ever needed to ask "Inclusive or exclusive?"

@epoberezkin
Copy link
Member

Schema is fundamentally different from code - it is often manipulated and constructed with code. But my main argument against cosmetic changes is that introduces higher cost than benefits to everybody - both users and implementations. I see absolutely no reason to change existing schemas unless it improves the functionality. It is a change based on opinions and preferences; in my opinion :) it is always better to avoid such changes.

The main priority should be the consistency, it helps users to continue using the standard. If you both introduce new valuable features and make changes that require migration effort, then instead of switching to a new version, users can switch to something else completely.

That's exactly the reason why programming languages only extend and almost never change their semantics - it becomes too costly and risky for users to upgrade versions.

We can't really use common sense and analogies with other languages as the main decision criteria - the existing base of users should be the main priority. There are plenty of improvements to make to the standard without changing what works. Why are you so keen to introduce confusion and cause unnecessary efforts to migrate to the next version? What is the value? The fact that somebody is confused with the way it is, likes it more etc. are not strong enough arguments for a change I think.

Also, bear in mind, that by changing existing semantics you invalidate all the existing manuals, documentations, etc., so you complicate not only migration but also the adoption of the standard. There were quite a few promising technologies over the past 40 years that died exactly because people who were pushing their development at some point decided that they "knew better" and instead of evolving and extending started to redefine things. The result was always the same - people switch to a different technology, that has no past baggage. One of the latest casualty of "knowing better" is probably Angular web framework. To me, the choice is simple, you either redefine something from scratch, in which case you may or may not get a sufficient user base for a new thing, or you fully respect the baggage you have and live with it, apart from cases when the change brings some serious improvement. I completely fail to see what this serious improvement is in this and some other similar proposed changes.

@awwright
Copy link
Member

Understood.

Even if the present keywords were to be removed entirely, there's nothing that should suddenly cause existing schemas to stop working. Implementations are fully allowed to support multiple versions simultaneously, as they're are allowed to recognize unknown keywords. And this has been done before (especially the change from "required" from a boolean to an array, many implementations support both, regardless of which "$schema" is selected).

@epoberezkin
Copy link
Member

The change to required was justified by the fact that it went against a very important JSON schema design principle - keywords are only applied if the data exists at the current data level. Old required: true was an exclusion from this rule - it had to be invoked for a missing piece of data. New required: [...] is applied to the object, it won't require anything if the object is missing.

I keep thinking that even postulating JSON-schema principles and values and having some discussion about them before incorporating them into the standard would help making the decisions about changes.

  • "independence" of keywords from anything but sibling keywords
  • "orthogonality" - avoiding overlap in keywords purpose (patternGroups kind of violates it...)
  • "applicability" - keywords only apply to the existing data of a certain type (or all types in case of logical/compound keywords), can't apply to multiple types (in this way I would restrict "format" to strings only).
  • "statelessness" - independent of previous validation results ("switch" violates it, I'd rather redefine it to depend on some data value, like JS switch, than on the process).
  • "backward compatibility" - avoiding semantic changes without sufficiently strong reasons.

If community could agree on the importance of following these or some similar design principles it would simplify decisions about current and many future proposed changes.

@awwright What do you think? Want to make it a separate issue for discussion?

@handrews
Copy link
Contributor Author

@epoberezkin

If community could agree on the importance of following these or some similar design principles it would simplify decisions about current and many future proposed changes.

See issue #55

Having skimmed through this, I'm still in favor of the change here for all of the same reasons. Maintaining compatibility is not compelling when it violates a higher principle and it's trivial to convert a schema from old to new format. As @awwright notes you can keep supporting Draft 04 and its meta-schema will continue to be a valid option.

@handrews
Copy link
Contributor Author

This currently stands with 3 in favor and 1 against. @awwright , can you make a call on this? I'm happy to write it up as a pull request.

@awwright
Copy link
Member

awwright commented Oct 31, 2016

@epoberezkin The change being proposed here is also for promotion of a very important design principle.

@handrews This isn't subject to a vote, let's try to see if we can get everyone on the same page.

I suggest allowing exclusiveMaximum and exclusiveMinimum to be numbers, in which case:

{ exclusiveMaximum: number }

has the same behavior as

{ maximum: number, exclusiveMaximum: true }

for any number production number.

This limits change in vocabulary, allows for reverse compatibility, and the intended behavior should be immediately obvious.

(This is probably what the proposed behavior was, but I don't see any examples to double-check.)

@epoberezkin
Copy link
Member

epoberezkin commented Oct 31, 2016

@awwright I like this. Just to clarify, we explicitly specify something along the lines: "exclusiveMaximum etc. can be both number and boolean, but number is recommended as boolean can be dropped from the future versions of the standard", right? Or don't even warn that it can be dropped and stop at "... and boolean".

@handrews
Copy link
Contributor Author

@handrews This isn't subject to a vote, let's try to see if we can get everyone on the same page.

I wasn't trying to make it a vote, here or anywhere else I've made a note of who has said what. I'm just trying to summarize the current state of some of the longer discussions and identify what is needed to move them forwards.

@handrews
Copy link
Contributor Author

@epoberezkin I would like to drop the boolean form, so I'd like to state the possibility :-) If we do not drop it, then we still have the violation of architectural principle, and we have a more complex implementation to support.

It is trivial for us to provide a migration script to bring Draft 04 schemas forward, so this is not a compatibility problem. You have to identify the draft you're using as a meta-schema anyway, so it's not ambiguous.

@awwright
Copy link
Member

@epoberezkin As my proposal stands right now the boolean form would be left in. But I would prefer to take it out for simplicity, or replace it with a note:

"Previous versions also had a boolean form of exclusiveMinimum. If true, this is the same as saying exclusiveMinimum: {minimum}"

I do think a note would be appropriate in the I-D since it's an implementation concern.

@awwright
Copy link
Member

awwright commented Nov 21, 2016

The numeric "exclusiveMinimum"/"exclusiveMaximum" forms landed in 1b2312a.

Since we're in the habit of putting requirements on schemas, with e.g. "The value of "exclusiveMaximum" MUST be a boolean" it was difficult for me to remove the boolean form entirely without implying we need to break reverse compatibility.

Hopefully in the meantime, this language is clear enough.

In the future, maybe we can do some refactoring so we're putting operational requirements on validators instead of schemas.

@handrews
Copy link
Contributor Author

Wait.. did that commit just go straight to master? Why was it not put up for review as a pull request? (goes and looks at commit log...) Why are there a whole bunch of commits that were not put up for review as a pull request?

@awwright @Relequestual what's the process expectation around reviews? Am I missing something?

@handrews
Copy link
Contributor Author

Since we're in the habit of putting requirements on schemas, with e.g. "The value of "exclusiveMaximum" MUST be a boolean" it was difficult for me to remove the boolean form entirely without implying we need to break reverse compatibility.

We should include language deprecating the boolean form and notifying folks that it will be removed in a future draft (unless some reason not to occurs). That's what we decided on for changing id to $id (deprecation period followed by removal).

@awwright
Copy link
Member

@handrews There was consensus around the feature and how to implement it, so I wrote it up (as an author and editor of the document.) I don't expect anyone to write up the features themselves...

The language says

Schemas SHOULD NOT use the boolean form.

Does that suffice?

@handrews
Copy link
Contributor Author

The language says

Schemas SHOULD NOT use the boolean form.

Does that suffice?

No, because it does not warn of deprecation and removal.

I see you and @Relequestual discussed the PR/review issue elsewhere, so I'll follow up on that point there.

@awwright
Copy link
Member

@handrews It's rather unusual for a standard to explicitly say something is deprecated, since it's normative text proscribing behavior for implementations, instead of being documentation positively warning people that their use of a feature will break in the near future.

We can mention it as a note, but it'll have to be removed before RFC.

@handrews
Copy link
Contributor Author

I am totally fine with it being a <cref></cref>
I was thinking of this bit from v5:

<cref>This pre-processing section is subject to significant change in upcoming drafts.</cref>

when I asked for this anyway. Presumably, the boolean forms will also be removed before RFC (or if they aren't, the comment would be wrong and therefore would be taken out anyway).

I may need to fix my $id PR to make the likely deprecation of id a cref. Will do that now.

@awwright
Copy link
Member

@handrews #162 does this

@CroniD
Copy link

CroniD commented Nov 30, 2016

Hi there,

if possible I would like to add another view to the issue. As a developer I would prefer simple statements, yet powerful. I would be satisfied if there would be two statements only. I believe most of numeric range checks are implemented by defining an inclusive minimum and exclusive maximum, which could be handled as default behavior for "minimum" and "maximum" (if stated as json numbers). As an alternative I would suggest to define expressions for "minimum" and "maximum", which are naturally limited (e.g. "minimum" could only be "value must be greater than minimum" or "value must be greater than or equal to minimum").

ABNF could be:

minimum = number / minimumExpression ; number is a json number, expression is a json string
minimumExpression = quotation-mark ws (gt / gte) ws number ws quotation-mark ; "ws" is optional

maximum = number / maximumExpression ; number is a json number, expression is a json string
maximumExpression = quotation-mark ws (lt / lte) ws number ws quotation-mark ; "ws" is optional

quotation-mark = %x22 ; "
gt = %x3e ; >, greater than
gte = %x3e.3d ; >=, greater than or equal to
lt = %x3c ; <, lower than
lte = %x3c.3d ; <= lower than or equal to

Well, maybe this isn't that bad or total garbage. :D
But it's an option to unify "minimum", "exclusiveMinimum", "maximum" and "exclusiveMaximum".

@awwright
Copy link
Member

@CroniD That also means that JSON Schema validators have to parse a number in a string, when it could just be represented as a number in JSON, I'm not sure what the benefit is.

There's five math operators > >= == <= < and we've now got five keywords to match them, minimumExclusive minimum const maximum and maximumExclusive. Does that work?

@CroniD
Copy link

CroniD commented Nov 30, 2016

Hi @awwright ,

well, to be fair, every concept in this issue will work.
Besides every json schema validator has to parse strings to numbers, 'cause the source is always a string or character sequence or stream that results in a string if you read it. So I would argue that parsing functionality isn't relevant, as long as it is simple to do.

About the benefit ... for me it's just a preference, not sure about yours. So I will just compare them based on my pov:

"minimumExclusive":1
"minimum":1
"const":1
"maximum":1
"maximumExclusive":1
  • total character count: 72 (excluding line feeds)
  • 5 keywords
  • ambiguity if "minimumExclusive" and "minimum" are present (same for maximum ones)
  • if I want to change to exclusive, I need to write 9 characters
"minimum":">1"
"minimum":">=1"
"const":1
"maximum":"<=1"
"maximum":"<1"
  • total character count: 67 (excluding line feeds)
  • 3 keywords
  • no ambiguity regarding keywords
  • difference between number and expression might be confusing
  • if I want to change to exclusive, I need to write 1 character

I will not judge or pretend the comparison is complete. That's up to you folks. :)

@awwright
Copy link
Member

@CroniD If you're looking for brevity, I can do you one better and propose the keywords "gt" "gte" "e" "lte" and "lt" (Or even the keywords literally being ">=" ">" "==" "<=" and "<")

And actually, that's what I suggested at the very top of this thread. This hasn't been totally ruled out, but generally JSON Schema prefers more verbose vocabulary.

There might be a good reason to put a number in a string (especially if the number has a unit attached to it), but typically if I have to break out the number parser to parse a string, I'm essentially putting a JSON document inside a JSON string, which kind of defeats the point of JSON.

In any event, the functionality already exists, there should be a REALLY good reason before we duplicate it.

@CroniD
Copy link

CroniD commented Nov 30, 2016

Hi @awwright ,

I'm sry, there may be a misunderstanding, 'cause I skipped some points. I'm not looking just for brevity, nor do I propose an extension to the current proposal or the other ones in this issue. It's a replacement for "minimum", "minimumExclusive", "maximum" and "maximumExclusive". I propose to compact 4 keywords to 2.

Which means:

  • removing the keywords "minimumExclusive" and "maximumExclusive" (or deprecated them) and
  • changing the keywords "minimum" and "maximum" to the implied meaning of the ABNF and default behavior above.

if I have to break out the number parser to parse a string, I'm essentially putting a JSON document inside a JSON string

Well, the form of the expression isn't a json document, nor json text. The ABNF clarifies that the value can be either a json number or json string, if it's a json string it must be an expression, containing a math operator and a number. Just a json string with a number in it is not valid.

I do know that you all prefer a more verbose style, which is fine by me, that's why I stated "maybe this is [...] total garbage" in my first post. :)

@handrews
Copy link
Contributor Author

@CroniD I'm against this on the grounds that I'm strongly against inventing anything that implementations have to parse beyond what JSON and related standards like JSON Pointer or URI Templates already provide (potentially with compatible extensions). We also want to express JSON Schema in terms of JSON Schema rather than ABNF, so something that has to resort to ABNF for a description does not fit either.

The four fields are clear and follow the design patterns present throughout JSON Schema. While I would not call this "total garbage" :-) I don't think it fits the approach.

@CroniD
Copy link

CroniD commented Dec 1, 2016

Hi @handrews ,

I see, good points. :)

@awwright
Copy link
Member

awwright commented Dec 3, 2016

@handrews "master" should have this ability written in now (though still supporting the boolean form), does that resolve this issue you think?

@handrews
Copy link
Contributor Author

handrews commented Dec 3, 2016

I'm waiting for the meta-schema PR #168 to get merged (which is currently waiting on me submitting an updated commit). It's not done until it's in the meta-schema as well.

@Relequestual
Copy link
Member

@handrews @awwright After reading all these comments, I'm not sure of the status of this issue. Did a PR and megre happen? I can't see a PR to resolve this issue.

@handrews
Copy link
Contributor Author

@Relequestual It's either #185 or #189 depending on whose approach to the solution you think is more correct.

@awwright awwright changed the title v6 validation: exclusive minimum/maximum validation should not rely on keyword combinations Keywords for exclusive minimum/maximum Dec 21, 2016
@handrews
Copy link
Contributor Author

PR #210 is yet another attempt to resolve this, by dumping the boolean forms altogether.
Note that there is still debate even with that approach about further use of dependencies but I've asked to track that in its own issue or with similar dependencies questions raised in issue #209.

@handrews
Copy link
Contributor Author

handrews commented Jan 5, 2017

The last bits of this were resolved by #210.

@handrews handrews closed this as completed Jan 5, 2017
@gregsdennis gregsdennis added clarification Items that need to be clarified in the specification and removed feedback labels Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification Items that need to be clarified in the specification Type: Enhancement
Projects
None yet
Development

No branches or pull requests

7 participants