
Simplify schema re-use - drop base URI change and JSON pointers support #160

Closed

epoberezkin opened this issue Nov 23, 2016 · 44 comments

@epoberezkin
Member

epoberezkin commented Nov 23, 2016

Everybody seems to agree that base URI change is one area of the standard that:

  • lacks compatibility between implementations, even within JavaScript alone
  • takes substantial effort to implement, and even then inconsistently
  • is rarely used

Let's think for a second about what problems id and $ref solve. The only problem that really needs solving is schema re-use (and usage practice shows it). If you were writing code, a file traditionally defines a namespace, and all symbols that can be accessed from outside the file must be explicitly made public. Although @awwright cites the departure from the file model in some standards bodies, that doesn't seem relevant to software development and JSON; the existence of JSON-LD means little for JSON Schema (there should be a JSON-LD schema for that). Software developers will not depart from the file model in a hurry, and unless JSON Schema acknowledges this it will simply lose touch with them.

I suggest that, to both enable schema re-use and ensure its stability, we require explicitly declaring pointers to the objects that can be used from outside the schema file. That is, I propose not only dropping base URI change from the spec but also dropping JSON Pointer support in references. That would allow schema authors to have clearly defined symbols usable from outside, while keeping the freedom to refactor and restructure the rest of their schemas as they wish. One of the valid arguments against $merge was that schema authors may want to prevent modification. The same argument applies to the desire of popular schema authors to prevent direct access to some areas of a schema and to publish only a few access points that they maintain in a consistent way (like a schema's public API).

The proposal is to:

  • use one attribute as a schema URI that identifies the schema globally and (optionally) says where it can be retrieved from, e.g. $uri. This attribute can be used only once per schema file, at the top level only, and it also serves as the base URI for other references inside.
  • use another attribute to define public names pointing to the parts of the schema that can be re-used in other schemas, e.g. $id. This keyword's value MUST be an identifier (^[a-z]+[a-z0-9_]*$ for public names that can be accessed from outside the schema file, ^_[a-z]+[a-z0-9_]*$ for private names that can only be used within the file) and MUST be unique within the file; redefining it would make the schema invalid. The $id keyword should be used at the root of the sub-schema that will be referenced by it.
  • references should use the format <uri>#<id> for references to other schemas (with <uri> resolved against $uri) and #<id> for references within the file (which is consistent with $uri providing the base URI for resolution). Given that $ref implies using JSON Pointers, and using them violates isolation (by providing direct access to private parts that can change without notice), the proposal is to drop $ref and instead use some other keyword as part of the JSON Schema spec, e.g. $call and/or $include. The two keywords could have different meanings: $call would be validated in the context of the source schema and $include in the current context. $include is optional; $call is more or less what we have now.

I appreciate that this is the most radical proposal to simplify the schema re-use problem. Please consider it not by comparing it with what we have now (we have a mess leading to a lack of compatibility) but from the point of view of existing software development practices. Modularisation, isolation, etc. are normal things when writing code, but for some reason they are not available to schema authors, who craft thousand-line documents simply to avoid a $ref that is not consistently implemented, and who have no way to reliably expose anything less than the whole schema file (reliably = providing a guarantee to consumers that it won't change without notice).

I think many of the proposals previously submitted here are focussed on theoretical aspects rather than on the practical problems of users and implementers. E.g. @awwright repeatedly cites XML and HTML as inspiration; I think these arguments, although theoretically correct, are fundamentally flawed in essence and completely ignore the fact that JSON is deliberately a much simpler standard, which is the main reason for its wide adoption. Given that this standard is JSON Schema, and not JSON-LD schema or XML Schema, I don't see why arguments referring to the practices existing there should be seriously considered here while arguments referring to the actual usage practice of JSON Schema are ignored. I suggest we treat all references to XML/HTML practices as irrelevant for JSON Schema.

I would very much appreciate feedback from @awwright @handrews @Relequestual @fge @jdesrosiers and in particular from the people who were implementing base URI change in existing JSON schema validators: @mafintosh @bugventure @AlexeyGrishin @atrniv @zaggino @automatthew @tdegrunt @Prestaul @natesilva @geraintluff @daveclayton @erosb @stevehu @Julian @hoxworth @hasbridge @justinrainbow @yuloh @JamesNK @RSuter @seagreen @sigu-399 (I see very few people from this list in these conversations, which is another sign of the standard's deterioration, and I don't think any decisions about changing the standard should be made without the wider involvement of the people who create validators).

@handrews
Contributor

I don't think any decisions about changing the standard should be made without the wider involvement of people who create validators

We can't expect everyone to hang out tracking these GitHub issues. Getting that wider feedback is (I'm pretty sure) precisely why @awwright was arguing for more rapid iteration on drafts. We can't force people to come here and participate, but publishing a draft will result in broader review automatically.

from the point of view of existing software development practices

JSON Schema is not code. Not only is this a radical change in syntax, it's a radical change in philosophical perspective, like adding imperative control flow with switch but even more drastic. Function calls and access control are not media type concerns, they are programming language features.

If schema A is referenced by schema B (written by a different author), then unless the authors of A and B agree on change management, it is not in any way the responsibility of A's author to avoid breaking schema B. Organizations who want to suggest access control can do so by conventions such as naming definitions entries with a leading underscore. Implementations can offer optional checks for such custom behavior (just like some currently offer things like strict property checking as a validation mode). The JSON Schema media type does not need to specify this sort of thing.

Schema authors who reference schemas other than root schemas or entries in definitions are engaging in dubious practices to begin with. We should document best practices and advice on the web site, but again, the media type should not attempt to enforce them.


Given the radical nature of this proposal, not only in terms of syntax but the entire philosophy, I do not see any reason to block PR #154 ($id) from going into v6. If this proposal is adopted, it will involve such a dramatic change that it really won't matter whether it's changing from id or $id.

@epoberezkin
Member Author

epoberezkin commented Nov 23, 2016

@handrews thank you for the feedback.

JSON Schema is not code.

This discussion of whether JSON Schema is a document or code is no different from deciding whether light is a wave or a stream of particles. Both are points of view.

There are many criteria that make JSON Schema similar to a document; there is an equal number of criteria that make it look and behave like code:

  1. It is written by coders.
  2. It defines a transformation of data into a boolean value (valid/invalid).
  3. It can be replaced with code without any loss of functionality, and many validators do exactly that.

If schema A is referenced by schema B (written by a different author), ..., it is not in any way the responsibility of A's author to avoid breaking schema B.

I agree, it is not. It is just normal practice for author A to bear that responsibility, provided there are ways to do it.

Organizations who want to suggest access control can do so by conventions such as naming definitions entries with a leading underscore.

They can. But the efficiency of embedding conventions into the language as the only way to do things has been proven many times, e.g. by the Go language.

like adding imperative control flow with switch

I've written before, and I can repeat, that switch is no more imperative than anyOf/allOf/not, because it can be expressed via them. Switch in its proposed form is essentially a boolean expression, in the same way anyOf/allOf/not are. Imperative vs declarative is a point of view rather than a fundamental distinction here, both for switch and for anyOf/allOf/not. You may argue that it is easier to see switch as imperative, but that is a perceptual difference rather than a fundamental one.

Not only is this a radical change in syntax, it's a radical change in philosophical perspective

I agree that JSON Schema in its current form is definitely on the boundary between document and code, and @kriszyp managed to keep it on this boundary, balancing both perspectives in a very elegant way. I think it's a pity that both @awwright and you, rather than maintaining this delicate balance, want to settle the question and have it seen and used as a declarative document only, drawing ideas from the more complex XML rather than from software development practices. I think it is very likely that eventually another standard will evolve that is very similar to JSON Schema but sees it from the code perspective and has features that follow from this philosophy. Which is absolutely fine. There is more than one schema language for XML, so there is no reason to have only one schema language for JSON.

@handrews
Contributor

keep it on this boundary balancing both perspectives in a very elegant way.

Could you elaborate on this? I do not see code-like aspects. The fact that you use the document to produce a boolean validation result does not make the schema code, any more than a database schema is code. Code uses the schema document and the instance document as inputs to produce the result.

@epoberezkin
Member Author

epoberezkin commented Nov 23, 2016

That was my impression from reading draft 4 spec. I will try to find some specific examples.

The fact that you use the document to produce a boolean validation result does not make the schema code, any more than a database schema is code

Again, I can only repeat that this is a point of view. Also:

A database schema:

  • cannot be replaced with code
  • describes a state of data
  • data in the database cannot be invalid
  • has no conditionals

A JSON schema:

  • can be replaced with code
  • describes the validation process rather than a state
  • participates in producing a result
  • its boolean algebra constructs make it even more like code

@epoberezkin
Member Author

epoberezkin commented Nov 23, 2016

I guess this argument is not so much about what JSON schema is, but about how we perceive it and what philosophy we want to use to drive its development.

So it's OK to disagree about these things. I see JSON Schema as a validation process definition, essentially a DSL that defines validation logic and process. Even the fact that property names are called "validation keywords" rather than "property names" makes it like a programming language, where the same term is used for code elements.

The fact that it uses JSON documents is, from my point of view, no more "document-like" than the fact that JavaScript code is stored in text files: that is just the format used to store it.

I want it to become more like normal code: deterministic, modular, functional, potentially supporting parameters, macros, etc. All the efforts I see so far are driving it in the opposite direction. I think this will eventually increase the area where interoperability is limited, but I may be wrong about that.

I am really interested to know what other people think about it.

@epoberezkin
Member Author

Code uses the schema document and the instance document as inputs to produce the result.

You can also say that V8 uses a JavaScript document to produce a result. It's just a point of view. I prefer the point of view in which a validator interprets (or compiles) a JSON schema to produce results, in the same way any language interpreter (or compiler) uses code.

@seagreen
Collaborator

parameters, macros etc

Not sure I agree with this.

I may have an odd approach to things, but my personal view of what JSON Schema should be is inspired by the old ECMA JSON specification:

Because it is so simple, it is not expected that the JSON grammar will ever change.

I think this is a good thing to aspire to with JSON Schema as well. (Obviously we won't come as close as JSON itself because we have a harder problem, but we can at least try). Validators like "items" are great because they're so simple -- I don't think we'll ever need to change "items".

Programming language features are a different story. There are thousands of them we could add and they could be implemented in many different ways. If we start adding programming features to JSON Schema our chance of ever converging on a "simple, unlikely to change again" specification drops away entirely.

@handrews
Contributor

Programming language features are a different story. There are thousands of them we could add and they could be implemented in many different ways. If we start adding programming features to JSON Schema our chance of ever converging on a "simple, unlikely to change again" specification drops away entirely.

^^^^THIS.

I want it to become more like normal code: deterministic, modular, functional, potentially supporting parameters, macros etc.

I am definitely opposed to parameters, macros, or anything else similar, and if that means that some separate standard is built to handle them, I would encourage that separate effort.

You could consider $merge and $patch to be macros, and I am opposed to them for the same reasons I am generally opposed to macros. This clarifies for me that I don't think $use really belongs in JSON Schema either. And probably not $combine either, although that thing makes my head hurt no matter where we put it (and I'm the one who came up with it...)

All of these things, and any similar feature, should be defined and managed outside of JSON Schema. You could create a JSON transform standard involving some or all of them (I think this has come up in another issue before, which demonstrates that it's probably an idea worth considering).

@epoberezkin
Member Author

This clarifies for me that I don't think $use really belongs in JSON Schema either. And probably not $combine either, although that thing makes my head hurt no matter where we put it (and I'm the one who came up with it...)

I guess that it is a good result from this discussion :) I like simple too.

@epoberezkin
Member Author

epoberezkin commented Nov 24, 2016

@handrews @seagreen @awwright @fge

Leaving esoteric programming ideas and syntactical details aside, I would still like some feedback on the core of this proposal:

  1. Drop base URI change support.
     Reasons:
     • I still haven't seen a real use case where it is needed.
     • I don't know anybody who uses it.
     • Interoperability between validators in this area is lacking.
     • The standard leaves too much to validators' discretion.
     • As a result many schema authors avoid modularity altogether and build huge schemas in single files.
  2. Drop JSON Pointer support in references, limiting them to explicitly defined IDs only (ignore the private/public ideas).
     • If pointing to definitions only is good practice anyway, why not make it the only way to address fragments in other schemas?
     • Again, I don't see a real use case for referencing anything outside definitions.
     • Any common fragments can and should be refactored into definitions anyway.
     So the question remains: why do we need JSON Pointers for referencing other schemas?

I was mainly proposing those "programming" ideas as a reductio ad absurdum argument, and we reached the right conclusion as a result: we want to keep it simple. Which is great.

This proposal is about simplifying the spec, removing theoretical and unnecessary abstractions from it. Less is more.

So $id could exist only at the top level, as both the URI and the base URI. To address definitions you use the {"$ref": "<uri>#<prop>"} format (or {"$ref": "<uri>"} to refer to the whole schema) without any $id attributes in definitions, where prop is the property name in the definitions object. It's super simple. That's it; nothing else is supported.

@epoberezkin epoberezkin changed the title Simplify schema re-use (ids, baseURI, etc.) - $uri, $id, $call Simplify schema re-use - drop base URI change and JSON pointers support Nov 24, 2016
@handrews
Contributor

JSON Pointer support is very important to me. When packaging up a large API, I organize it as follows:

  • The root schema describes the entry point resource
  • The root schema's definitions are sometimes individual resource schemas, but often I will use a definition simply as a scope for a set of closely related resources
    • Inside that scope I will have more definitions that apply only to those resources
    • This lets me namespace various things and refer to "#/definitions/foo/definitions/name" vs "#/definitions/bar/definitions/name"
    • Just referencing the top-level definitions by their keys is not sufficient for me
    • Assigning my own ids to everything would just mean that I'd have to create another namespacing solution within those ids, instead of leveraging the existing, obvious organization of the file.

As for base change support, while it is not my favorite feature, the SHOULD / SHOULD NOT language that @awwright introduced in v5 steers authors away from the most problematic usage. That's a clear acknowledgement that it's possible to abuse the feature, and an equally clear guideline to help avoid abuse.

The functionality is needed in order for $ref to work properly. If a referenced schema has an id (possibly because it is in its own document, and therefore that id is given at the root schema level), then it needs to behave the same whether it is referenced or copied into place. Further references within that referenced schema need to resolve with the referenced schema's id as the base, not with the referencing schema's base.

Referencing enables modularity only if schemas behave identically whether they are referenced or inlined. That requires allowing id to change the base. I don't like it, but we've gone around on this a bunch of times and I've neither seen nor managed to propose a workable alternative.

So while I do not just stuff base-changing identifiers all over my schema documents, I do rely on the combination of referencing and identifying to implement modularity correctly.

The problems seen in implementations likely had much to do with the very confusing wording in v4. Implementations can and should be fixed, and the wording introduced in v5 makes it a lot clearer how this should work.

@epoberezkin
Member Author

epoberezkin commented Nov 24, 2016

@handrews

Dropping JSON Pointer support

Even the use case you describe is actually much better served by my proposal than by JSON Pointers. Using top-level properties in definitions is sufficient for 98% of users, and even in your case such namespaced definitions deserve a separate file: you could have a definitions folder with multiple files inside and use each file as a namespace (which is a common practice), in which case your refs would look like "/definitions/foo#name" and "/definitions/bar#name".

Complicating the spec for the sake of 2% of users is worse than making those 2% work around the spec's limitations. Less is more. Look at what Apple is doing and where it gets them.

But even if for some reason you dislike having one namespace per file (although such reasons completely escape me), we can also support named ids/anchors inside: i.e., support two attributes, e.g. $uri or $id at the top level and $id or $anchor for named locations, or even use $id for both cases with the meaning depending on the location (I am not precious about specific syntax). In this case your refs would look like, e.g., "#foo-name" and "#bar-name", where you could use "-" as a symbol to separate namespaces (which is also quite common in many areas, e.g. CSS :).

So your use case only needs JSON Pointers because you are used to them; it absolutely does not require them. You can easily replace them with multiple files (which, I remind you, would be better for 98% of spec users) or with named anchors (which is the approach in YAML, by the way: you cannot use a ref without declaring the anchor). And if you consider that in 3-5 years we are likely to have 10 times more users of the spec, the fact that they wouldn't have to know anything about JSON Pointers to re-use schemas increases our chances of getting 10 times more rather than 2 times more users, which would be nice.

Dropping base URI change (apart from base URI change in $refs).

The functionality is needed in order for $ref to work properly

We had a long discussion on this subject, and I think that @awwright, you and I agreed that this is not exactly the case. We also acknowledged that $ref is more complex than simple copy/paste inclusion: you can only see it as equivalent to inclusion with the provision that it also changes the base URI, not because the included fragment has an id, but because the root schema containing the fragment has an id. One of these discussions starts from #66 (comment), but I will restate the core issue here. Consider this example:

Schema 1

{ 
  "$id": "schema1",
  "allOf": [
    { "$ref": "schema2#foo" }
  ]
}

Schema 2

{
  "$id": "schema2",
  "definitions": {
    "foo": {
      "$id": "#foo",
      "allOf": [
        { "$ref": "#bar" }
      ]
    },
    "bar": {
      "$id": "#bar",
      "type": "integer"
    }
  }
}

In the example above the fragment of schema 2 included in schema 1 is this:

{
  "$id": "#foo",
  "allOf": [
    { "$ref": "#bar" }
  ]
}

If $id on its own were responsible for changing the base URI, then #bar would have to resolve to schema1#bar (which does not exist). Luckily, the correct behaviour is to resolve #bar to schema2#bar, because this fragment is taken from schema2. That is the behaviour of $ref, not a consequence of the included schema having an $id that changes the base URI: the $id that changes the base URI is in the parent root schema, which is not included.

I hope it all makes sense; if not, please review our past discussions on the subject, or I (or @awwright, who has a better way of explaining it) can try to explain it some other way here.

To summarise: to support the current behaviour of $ref changing the base URI in included schemas, there is no need to allow $id in subschemas. An $id that changes the base URI only when it is at the top level is sufficient.

The problems seen in implementations likely had much to do with the very confusing wording in v4. Implementations can and should be fixed, and the wording introduced in v5 make it a lot more clear how it should work.

The key word here is "likely", and I can guarantee you that it is not the case. You can think the language is to blame only until you start implementing it.

Ajv is one of the very few JavaScript validators (if not the only one) that consistently and correctly implements the base URI change logic in the way prescribed by the spec in all (or almost all) cases. I had absolutely no problem understanding the spec and writing additional test cases for areas that are not covered by JSON-Schema-Test-Suite. The fact that I could not find a single JavaScript validator that correctly implemented all edge cases in relatively simple situations, such as refs to refs with recursion, was the main motivation for creating Ajv in the first place (and before creating Ajv I created json-schema-consolidate to have a standard API for all validators while I was trying to find one that worked; at the time, none of the 11 validators I considered worked for me).

Having implemented it all in Ajv, I can guarantee you that however you change the language, there is very little chance that the existing logic, however simple and logical it may seem to you, will be consistently supported: it is VERY complex to implement. All the other authors I discussed the issue with had the same opinion. That's an area of Ajv code that I stopped understanding a long time ago; it's quite convoluted and I rely on the many test cases when I need to improve or fix it :).

I believe this is the main reason why the authors of popular schemas tend to write large schemas rather than structuring them: there is no consistent $ref support across multiple files even in relatively simple cases. So however much I like my current implementation, I would be very happy to drop it in favour of compatibility across different platforms and validators.

So I very seriously urge you, @Relequestual and @awwright, to consider dropping both base URI change (while still supporting it in the case of $refs, as shown above) and JSON Pointer support, even though you rely on them now (I rely on them too, but that is a very selfish argument, by the way), for the sake of the sanity of the future users and, even more so, implementers of the spec. They would really appreciate it, even if they won't know it.

@epoberezkin
Member Author

epoberezkin commented Nov 25, 2016

To further clarify, we definitely need support for base URI change in $refs. What I specifically want to prohibit is this:

{
  "$id": "http://example1.com/schema1.json",
  "allOf": [
    { "$id": "http://example2.com/schema2.json", "type": "integer" }
  ]
}

when the internal $id is not the result of "inclusion" via $ref. You may see the case above as equivalent to the example below, but they are not, either from a usage or from an implementation point of view:

{
  "$id": "http://example1.com/schema1.json",
  "allOf": [
    { "$ref": "http://example2.com/schema2.json" }
  ]
}

There must be a distinction in the spec between these two cases; the first case doubles the complexity of the base URI change logic and also requires making schema2 available at its URI just because it is mentioned in schema 1.

@handrews
Contributor

You may see the case above as equivalent to the example below, but they are not, neither from usage nor from implementation points of view.

You're going to have to elaborate on that. Both implementations that I have written or worked on just transparently dereference $ref. My actual validation code never really "sees" them, they just proxy validation to the referred schema.

So from my point of view there is no difference.

@handrews
Contributor

even in your case such name-spaced definitions deserve a separate file

No, the entire point is to package it all up in one file. Which is a use case that has been established many times for many people.

Your 98%/2% numbers are totally made up and therefore not relevant. Even if you have perfect metrics of your own user base, yours is not the only JavaScript implementation, and there are many implementations and users of those implementations in other languages. If you're going to try to win arguments with numbers, you need to back those numbers up. Anecdotal evidence isn't worth much at all.

i.e. support two attributes, e.g. $uri or $id on the top level and $id or $anchor for named locations, or even use $id for both cases with the meaning depending on the location - I am not precious about specific syntax

How is this simpler? For that matter, how can you possibly propose this when you just vehemently shot down my $base/$anchor proposal which was much the same thing?

You are asserting that, because you are fine with making up your own namespacing syntax, it is a solution for everyone, which is not the case. There are few things I hate more than making up new things to parse. That's why I want to use standards, whether they are JSON Schema, JSON Pointer, or whatever else.

Your assertion that JSON Pointers are some sort of barrier to entry does not match my experience at all. I have coached many teams through the learning process, and JSON Pointer was never among the difficult parts. So if we're going to argue with anecdotal evidence, there's mine.

I do not find any of your arguments for dropping JSON Pointer even a little bit convincing, nor do I find your supposed "easier" alternatives to be easier.

@epoberezkin
Member Author

Many modern implementations compile JSON schemas to validating code. That is the only way to achieve high performance; all of the top 5 or so validators do it. When you do it, you need to decide whether to compile a $ref as a separate function (because it can be recursive) or to inline the schema. The current spec also prescribes resolving inlined schemas with $id without making HTTP calls (although it makes this optional, which also means a lack of compatibility).

Both implementations that I have written or worked on just transparently dereference $ref.

I am also curious if they correctly handle the simple case above - I hope they do :)

I am also curious which parts of the Ajv test suite your implementations would pass. Given that most Ajv tests use the same format as JSON-Schema-Test-Suite, it would be relatively easy to find out.

In any case, I think the difference in the implementation approach you used is the reason you see a schema as data that is used together with a data instance to produce a validation result, while I see a schema as code that is compiled directly to JavaScript, so when you do the actual validation you no longer need the schema: you just use the generated function. is-my-json-valid, jsen and some others use the same approach.

I can see how interpreting schemas during validation also makes it much easier to manage $ref resolution (although more than half of the 11 validators I tested were interpreting rather than compiling, so that is not the only reason).

@epoberezkin
Member Author

epoberezkin commented Nov 25, 2016

How is this simpler? For that matter, how can you possibly propose this when you just vehemently shot down my $base/$anchor proposal which was much the same thing?

Because it was not removing base URI change and was only adding complexity on top of it.

@handrews
Contributor

I think that @awwright , you and myself agreed that it is not exactly the case and also acknowledged that $ref is more complex than simple copy/paste inclusion and you can only see it as equivalent to inclusion with the provision that it also changes base URI, not because the included fragment has id, but because the root schema containing the fragment has id.

No, I figured out that you have an excessively complex mental model for $ref, but that didn't change my point of view: I find it extremely simple to work with and implement. I've implemented it twice so far.

@epoberezkin
Member Author

epoberezkin commented Nov 25, 2016

I've implemented it twice so far.

I am not doubting that, but I have not tested them. What are they? Let's run them against the tests to see whether they work correctly in my cases.

Then we can continue the conversation about the complexity or simplicity of $ref support. You may be right; I may be stupid and have a tendency to overcomplicate things. Or maybe it is somehow a JavaScript thing? Let's find out.

@handrews
Contributor

None of your implementation points are convincing. I have no idea why you think this is so hard. Yes, you have to make some decisions about how to arrange it and how much and what to optimize, but one implementation approach should not drive the standard.

The prior project I worked on handled recursive references just fine, and did not make HTTP requests for referenced schemas unless it both did not already recognize the id and had been configured to make such calls (which were desirable in our use case).

Running tests against that project would be complicated because there are other areas where I know it does not conform (it was originally not a straight-up JSON Schema library). My new implementation isn't ready for public use yet, but it will have test results when it is released, and I will be happy to run it against any test you care to throw at it.

@handrews
Contributor

But in any event, I still don't see a single reason why any of this implementation talk changes anything about how $id should work. So I'm not going to follow that line of the discussion further.

@epoberezkin
Member Author

epoberezkin commented Nov 25, 2016

I still don't see a single reason why any of this implementation talk changes anything about how $id should work

Really? A spec has to be implemented. If a lot of implementations fail to do so in a consistent way, the problem is likely with the spec. So what's wrong with my suggestion to establish whether your implementations do it correctly? Because if they don't, fixing them may cause you to reconsider whether it is simple or complex. And if they get it all right, then I need to reconsider my models.

@handrews
Contributor

So what's wrong with my suggestion to establish whether your implementations do it correctly?

Nothing, and when the one I'm working on now is more than a half-assed mess we will do so. I just started on it last week.

The older one is, as noted, not properly a JSON Schema validator; it just implements most of validation, including the $ref part, and we didn't have a problem with it. But the only reason I bring that project up is the experience of using it to work with numerous teams and APIs. I would not direct anyone outside of that API suite to it; it is too specific to the suite in question.

@epoberezkin
Member Author

epoberezkin commented Nov 25, 2016

when the one I'm working on now is more than a half-assed mess we will do so

good.

we didn't have a problem with it

that's not the same as "it works in all cases where it should by the spec". Whenever I hear "it works for us" I point out that a function that always returns true produces the correct validation result for approximately 50% of the JSON-Schema-Test-Suite cases, and for the majority of real-life test cases, because everybody seems to write a lot of "pass" tests and very few "fail" tests.

@epoberezkin
Member Author

epoberezkin commented Nov 25, 2016

Anyway, thank you for the argument @handrews. At the very least we have established where our different points of view come from, and also what may change your attitude towards the spec. To be continued...

@handrews
Contributor

My "we didn't have a problem with it" meant that I did not have a problem with engineers having difficulty understanding or using it. That is the metric that is important to me.

@Relequestual
Member

@epoberezkin I have to admit I don't fully understand the problem here. It also looks like there are multiple issues being discussed in parallel, which confuses things further.

Maybe you could provide further examples of tests for how things currently work vs how you'd like them to work, highlight the tests where the majority of libraries fail, and give your reasons why you think the spec is too complex in those instances. You may have already tried to do that here, but there's a lot of muddy discussion outside the scope of the issue, from what I can tell.

If you plan to do the above, probably would be best as a gist or a number of gists which you then link to from here.

@epoberezkin
Member Author

epoberezkin commented Nov 29, 2016

@Relequestual That's exactly my plan. One example of JS validators failing on $ref-related tests is json-schema-benchmark, but it only tests validators against relatively simple cases from JSON-Schema-Test-Suite. I am going to extract some tests I have in Ajv into a separate npm package (they are in the same format as JSON-Schema-Test-Suite, so they could be useful for other people, even those writing non-JS validators) and run the JS validators against them. Then we will see the scope of the problem.

And it's not that I want them to pass my tests, it follows from the spec that they should... :)

@Relequestual
Member

One of your earlier comments hit a point: the tests need to cover both positive and negative situations. Testing how $ref works, for example, isn't validation as such, but the use of keywords can be tested by intentionally creating schemas where you see implementations getting parts wrong, which would highlight the issues.

@Relequestual
Member

Also, a JSON Schema document is not code. It tells the code what to do, but the document itself IS NOT code. It does nothing on its own without being an input into a library. If you consider code to be just instructions, then is your backlog code? The backlog is instructions for you to do stuff... but it is not code. We don't call json-schema libraries json-schema compilers do we? nope. /rant

@epoberezkin
Member Author

epoberezkin commented Nov 29, 2016

We don't call json-schema libraries json-schema compilers do we? nope. /rant

It's difficult for me not to see JSON Schema as code in some DSL, given that Ajv and some other validators take a schema and convert it into JavaScript code that validates against exactly that schema.

There is even an ajv-pack package that saves compiled schemas as standalone JavaScript modules that can be used without the whole of Ajv.

So Ajv, jsen, is-my-json-valid and some others are exactly that: "JSON Schema compilers" that compile JSON Schema to JavaScript. So it is a possible, and efficient, point of view to see JSON Schema as code.

@Relequestual
Member

Relequestual commented Nov 29, 2016

=/ well that's unexpected. I don't understand WHY you would want that.

Regardless, I agree with #160 (comment)

@epoberezkin
Member Author

I don't understand WHY you would want that.

For performance. The most common validation use case is when the same schema is used to validate many data instances (as in an API). Pre-compiling schemas into validating functions makes validation 50-200 times faster (same as with pre-compiling templates). Look at the chart in the benchmark.

@Relequestual
Member

Fair enough. If you want to do that at run time, because in your programming language data is code and code is data, then sure.

I still think the spec should remain simple though.

@epoberezkin
Member Author

epoberezkin commented Nov 29, 2016

If you want to do that at run time, because in your programming language data is code and code is data, then sure.

It can be done at build time too; ajv-pack does that, and there is also an old Haskell package that does it. It's possible to write a compiler from JSON Schema to any language: in the majority of apps there is a limited number of static schemas bundled with the app, so they can be pre-compiled to code while building the rest of the app.

I still think the spec should remain simple though.

That's exactly what this issue is about: making the spec simpler to implement. I think you're paying too much attention to some esoteric ideas here, which I only suggested to highlight the complexity of some other proposals coming from the "document" point of view. But the core proposal is to simplify the spec as much as possible.

Wait for the test results and you will see how inconsistently the spec is supported in the area of $ref.

@epoberezkin
Member Author

epoberezkin commented Dec 4, 2016

@handrews @Relequestual @awwright please have a look at the test results of 12 JavaScript validators (including Ajv) using JSON-Schema-Test-Suite and the Ajv tests: https://github.com/epoberezkin/test-validators

I tested:

Base URI change support is much worse than I expected. While 5 validators pass a simple base URI change test from JSON-Schema-Test-Suite (refRemote.json), none of them (!) passed the additional base URI change tests that Ajv uses.
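To illustrate what "base URI change" means here, a hypothetical draft-06-style schema (not one of the actual failing tests): the inner $id establishes a new base URI, so the $ref must resolve to http://example.com/other.json#/definitions/positiveInteger and then be located inside this same document, without any HTTP request.

```json
{
  "$id": "http://example.com/root.json",
  "definitions": {
    "other": {
      "$id": "http://example.com/other.json",
      "definitions": {
        "positiveInteger": { "type": "integer", "minimum": 1 }
      }
    }
  },
  "properties": {
    "count": { "$ref": "other.json#/definitions/positiveInteger" }
  }
}
```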

Other cases related to $ref resolution also have very inconsistent support: although at least a couple of validators pass each test file, none of the tested validators passed all the Ajv tests.

Most of these tests were the result of issues submitted by users who were struggling to implement their use cases; some of them I encountered when using Ajv myself. So they are not theoretical use cases.

For me these results are enough proof that the spec is difficult to implement.

Also, the fact that none of the validators supports base URI change as fully as Ajv does is for me a confirmation that it is not needed. And the fact that it must be supported to some extent makes some other important cases (e.g. recursion) more difficult to implement and, as a result, inconsistently supported.

@handrews all these tests are in the same format as the tests in JSON-Schema-Test-Suite, and the $ref-related tests use a very limited number of keywords. So I recommend that you implement support for $ref in such a way that it passes all these tests before implementing the other validation keywords. After that we can discuss whether it is simple or complex to implement.

@handrews
Contributor

handrews commented Dec 4, 2016

@epoberezkin I have most of the $schema/$id/$ref implementation code-complete and just need to finish my unit testing, then add whatever bit is needed to load the JSON-Schema-Test-Suite format tests (I haven't looked at that yet; maybe it's trivial). I will post results from all the test suites here when I have them.

Also, is there a reason that you only tested JavaScript validators? JavaScript is not the only language in use with JSON Schema.

@epoberezkin
Member Author

epoberezkin commented Dec 4, 2016

is there a reason that you only tested JavaScript validators

To save time... I understand that JSON-Schema is not limited to JavaScript. I'd be happy to see the results of testing validators in other languages against the same tests.

to load the JSON-Schema-Test-Suite format tests (I haven't looked at that yet- maybe it's trivial)

The format is quite simple, and the Ajv tests are in the same format, so please run against them as well. I made a JS test runner for this format, but it wouldn't be a problem to write one in any language.
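For reference, a test file in that format is a JSON array of groups, each pairing a schema with data cases; a made-up example:

```json
[
  {
    "description": "integer type",
    "schema": { "type": "integer" },
    "tests": [
      { "description": "an integer is valid", "data": 1, "valid": true },
      { "description": "a string is not", "data": "x", "valid": false }
    ]
  }
]
```

A runner simply validates each `data` against the group's `schema` and asserts that the boolean result matches `valid`.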

@handrews
Contributor

handrews commented Dec 4, 2016

The format is quite simple, Ajv tests are in the same format - please run against them as well.

Oh I definitely plan to, sorry if that was not clear. We should probably see about getting those cases added to the "official" suite.

@epoberezkin
Member Author

epoberezkin commented Dec 4, 2016

We should probably see about getting those cases added to the "official" suite.

I agree, at least for some of them. I'd wait until draft 6 is out; I am going to update them after it. There are also some tests for standard keywords that some validators fail. For example, three validators (jjv, schemasaurus, skeemas) fail the test requiring that when the items keyword value is an array, data with fewer (valid) items is valid. This test definitely should be in the spec suite, maybe in a simplified form.
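For clarity, the failing case has this shape (a made-up instance of the rule, not the exact Ajv test): with array-form items, an instance shorter than the tuple is still valid provided the items it does have are valid.

```json
{
  "description": "fewer items than the tuple defines",
  "schema": { "items": [{ "type": "integer" }, { "type": "string" }] },
  "tests": [
    { "description": "shorter array with valid items", "data": [1], "valid": true },
    { "description": "wrong type in first position", "data": ["x"], "valid": false }
  ]
}
```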

@handrews
Contributor

@epoberezkin I'm putting a (shrug) label on this to flag that it's wandered a good bit off from its original point into a general discussion without much agreement. I think we've started talking about bits and pieces of this elsewhere; we can keep this open for reference, but it might be better to start again after the current round of id and $ref discussions resolve. I'll leave it up to you whether to close this now or not, but I think it's no longer clear what we need here.

@epoberezkin
Member Author

@handrews I think we've agreed that we need to reconsider at a later point whether the current spec is easy to implement. I don't mind closing it for now.

@handrews handrews added the core label May 16, 2017
@handrews handrews added this to the draft-07 (wright-*-02) milestone May 16, 2017
@handrews
Contributor

For some reason I never closed this even though we agreed that it was OK to do so, and we can re-file specific points as needed (I think some concepts brought up here actually have since gotten their own issues, but I'm not reading back through this to find out!). Closing now :-)

@handrews handrews removed this from the draft-07 (wright-*-02) milestone Aug 19, 2017
@sgpinkus

sgpinkus commented Apr 9, 2021

... we didn't have a problem with it ... when the one I'm working on now is more than a half-assed mess we will do so. I just started on it last week.

Hi @handrews, just wondering whether you ever got around to implementing. Is the implementation publicly available?
