Simplify schema re-use - drop base URI change and JSON pointers support #160
We can't expect everyone to hang out tracking these GitHub issues. Getting that wider feedback is (I'm pretty sure) precisely why @awwright was arguing for more rapid iteration on drafts. We can't force people to come here and participate, but publishing a draft will result in broader review automatically.
JSON Schema is not code. Not only is this a radical change in syntax, it's a radical change in philosophical perspective, like adding imperative control flow with switch. If schema A is referenced by schema B (written by a different author), then unless the authors of A and B agree on change management, it is not in any way the responsibility of A's author to avoid breaking schema B. Organizations that want to suggest access control can do so by naming conventions. Schema authors who reference anything other than root schemas or entries in definitions take that risk on themselves. Given the radical nature of this proposal, not only in terms of syntax but the entire philosophy, I do not see any reason to block PR #154.
@handrews thank you for the feedback.
This discussion of whether JSON Schema is a document or code is no different from deciding whether light is a wave or a stream of particles. Both are points of view. There are many criteria that make JSON Schema similar to a document, and there is an equal number of criteria that make it look and behave like code.
I agree, it is not. It's just normal practice for author A to bear that responsibility, provided there are ways to do it.
They can. But the efficiency of embedding conventions into the language as the only way to do things has been proven many times - e.g. by the Go language.
I've written before, and I can repeat, that switch is no more imperative than anyOf/allOf/not, because it can be expressed via them. Switch in its proposed form is essentially a boolean expression, in the same way anyOf/allOf/not are. Imperative vs declarative is a point of view rather than a fundamental distinction here, both for switch and for anyOf/allOf/not. You may argue that seeing switch as imperative is easier, but that is a perceptual difference rather than a fundamental one.
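For example, here is a sketch using switch in its proposed v5 form (treat the exact keyword shape as my summary of that proposal, not settled syntax):

{
  "switch": [
    { "if": { "minimum": 10 }, "then": { "multipleOf": 2 } },
    { "then": { "multipleOf": 3 } }
  ]
}

This is the boolean expression (if1 AND then1) OR (NOT if1 AND then2), so it can be rewritten with the existing keywords:

{
  "anyOf": [
    { "allOf": [ { "minimum": 10 }, { "multipleOf": 2 } ] },
    { "allOf": [ { "not": { "minimum": 10 } }, { "multipleOf": 3 } ] }
  ]
}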
I agree that JSON Schema in its current form is definitely on the boundary between document and code. And @kriszyp has managed to keep it on this boundary, balancing both perspectives in a very elegant way. I think it's a pity that both @awwright and you, rather than maintaining this delicate balance, want to settle the question and have it seen and used as a declarative document only, drawing ideas from the more complex XML world rather than from software development practices. I think it is very likely that eventually another standard will evolve that is very similar to JSON Schema but sees it from the code perspective and has features that follow from that philosophy. Which is absolutely fine. There is more than one schema language for XML, so there is no reason to have only one schema language for JSON.
Could you elaborate on this? I do not see code-like aspects. The fact that you use the document to produce a boolean validation result does not make the schema code, any more than a database schema is code. Code uses the schema document and the instance document as inputs to produce the result.
That was my impression from reading the draft 4 spec. I will try to find some specific examples.
Again, I can only repeat that this is a point of view. Also, the same comparison can be made between a database schema and a JSON schema.
I guess this argument is not so much about what JSON Schema is, but about how we perceive it and what philosophy we want to use to drive its development. So it's OK to disagree about these things. I see JSON Schema as a validation process definition, essentially a DSL that defines validation logic and process. Even the fact that property names are called "validation keywords" rather than "property names" makes it like a programming language, where the same term is used to define code elements. The fact that it uses JSON documents, from my point of view, does not make it any more of a "document" than the fact that JavaScript code uses text documents - this is just the format used to store it. I want it to become more like normal code: deterministic, modular, functional, potentially supporting parameters, macros, etc. All the efforts I see so far are driving it in the opposite direction. I think that will eventually increase the area where interoperability is limited, but I may be wrong about that. I am really interested to know what other people think about it.
You can also say that V8 uses a JavaScript document to produce a result. It's just a point of view. I prefer the point of view where a validator interprets (or compiles) a JSON schema to produce results in the same way any language interpreter (or compiler) uses code.
Not sure I agree with this. I may have an odd approach to things, but my personal view of what JSON Schema should be is inspired by the old ECMA JSON specification, which set out to be so simple that it would never need to change.
I think this is a good thing to aspire to with JSON Schema as well. (Obviously we won't come as close as JSON itself, because we have a harder problem, but we can at least try.) Programming language features are a different story. There are thousands of them we could add, and they could be implemented in many different ways. If we start adding programming features to JSON Schema, our chance of ever converging on a "simple, unlikely to change again" specification drops away entirely.
^^^^THIS.
I am definitely opposed to parameters, macros, or anything else similar, and if that means that some separate standard is built to handle them, I would encourage that separate effort. All of these things, and any similar feature, should be defined and managed outside of JSON Schema. You could create a JSON transform standard involving some or all of them (I think this has come up in another issue before, which demonstrates that it's probably an idea worth considering).
I guess that is a good result from this discussion :) I like simple too.
@handrews @seagreen @awwright @fge Leaving esoteric programming ideas and syntactic details aside, I would still like some feedback on the core of this proposal:
I was mainly proposing those "programming" ideas as a reductio ad absurdum argument, and we got to the right conclusion as a result - we want to keep it simple. Which is great. This proposal is about simplifying the spec, removing theoretical and unnecessary abstractions from it. Less is more. So $id could only exist at the top level, as both the URI and the base URI. To address definitions you would use named references ("#<id>", per this proposal) rather than JSON Pointers.
JSON Pointer support is very important to me. When packaging up a large API, I package everything into one file, organized with name-spaced definitions, and reference them with JSON Pointers.
As for base URI change support, while it is not my favorite feature, the SHOULD / SHOULD NOT language that @awwright introduced in v5 steers authors away from the most problematic usage. That's a clear acknowledgement that it's possible to abuse the feature, and an equally clear guideline to help avoid abuse. The functionality is needed in order for referencing to behave like inclusion: referencing enables modularity only if schemas behave identically whether they are referenced or inlined. That requires allowing id to change the base. I don't like it, but we've gone around on this a bunch of times and I've neither seen nor managed to propose a workable alternative. So while I do not just stuff base-changing identifiers all over my schema documents, I do rely on the combination of referencing and identifying to implement modularity correctly. The problems seen in implementations likely had much to do with the very confusing wording in v4. Implementations can and should be fixed, and the wording introduced in v5 makes it a lot clearer how it should work.
Dropping JSON Pointer support

Even the use case you describe is actually much better suited by my proposal than by using JSON Pointers. Using top-level properties in definitions is sufficient for 98% of users, and even in your case such name-spaced definitions deserve a separate file: you could have a definitions folder with multiple files inside and use each file as a name-space (which is a common practice), in which case your refs would look like "/definitions/foo#name" and "/definitions/bar#name". Complicating the spec for the sake of 2% of users is worse than making those 2% work around the spec's limitations. Less is more. Look at what Apple is doing and where it gets them.

But even if for some reason you dislike having one name-space per file (although such reasons completely escape me), we could also support named ids/anchors inside a file (i.e. support two attributes, e.g. $uri or $id at the top level and $id or $anchor for named locations, or even use $id for both cases with the meaning depending on the location - I am not precious about the specific syntax). In this case your refs would look like, e.g., "#foo-name" and "#bar-name", where "-" serves as a symbol to separate name-spaces (which is also quite common in many areas, e.g. CSS :).

So your use case only needs JSON Pointers because you are used to them, but it absolutely does not require them - you can easily replace them with multiple files (which, I remind you, would be better for 98% of spec users) or with named anchors (which is the approach in YAML, by the way - you cannot use a ref without declaring the anchor). And if you consider that in 3-5 years' time we are likely to have 10 times more users of the spec, the fact that they would not have to know anything about JSON Pointers to re-use schemas increases our chances of having 10 times more rather than 2 times more users, which would be nice.
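For instance (a hypothetical layout; the file names and the name-per-file reference syntax follow this comment's proposal rather than current JSON Schema), definitions/foo.json would act as the "foo" name-space:

{
  "name": { "type": "integer" }
}

and a consuming schema would reference the definition by file path plus declared name:

{ "$ref": "/definitions/foo#name" }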
Dropping base URI change (apart from base URI change in $refs)

We had a long discussion on this subject, and I think that @awwright, you, and I agreed that it is not exactly the case, and also acknowledged that $ref is more complex than simple copy/paste inclusion: you can only see it as equivalent to inclusion with the provision that it also changes the base URI - not because the included fragment has an id, but because the root schema containing the fragment has an id. One of these discussions starts at #66 (comment), but I will restate the core issue here. Consider this example:

Schema 1:

{
"$id": "schema1",
"allOf": [
{ "$ref": "schema2#foo" }
]
}

Schema 2:

{
"$id": "schema2",
"definitions": {
"foo": {
"$id": "#foo",
"allOf": [
{ "$ref": "#bar" }
]
},
"bar": {
"$id": "#bar",
"type": "integer"
}
  }
}

In the example above, the fragment of schema 2 included in schema 1 is this:

{
"$id": "#foo",
"allOf": [
{ "$ref": "#bar" }
]
}

If $id on its own were responsible for changing the base URI, then after inclusion into schema 1 the fragment's "$id": "#foo" would be resolved against schema 1's base URI, and { "$ref": "#bar" } would point at schema1#bar, which does not exist; the reference only works because $ref carries schema 2's base URI into the included fragment. I hope it all makes sense; if not, please review our past discussions on the subject, or I (or @awwright - he has a better way of explaining it) can try to explain it in some other way here. To summarise: to support the current behaviour of $ref changing the base URI in included schemas, there is no need to allow $id in subschemas - an $id that changes the base URI only when it is at the top level is sufficient.
The key word here is "likely". And I can guarantee you that it is not the case. I believe this is the main reason why the authors of popular schemas tend to write large schemas rather than structuring them - there is no consistent $ref support across multiple files even in relatively simple cases. So however much I like my current implementation, I would be very happy to drop it in favour of compatibility across different platforms and validators. So I very seriously urge you, @Relequestual and @awwright, to consider dropping both base URI change (while still supporting it in the case of $refs, as shown above) and JSON Pointer support, even though you rely on them now (I rely on them too, but that is a very selfish argument, by the way), for the sake of the sanity of future users and, even more so, implementers of the spec. They would really appreciate it, even if they won't know it.
To further clarify, we definitely need support of base URI change in $refs. What I specifically want to prohibit is this:

{
"$id": "http://example1.com/schema1.json",
"allOf": [
{ "$id": "http://example2.com/schema2.json", "type": "integer" }
]
}

when the internal $id is not the result of "inclusion" with $ref. You may see the case above as equivalent to the example below, but the two are not equivalent, from either the usage or the implementation point of view:

{
"$id": "http://example1.com/schema1.json",
"allOf": [
{ "$ref": "http://example2.com/schema2.json" }
]
}

There must be a distinction in the spec between these two cases: the first doubles the complexity of the base URI change logic, and it also adds the need to make schema 2 available by its URI just because it is mentioned in schema 1.
You're going to have to elaborate on that. Both implementations that I have written or worked on just transparently dereference $ref. My actual validation code never really "sees" them; they just proxy validation to the referred schema. So from my point of view there is no difference.
No, the entire point is to package it all up in one file. Which is a use case that has been established many times for many people. Your 98%/2% numbers are totally made up and therefore not relevant. Even if you have perfect metrics of your own user base, yours is not the only JavaScript implementation, and there are many implementations and users of those implementations in other languages. If you're going to try to win arguments with numbers, you need to back those numbers up. Anecdotal evidence isn't worth much at all.
How is this simpler? For that matter, how can you possibly propose this when you just vehemently shot down my $base/$anchor proposal, which was much the same thing? You are asserting that because you are fine with making up your own namespacing syntax, it is a solution for everyone, which is not the case. There are few things I hate more than making up new things to parse. That's why I want to use standards, whether they are JSON Schema, JSON Pointer, or whatever else. Your assertion that JSON Pointers are some sort of barrier to entry does not match my experience at all. I have coached many teams through the learning process, and JSON Pointer was never among the difficult parts. So if we're going to argue with anecdotal evidence, there's mine. I do not find any of your arguments for dropping JSON Pointer even a little bit convincing, nor do I find your supposedly "easier" alternatives to be easier.
Many modern implementations compile JSON schemas to validating code. That is the only way to achieve high performance - all of the top 5 or so validators do it. When you do that, you need to decide whether to compile $ref as a separate function (because it can be recursive) or to inline the schema. The current spec also prescribes resolving inlined schemas with $id without making HTTP calls (although it makes this optional, which also means a lack of compatibility).
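To illustrate that decision, here is a minimal sketch of a compiling validator in JavaScript (an illustration only, not Ajv's actual code): each $ref target is compiled to a separate function, which is what makes recursive references work, while subschemas without $ref are compiled inline.

// Minimal compiling validator sketch. Supports only "type", "items",
// and local "$ref": "#/definitions/<name>".
function compile(rootSchema) {
  const fns = new Map(); // name -> compiled validating function

  function compileSchema(schema) {
    if (schema.$ref) {
      const name = schema.$ref.replace('#/definitions/', '');
      if (!fns.has(name)) {
        fns.set(name, null); // reserve the slot so recursive refs terminate
        fns.set(name, compileSchema(rootSchema.definitions[name]));
      }
      // $ref compiles to a call of the separate function; the lookup is
      // lazy, so self-references work even before compilation finishes
      return (data) => fns.get(name)(data);
    }
    const checks = [];
    if (schema.type === 'integer') checks.push(Number.isInteger);
    if (schema.type === 'array') checks.push(Array.isArray);
    if (schema.items) {
      // a subschema without $ref is simply compiled inline
      const itemFn = compileSchema(schema.items);
      checks.push((d) => !Array.isArray(d) || d.every(itemFn));
    }
    return (data) => checks.every((check) => check(data));
  }

  return compileSchema(rootSchema);
}

// A recursive schema: arbitrarily nested arrays of arrays.
const validate = compile({
  definitions: {
    list: { type: 'array', items: { $ref: '#/definitions/list' } }
  },
  $ref: '#/definitions/list'
});
console.log(validate([[], [[]]])); // true
console.log(validate([1]));        // false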
I am also curious whether they correctly handle the simple case above - I hope they do :) I am also curious which part of the Ajv test suite your implementations would pass. Given that most of the Ajv tests use the same format as JSON-Schema-Test-Suite, it would be relatively easy to find out. In any case, I think the difference in implementation approach is the reason that you see a schema as data that is used together with a data instance to produce a validation result, while I see a schema as code that is compiled directly to JavaScript code: when you do the actual validation you no longer need the schema - you just use the generated function. is-my-json-valid, jsen and some others use the same approach. I can see how interpreting schemas during validation makes it much easier to manage $ref resolution (although more than half of the 11 validators I was testing were interpreting rather than compiling, so that is not the only reason).
Because it did not exclude base URI change and only added complexity on top of it.
No, I figured out that you have an excessively complex mental model for $ref but that didn't change my point of view that I find it extremely simple to work with and implement. I've implemented it twice so far.
I am not doubting that. But I have not tested them. What are they? Let's try to run them against the tests to see if they work correctly in my cases. Then we can continue the conversation about the complexity/simplicity of $ref support. You may be right; I may be stupid and have a tendency to overcomplicate things. Or maybe it is somehow a JavaScript thing? Let's find out.
None of your implementation points are convincing. I have no idea why you think this is so hard. Yes, you have to make some decisions about how to arrange it and how much and what to optimize, but one implementation approach should not drive the standard. The prior project I worked on handled recursive references just fine, and did not do HTTP requests for referenced schemas unless it both did not recognize the id already and had been configured to make such calls (which were desirable in our use case). Running tests against that project would be complicated because there are other areas where I know it does not conform (it was originally not a straight-up JSON Schema library). My new implementation isn't ready for public use yet but will have test results when it is released, and I will be happy to run it against any test you care to throw at it.
But in any event, I still don't see a single reason why any of this implementation talk changes anything about how $id should work. So I'm not going to follow that line of the discussion further.
Really? A spec has to be implemented. If there are a lot of implementations that fail to do so in a consistent way, the problem is likely with the spec. So what's wrong with my suggestion to establish whether your implementations do it correctly? Because if they don't, fixing them may cause you to reconsider whether it is simple or complex. And if they do it all correctly, then I need to reconsider my models.
Nothing, and when the one I'm working on now is more than a half-assed mess, we will do so. I just started on it last week. The older one is, as noted, not properly a JSON Schema validator; it just implements most validation, including the $ref part, and we didn't have a problem with it. But the only reason I bring that project up is the experience of using it to work with numerous teams and APIs. I would not direct anyone outside of that API suite to it; it is too specific to the suite in question.
good.
That's not the same as "it works in all cases where it should according to the spec". Whenever I hear "it works for us", I keep saying that a function that always returns true produces the correct validation result for approximately 50% of the JSON-Schema-Test-Suite cases and for the majority of real-life test cases, as everybody seems to write a lot of "pass" tests and very few "fail" tests.
Anyway, thank you for the argument @handrews. At the very least we have established where our different points of view come from and also what may change your attitude towards the spec. To be continued...
My "we didn't have a problem with it" meant that I did not have a problem with engineers having difficulty understanding or using it. That is the metric that is important to me. |
@epoberezkin I have to admit I don't fully understand the problem here. It also looks like there are multiple issues being discussed in parallel, which confuses things further. Maybe you could provide further examples of tests for how things currently work vs how you'd like them to work, highlight the tests where the majority of libraries fail, and give your reasons why you think the spec is too complex in those instances. You may have already tried to do that here, but there's a lot of muddy discussion outside the scope of the issue, from what I can tell. If you plan to do the above, it would probably be best as a gist or a number of gists which you then link to from here.
@Relequestual That's exactly my plan. One example of JS validators failing on tests related to $refs is json-schema-benchmark, but it only tests validators against relatively simple cases from JSON-Schema-Test-Suite. I am going to extract some tests I have in Ajv into a separate npm package (they are in the same format as JSON-Schema-Test-Suite, so they could be useful for other people, even those writing non-JS validators) and run JS validators against them. Then we will see the scope of the problem. And it's not that I want them to pass my tests; it follows from the spec that they should... :)
One of your earlier comments hit on a point: the tests need to cover both positive and negative situations. Testing how $ref works, for example, isn't validation as such, but the use of keywords can be tested by intentionally creating schemas where you see implementations getting parts wrong, which would highlight the issues.
Also, a JSON Schema document is not code. It tells the code what to do, but the document itself IS NOT code. It does nothing on its own without being an input into a library. If you consider code to be just instructions, then is your backlog code? The backlog is instructions for you to do stuff... but it is not code. We don't call JSON Schema libraries JSON Schema compilers, do we? Nope. /rant
It's difficult for me not to see JSON Schema as code in some DSL, given that Ajv and some other validators take a schema and convert it into JavaScript code that validates against exactly that schema. There is even an ajv-pack package that saves compiled schemas into standalone JavaScript modules that can be used without the whole of Ajv. So Ajv, jsen, is-my-json-valid and some others are exactly that: "JSON Schema compilers" that compile JSON Schema to JavaScript. So it is a possible and efficient point of view to see JSON Schema as code.
=/ well that's unexpected. I don't understand WHY you would want that. Regardless, I agree with #160 (comment)
For performance. The most common validation use case is when the same schema is used to validate many data instances (e.g. in an API), so pre-compiling schemas into validating functions makes validation 50-200 times faster (same as with pre-compiling templates). Look at the chart in the benchmark.
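For context, the compile-once pattern with Ajv's public API looks like this (the schema here is just an illustration):

const Ajv = require('ajv');
const ajv = new Ajv();

// compile once: returns a standalone validating function
const validate = ajv.compile({ type: 'object', required: ['id'] });

// validate many instances; the schema is not traversed on each call
console.log(validate({ id: 1 })); // true
console.log(validate({}));        // false
console.log(validate.errors);     // details of the last failed validation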
Fair enough. If you want to do that at run time, because in your programming language data is code and code is data, then sure. I still think the spec should remain simple though.
It can be done at build time too; ajv-pack does that. There is also some old Haskell package that does it. It's possible to write a compiler from JSON Schema to any language - in the majority of apps there is a limited number of static schemas bundled with the app, so they can be pre-compiled to code while building the rest of the app.
That's exactly what this issue is about - making the spec simpler to implement. I think you're paying too much attention to some esoteric ideas here, which I've only suggested to highlight the complexity of some other proposals coming from the "document" point of view. The core proposal is to simplify the spec as much as possible. Wait for the test results, so you can see how inconsistently the spec is supported in the area of $ref.
@handrews @Relequestual @awwright please have a look at the test results of 12 JavaScript validators (including Ajv) using JSON-Schema-Test-Suite and Ajv tests: https://github.com/epoberezkin/test-validators
Base URI change support is much worse than I expected. While 5 validators pass a simple test for base URI change from JSON-Schema-Test-Suite (refRemote.json), none of them (!) passed the additional base URI change tests that Ajv uses. Other cases related to $ref resolution also have very inconsistent support: although there is at least a couple of validators that pass each test file, none of the tested validators passed all the Ajv tests. Most of these tests were the result of issues submitted by users who were struggling to implement their use cases; some of them I encountered when using Ajv myself. So they are not theoretical use cases. For me these results are proof enough that the spec is difficult to implement. Also, the fact that none of the validators supports base URI change as fully as Ajv does is, for me, a confirmation that it is not needed. And the fact that it is supported to some extent makes some other important cases (e.g. recursion) more difficult to implement and, as a result, inconsistently supported. @handrews all these tests are in the same format as the tests in JSON-Schema-Test-Suite, and the $ref-related tests use a very limited number of keywords. So I recommend that you implement support for $ref in such a way that it passes all these tests before implementing other validation keywords. After that we can discuss whether it is simple or complex to implement.
@epoberezkin I have most of the $schema/$id/$ref implementation code-complete and just need to finish my unit testing and then add whatever bit is needed to load the JSON-Schema-Test-Suite format tests (I haven't looked at that yet; maybe it's trivial). I will post results from all test suites here when I have them. Also, is there a reason that you only tested JavaScript validators? JavaScript is not the only language in use with JSON Schema.
To save time... I understand that JSON-Schema is not limited to JavaScript. I'd be happy to see the results of testing validators in other languages against the same tests.
The format is quite simple, and the Ajv tests are in the same format - please run against them as well. I made a JS test runner for this format, but it would not be a problem to write one in any language.
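For reference, a minimal test file in the JSON-Schema-Test-Suite format looks like this (the schema and data are illustrative):

[
  {
    "description": "integer type",
    "schema": { "type": "integer" },
    "tests": [
      { "description": "an integer is valid", "data": 1, "valid": true },
      { "description": "a string is invalid", "data": "abc", "valid": false }
    ]
  }
]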
Oh I definitely plan to, sorry if that was not clear. We should probably see about getting those cases added to the "official" suite.
I agree, at least for some of them. I'd wait until draft 6 is out - I am going to update them after it. There are also some tests for standard keywords that some validators fail. For example, three validators (jjv, schemasaurus, skeemas) fail the test that requires that if…
@epoberezkin I'm putting a (shrug) label on this to flag that it has wandered a good bit off from its original point into a general discussion without much agreement. I think we've started talking about bits and pieces of this elsewhere; we can keep this open for reference, but it might be better to start again after the current round of id and $ref discussions resolves? I'll leave it up to you whether to close this now or not, but I think it's no longer clear what we need here.
@handrews I think we've agreed that we need to reconsider at a later point whether the current spec is easy to implement. I don't mind closing it for now.
For some reason I never closed this even though we agreed that it was OK to do so, and we can re-file specific points as needed (I think some concepts brought up here actually have since gotten their own issues, but I'm not reading back through this to find out!). Closing now :-)
Hi @handrews, just wondering whether you ever got around to implementing this. Is the implementation publicly available?
Everybody seems to agree that base URI change is one area of the standard that is confusing, inconsistently implemented, and difficult to support correctly.
Let's think for a second about which problems id and $ref solve. The only problem that really needs to be solved is schema re-use (and the usage practice shows it). If you were writing code, the file would traditionally define a namespace, and all the symbols that can be accessed from outside of the file would have to be explicitly made public. Although @awwright cites the departure from the file model in some standards bodies, it doesn't seem to be relevant to software development and JSON; the fact that there is JSON-LD means little for JSON Schema - there should be a JSON-LD schema for that. Software developers will not depart from the file model in a hurry, and unless JSON Schema acknowledges this it will simply lose touch with them.
I suggest that, to both enable code re-use and ensure its stability, we require explicitly declaring the names that can be used from outside of the schema file. I.e., I am not only suggesting dropping base URI change from the spec, but also dropping JSON Pointer support in references. That would allow schema authors to have clearly defined symbols that can be used from outside, and at the same time the freedom to refactor and restructure the rest of their schemas as they wish. One of the valid arguments against $merge was that schema authors may want to prevent modification. The same argument applies to the desire of popular schema authors to prevent direct access to some areas of a schema and only publish some access points that they would maintain in a consistent way (like a schema's public API).
The proposal is to:

- use a $uri keyword only at the top level of the schema file, where it defines both the schema URI and the base URI for resolving references;
- use the $id keyword to declare named anchors; its value MUST match ^[a-z]+[a-z0-9_]*$ for public names that can be accessed from outside of the schema file and ^_[a-z]+[a-z0-9_]*$ for private names that can only be used within the file, and its value MUST be unique within the file (redefining it would make the schema invalid). The $id keyword should be used at the root of the sub-schema that will be referenced by it;
- use <uri>#<id> for references to other schemas (where <uri> will be resolved based on $uri) and #<id> for references within the file (which is consistent with $uri providing the base URI for resolution).

Given that $ref implies using JSON Pointer, and using it violates isolation (by providing direct access into private code that can be changed without notice), the proposal is to drop $ref and instead use some other keyword as part of the JSON Schema spec, e.g. $call and/or $include. The two keywords could be used with different meanings: $call would be validated in the context of the source schema and $include in the current context. $include is optional; $call is more or less what we have now. A sketch of this syntax is shown after the next paragraph.

I appreciate that this is the most radical proposal for simplifying the schema re-use issue. Please consider it not by comparing it with what we have now (we have a mess leading to a lack of compatibility), but from the point of view of existing software development practices. Modularisation, isolation, etc. are normal things in writing code, but for some reason they are not available to schema authors, who craft thousand-line documents simply to avoid using a $ref that is not consistently implemented, and who have no way to reliably expose anything less than the whole schema file (reliably = providing a guarantee for consumers that it won't change without notice).
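As a sketch, a schema file under this proposal might look like this (the $uri/$id/$call keywords and the example URI are hypothetical, per the proposal above):

{
  "$uri": "http://example.com/schemas/order",
  "definitions": {
    "item": {
      "$id": "item",
      "type": "object",
      "properties": { "price": { "$call": "#_price" } }
    },
    "price": {
      "$id": "_price",
      "type": "number",
      "minimum": 0
    }
  },
  "type": "array",
  "items": { "$call": "#item" }
}

Here "item" is a public name that another file could reference as "http://example.com/schemas/order#item", while "_price" is private to this file and cannot be referenced from outside.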
I think many of the proposals previously submitted here are focused on theoretical aspects rather than on the practical problems of users and implementations. E.g. @awwright repeatedly cites XML and HTML as inspiration, and I think that these arguments, although theoretically correct, are in essence fundamentally flawed and completely ignore the fact that JSON is on purpose a much simpler standard, and that this is the main reason for its wide adoption. Given that this standard is JSON Schema, and not JSON-LD schema or XML Schema, I don't see why arguments referring to the practices existing there should be seriously considered here while the arguments referring to the actual usage practice of JSON Schema are ignored. I suggest we treat all references to XML/HTML practices as irrelevant to JSON Schema.
I would very much appreciate feedback from @awwright @handrews @Relequestual @fge @jdesrosiers, and in particular from the people who have implemented base URI change in existing JSON Schema validators: @mafintosh @bugventure @AlexeyGrishin @atrniv @zaggino @automatthew @tdegrunt @Prestaul @natesilva @geraintluff @daveclayton @erosb @stevehu @Julian @hoxworth @hasbridge @justinrainbow @yuloh @JamesNK @RSuter @seagreen @sigu-399 (I see very few people from this list in these conversations, which is another sign of standard deterioration, and I don't think any decisions about changing the standard should be made without the wider involvement of the people who create validators).