Load schema never ends #348
erosb added a commit to erosb/everit-json-schema that referenced this issue on Dec 22, 2019
Context: to handle references by $id correctly, whenever we encounter a $ref we have to check, after resolving the root schema JSON, whether the fragment part of the $ref identifies a subschema by $id. (We have to do this even when the fragment is a valid JSON pointer, because nothing stops a schema author from setting $id to a JSON pointer string; see `ReferenceLookupTest#idAsJsonPointerWorks()`, which covers this case.)

Until now, this $id check deep-traversed the entire root schema JSON for every $ref encountered, looking for a matching $id. The repeated deep traversal caused severe performance problems with very large schemas that use many $refs.

This change fixes the problem by deep-traversing each involved JSON document at most once, at least for document-local references. `LoadingState` gains a <rootSchema, SubschemaRegistry> map, where SubschemaRegistry is essentially a map of <$id, subschemaJson> pairs. The registry for a root JSON is initialized the first time a $ref has to be looked up in that JSON. It eagerly deep-traverses the whole root JSON and collects the $id -> subschemaJson pairs. After that, the lookup-by-$id part of the $ref lookup is just an O(1) hashmap lookup.

API changes: none; all affected and newly introduced classes are package-private.
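The registry idea described in the commit message can be sketched roughly as follows. This is not the library's actual code: the class and method names (`SubschemaRegistrySketch`, `LoadingStateSketch`, `registryFor`, `lookupById`) are hypothetical, JSON is modelled with plain `java.util` maps and lists instead of `org.json` types, and keeping the first subschema seen for a duplicated $id is an assumption.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the $id -> subschemaJson registry of one root document. */
final class SubschemaRegistrySketch {
    private final Map<String, Map<?, ?>> subschemasById = new HashMap<>();
    private int visitedNodes = 0;

    /** Eagerly deep-traverses the whole root JSON exactly once. */
    SubschemaRegistrySketch(Object rootJson) {
        collect(rootJson);
    }

    private void collect(Object node) {
        visitedNodes++;
        if (node instanceof Map) {
            Map<?, ?> obj = (Map<?, ?>) node;
            Object id = obj.get("$id");
            if (id instanceof String) {
                // Assumption: the first subschema seen with a given $id wins.
                subschemasById.putIfAbsent((String) id, obj);
            }
            for (Object child : obj.values()) {
                collect(child);
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                collect(child);
            }
        }
    }

    /** Hash map lookup; returns null when no subschema declares this $id. */
    Map<?, ?> lookupById(String id) {
        return subschemasById.get(id);
    }

    int visitedNodes() {
        return visitedNodes;
    }
}

/** Hypothetical stand-in for the <rootSchema, SubschemaRegistry> map held by the loading state. */
final class LoadingStateSketch {
    // Keyed by object identity, so looking up a registry never hashes the whole document.
    private final Map<Object, SubschemaRegistrySketch> registries = new IdentityHashMap<>();

    /** Builds the registry lazily, the first time a $ref is looked up in this root JSON. */
    SubschemaRegistrySketch registryFor(Object rootJson) {
        return registries.computeIfAbsent(rootJson, SubschemaRegistrySketch::new);
    }
}

final class SubschemaRegistryDemo {
    /** A tiny schema with one subschema that declares an $id and one $ref pointing at it. */
    static Map<String, Object> sampleSchema() {
        Map<String, Object> positive = new LinkedHashMap<>();
        positive.put("$id", "#positive");
        positive.put("type", "integer");
        positive.put("minimum", 1);

        Map<String, Object> definitions = new LinkedHashMap<>();
        definitions.put("positive", positive);

        Map<String, Object> count = new LinkedHashMap<>();
        count.put("$ref", "#positive");

        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("count", count);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("definitions", definitions);
        root.put("properties", properties);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> root = sampleSchema();
        LoadingStateSketch state = new LoadingStateSketch();
        // The first call traverses the document; later calls reuse the same registry.
        SubschemaRegistrySketch registry = state.registryFor(root);
        System.out.println("resolved by $id: " + registry.lookupById("#positive"));
        System.out.println("same registry reused: " + (registry == state.registryFor(root)));
    }
}
```

With this shape, each $ref costs one hash map lookup after the initial traversal, instead of a fresh walk over the whole document per $ref.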
erosb added a commit that referenced this issue on Dec 22, 2019
Performance improvements based on local testing: ran a validation with the ECV-JsonSchema-tNoBOM.json schema and an empty JSON instance ({}):
* with 1.12.0: loading didn't complete (killed the process after 20 minutes)
* with HEAD: schema loaded and validation passed in ~30 seconds
Thanks for reporting it. The problem is fixed in version |
When I try to load the attached JSON Schema "ECV-JsonSchema-tNoBOM.json" into a schema validator, the loading never ends.
I tested with the latest release, 1.12.0, using SchemaLoader.load(new JSONObject(schema)).
From profiling, the problem seems to be in "ReferenceLookup.lookupObjById". I don't know the code very well, but the schema does not use IDs for referencing, so it should not be spending time in "lookupObjById".
ECV-JsonSchema-tNoBOM.zip