Load schema never ends #348

Closed
octavianN opened this issue Dec 9, 2019 · 1 comment

@octavianN

When I load the attached JSON Schema "ECV-JsonSchema-tNoBOM.json" into a schema validator, the loading never finishes.
I tested with the latest release 1.12.0, using SchemaLoader.load(new JSONObject(schema)).

From what I profiled, the problem seems to be in "ReferenceLookup.lookupObjById". I don't know the code very well, but the schema does not use IDs for referencing, so it should not be spending its time in "lookupObjById".

ECV-JsonSchema-tNoBOM.zip

erosb added a commit to erosb/everit-json-schema that referenced this issue Dec 22, 2019
Context: to properly handle references by $id values, whenever we encounter a $ref, after resolving the root schema json,
we have to check whether the fragment part of the $ref identifies a subschema by $id. (We have to do this even when
the fragment part is a valid json pointer, because nothing stops a schema author from setting $id to a json pointer string
 - see `ReferenceLookupTest#idAsJsonPointerWorks()`, which covers this case.) To perform this $id check, until now, we always
deep-traversed the entire root schema json for each encountered $ref, looking for a matching $id.

This repeated deep-traversal caused severe performance problems with extremely large schemas that use many $refs.
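
To illustrate why one full traversal per $ref scales badly, here is a toy model in plain Java. It is not the library's code: the class `NaiveIdLookup`, its methods, and the `Map`/`List` representation of the schema json are all assumptions made for this sketch. It counts how many json nodes are visited when every $ref triggers a fresh deep search for a matching $id.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not part of everit-json-schema.
class NaiveIdLookup {
    // Number of json nodes visited since the last reset.
    static long visited = 0;

    // Depth-first search for an object whose "$id" equals the given id.
    // Returns null when no such object exists.
    static Map<?, ?> findById(Object node, String id) {
        visited++;
        if (node instanceof Map) {
            Map<?, ?> obj = (Map<?, ?>) node;
            if (id.equals(obj.get("$id"))) {
                return obj;
            }
            for (Object child : obj.values()) {
                Map<?, ?> hit = findById(child, id);
                if (hit != null) {
                    return hit;
                }
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                Map<?, ?> hit = findById(child, id);
                if (hit != null) {
                    return hit;
                }
            }
        }
        return null;
    }

    // Builds {"definitions": {"d0": {"$ref": ...}, ..., "d(n-1)": {"$ref": ...}}}.
    // No subschema declares an $id, so every search walks the whole document.
    static Map<String, Object> schemaWithRefs(int n) {
        Map<String, Object> definitions = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            Map<String, Object> def = new LinkedHashMap<>();
            def.put("$ref", "#/definitions/d0");
            definitions.put("d" + i, def);
        }
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("definitions", definitions);
        return root;
    }

    // Performs one full deep traversal per $ref and returns the total visit count.
    static long resolveAllRefsNaively(int n) {
        Map<String, Object> root = schemaWithRefs(n);
        visited = 0;
        Map<?, ?> definitions = (Map<?, ?>) root.get("definitions");
        for (Object def : definitions.values()) {
            String ref = (String) ((Map<?, ?>) def).get("$ref");
            findById(root, ref);
        }
        return visited;
    }
}
```

In this toy document each traversal visits 2n + 2 nodes, so resolving n references costs n(2n + 2) visits: 220 for 10 references and 20,200 for 100. The growth is quadratic, which is consistent with the slowdown reported for a very large schema that uses no $id at all.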

This change fixes the problem by deep-traversing each involved json document at most once, at least for document-local
references. `LoadingState` gains a <rootSchema, SubschemaRegistry> map, where SubschemaRegistry is essentially a map of
<$id, subschemaJson> pairs. The registry for a root json is initialized the first time it is necessary to look up
a $ref in that json. The registry eagerly deep-traverses the whole root json and collects the $id -> subschemaJson pairs.
After that, the lookup-by-$id part of the $ref lookup process is just an O(1) hashmap lookup.
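
The registry idea can be sketched in a few lines of plain Java. This is a minimal illustration only, not the actual package-private classes: `IdRegistrySketch`, `Registry`, `lookupById`, the `traversals` counter, and the `Map`/`List` representation of json are all hypothetical, and keeping the first subschema when an $id repeats is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the library's SubschemaRegistry or LoadingState.
class IdRegistrySketch {

    // Maps each $id found in one root document to the subschema declaring it.
    static final class Registry {
        private final Map<String, Map<?, ?>> byId = new HashMap<>();

        // Eagerly walks the whole document once at construction time.
        Registry(Object root) {
            collect(root);
        }

        private void collect(Object node) {
            if (node instanceof Map) {
                Map<?, ?> obj = (Map<?, ?>) node;
                Object id = obj.get("$id");
                if (id instanceof String) {
                    // Assumption: the first subschema seen with a given $id wins.
                    byId.putIfAbsent((String) id, obj);
                }
                for (Object child : obj.values()) {
                    collect(child);
                }
            } else if (node instanceof List) {
                for (Object child : (List<?>) node) {
                    collect(child);
                }
            }
        }

        // Constant-time lookup; returns null when no subschema has this $id.
        Map<?, ?> lookup(String id) {
            return byId.get(id);
        }
    }

    // One registry per root document, keyed by object identity, created lazily.
    static final Map<Object, Registry> registries = new IdentityHashMap<>();

    // Counts how many full traversals were performed (for demonstration only).
    static int traversals = 0;

    static Map<?, ?> lookupById(Object root, String id) {
        Registry registry = registries.get(root);
        if (registry == null) {
            traversals++;
            registry = new Registry(root);
            registries.put(root, registry);
        }
        return registry.lookup(id);
    }
}
```

However many times `lookupById` is called for the same root document, the document is walked only on the first call. An identity-keyed map is used here so that locating the registry does not itself require hashing or comparing a large json structure; whether the real implementation does the same is not stated in the commit message.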

API changes: no, all affected and newly introduced classes are package-private.
@erosb erosb closed this as completed in 27b5f35 Dec 22, 2019
erosb added a commit that referenced this issue Dec 22, 2019

Performance improvements based on local testing: ran a validation with the ECV-JsonSchema-tNoBOM.json schema and an empty
json instance ({}),
 * with 1.12.0: loading didn't complete (killed the process after 20 minutes)
 * with HEAD: schema loaded & validation passed in ~30 seconds.
@erosb
Contributor

erosb commented Dec 22, 2019

Thanks for reporting it. The problem is fixed in version 1.12.1. The schema now loads for me in less than 30 seconds.
