Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: resolve refs in schemas in one extrapolated hash #222

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 49 additions & 0 deletions lib/json-schema/validator.rb
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,10 @@ def load_ref_schema(parent_schema, ref)
build_schemas(schema)
end

def base_schema
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's normal to use attr_accessor for this kind of thing.

@base_schema
end

def absolutize_ref_uri(ref, parent_schema_uri)
ref_uri = Addressable::URI.parse(ref)

Expand Down Expand Up @@ -238,6 +242,46 @@ def validation_errors
@errors
end

def resolve_nested_references(schema, schema_portion, list)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema_portion appears to be a name that's new to the project. Could you pick a name that's already been used before, so it's easier to work out what it refers to?

Also, it's not clear what list is. Could you find a more suitable name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only function which receives the same combination, a schema object and a hash describing some property (or the entire) of the schema is in get_referenced_uri_and_schema in RefAttribute where the hash, here schema_portion is called s.
What do you think of schema_hash as name? Or schema_portion_hash as it can also be some nested sub-property of the schema?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think there is no better name. Values coming in here as schema_portion are either Hashes, Arrays or primitive values and they are all nested parts of the schema to be resolved (they are neither all hashes nor all valid schemas on their own).

case schema_portion
when Array
# for arrays resolve all items which have not been resolved yet
schema_portion.each_with_index do |item, i|
schema_portion[i] = resolve_nested_references(schema, item, list) unless list.include?(schema_portion[i])
end
when Hash
if schema_portion["$ref"]
# for hashes with a reference, resolve their referenced schema and copy(!) the result into the currently handled portion
_, ref_schema = JSON::Schema::RefAttribute.get_referenced_uri_and_schema(schema_portion, schema, self)
resolved_portion = resolve(ref_schema, ref_schema.schema, list)
schema_portion.clear
resolved_portion.each { |key, value| schema_portion[key] = value }
else
# for hashes without a reference we also resolve recursively what has not been resolved yet
schema_portion.each do |key, value|
schema_portion[key] = resolve_nested_references(schema, value, list) unless list.include?(value)
end
end
end
schema_portion
end

def resolve(schema, handle_schema_hash, list = [])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me what handle_schema_hash is, and again, I think that list could have a better name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think about calling the parameter just schema_hash. It is the hash representing the schema to be resolved (or subschema in nested calls). Its the primary structure the resolution process is working on. The schema parameter is an actual JSON::Schema instance which is required to fetch references.

# if we already dealt with the schema hash we have nothing left to do
return handle_schema_hash if list.include?(handle_schema_hash)
# remember the hash we are dealing with for the future
list << handle_schema_hash
# $ref is specified as 'SHOULD replace the current schema with the schema referenced by the value's URI' so we do exactly that
if handle_schema_hash["$ref"]
_, ref_schema = JSON::Schema::RefAttribute.get_referenced_uri_and_schema(handle_schema_hash, schema, self)
fail "Could not find referenced schema #{handle_schema_hash["$ref"]}" unless ref_schema
# translates to 'replace the current schema with the referenced one and resolve this instead'
handle_schema_hash = resolve(ref_schema, ref_schema.schema, list)
end
# finally we must recursively walk through our schema and resolve nested schemas
resolve_nested_references schema, handle_schema_hash, list
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you be consistent about calling functions with parentheses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check whether that's how calls are made elsewhere in the lib. In my company we use parentheses in nested calls but omit them if the hole expression is only a function call. But I'm not particularly tied to this, so I can change this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle_schema_hash
end

class << self
def validate(schema, data,opts={})
Expand Down Expand Up @@ -279,6 +323,11 @@ def fully_validate(schema, data, opts={})
validator.validate
end

def fully_resolve(schema)
validator = JSON::Validator.new(schema, {}, {})
validator.resolve(validator.base_schema, validator.base_schema.schema)
end

def fully_validate_schema(schema, opts={})
data = schema
schema = JSON::Validator.validator_for_name(opts[:version]).metaschema
Expand Down
8 changes: 8 additions & 0 deletions test/schemas/fully_resolve/leave_integer_schema.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"required": ["age"],
"properties" : {
"age" : { "type": "integer" }
}
}
8 changes: 8 additions & 0 deletions test/schemas/fully_resolve/leave_string_schema.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"required": ["name"],
"properties" : {
"name" : { "type": "string" }
}
}
17 changes: 17 additions & 0 deletions test/schemas/fully_resolve/linked_list_schema.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
{
"$schema": "http://json-schema.org/draft-04/schema#",
"id": "http://example.com/linked_list#",
"type": "object",
"required": ["head", "tail"],
"properties" : {
"head" : {
"type": "object"
},
"tail": {
"oneOf": [
{ "$ref": "#" },
{ "type": "null" }
]
}
}
}
282 changes: 282 additions & 0 deletions test/test_fully_resolve.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,282 @@
# encoding: utf-8
require File.expand_path('../test_helper', __FILE__)

class FullyResovleDraft4Test < Minitest::Test
def test_top_level_ref
schema = {
"$schema" => "http://json-schema.org/draft-04/schema#",
"$ref" => "test/schemas/fully_resolve/leave_string_schema.json#"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean leaf instead of leave?

}

expected = {
"$schema" => "http://json-schema.org/draft-04/schema#",
"type" => "object",
"required" => ["name"],
"properties" => {
"name" => { "type" => "string" }
}
}

resolved = JSON::Validator.fully_resolve(schema)
assert_equal(resolved, expected, "should resolve a top level $ref reference")

# Test that the result is usable for validation
data = {name: '2'}
assert_valid(schema, data)
assert_valid(resolved, data)
end

def test_one_of_ref
schema = {
"$schema" => "http://json-schema.org/draft-04/schema#",
"oneOf" => [
{ "$ref" => "test/schemas/fully_resolve/leave_string_schema.json#" },
{ "$ref" => "test/schemas/fully_resolve/leave_integer_schema.json#" }
]
}

expected = {
"$schema" => "http://json-schema.org/draft-04/schema#",
"oneOf" => [
{
"$schema" => "http://json-schema.org/draft-04/schema#",
"type" => "object",
"required" => ["name"],
"properties" => {
"name" => { "type" => "string" }
}
},
{
"$schema" => "http://json-schema.org/draft-04/schema#",
"type" => "object",
"required" => ["age"],
"properties" => {
"age" => { "type" => "integer" }
}
}
]
}

resolved = JSON::Validator.fully_resolve(schema)
assert_equal(resolved, expected, "should resolve a top level $ref reference")

# Test that the result is usable for validation
data = {name: '2'}
assert_valid(schema, data)
assert_valid(resolved, data)

data = {age: 2}
assert_valid(schema, data)
assert_valid(resolved, data)
end

def test_linked_list_ref
stub_request(:get, "example.com/linked_list").to_return(:body => File.new('test/schemas/fully_resolve/linked_list_schema.json'), :status => 200)

schema = {
"$schema" => "http://json-schema.org/draft-04/schema#",
"$ref" => "http://example.com/linked_list#"
# it does not work to reference the schema by path
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the PR message, unfortunately I don't know. I asked because I hope @pd or @iainbeeston have an idea what's going wrong here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can ref files on the file system, but they have to live at the same depth or deeper than the 'entrypoint' schema that contains the $ref. Assuming that condition is met, you can reference by filename or folder+filename, followed by a JSON pointer. I have done so with this gem and a multi-file json schema for many months now.

assuming:

./main.json
./secondary.json
./foo/tertiary.json

a refs inside main.json would look like:

"$ref": "secondary.json#/definitions/something"
"$ref": "foo/tertiary.json#"

This assumes the ID of the main.json file has been modified to reflect it's absolute location on the filesystem prior to loading, and I'm not sure how this would play with a schema created in-memory referencing a file system location, but if you set up the id correctly to be a file:/// location, it should still work.

# "$ref" => 'test/schemas/fully_resolve/linked_list_schema.json'
}

expected = {
"$schema" => "http://json-schema.org/draft-04/schema#",
"id" => "http://example.com/linked_list#",
"type" => "object",
"required" => ["head", "tail"],
"properties" => {
"head" => {
"type" => "object"
},
"tail" => {
}
}
}
expected["properties"]["tail"]["oneOf"] = [
expected,
{ "type" => "null" }
]

resolved = JSON::Validator.fully_resolve(schema)
assert_equal(resolved, expected, "should resolve a top level $ref reference")

data = {"head" => {}, "tail" => nil}
assert_valid(schema, data)
# you don't want to do that it causes infinite recursion aka SystemStackError: stack level too deep
#assert_valid(resolved, data)
end

def test_resolution_of_draft_4_spec
stub_request(:get, "http://json-schema.org/draft-04/schema#").to_return(:body => File.new('resources/draft-04.json'), :status => 200)
schema = {
"$schema" => "http://json-schema.org/draft-04/schema#",
"$ref" => "http://json-schema.org/draft-04/schema#"
}

schema_array = {
"type" => "array",
"minItems" => 1,
"items" => { "$ref" => "#" }
}
positive_integer = {
"type" => "integer",
"minimum" => 0
}
positive_integer_default_0 = {
"allOf" => [ positive_integer, { "default" => 0 } ]
}
string_array = {
"type" => "array",
"items" => { "type" => "string" },
"minItems" => 1,
"uniqueItems" => true
}
simple_types = {
"enum" => [ "array", "boolean", "integer", "null", "number", "object", "string" ]
}
expected = {}
expected.merge!({
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to create an empty hash and then merge!, do you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do, because I need a reference to the outer hash within the merged hash which describes the resolved schema as it is self-referencing.

"id" => "http://json-schema.org/draft-04/schema#",
"$schema" => "http://json-schema.org/draft-04/schema#",
"description" => "Core schema meta-schema",
"definitions" => {
"schemaArray" => schema_array,
"positiveInteger" => positive_integer,
"positiveIntegerDefault0" => positive_integer_default_0,
"simpleTypes" => simple_types,
"stringArray" => string_array
},
"type" => "object",
"properties" => {
"id" => {
"type" => "string",
"format" => "uri"
},
"$schema" => {
"type" => "string",
"format" => "uri"
},
"title" => {
"type" => "string"
},
"description" => {
"type" => "string"
},
"default" => {},
"multipleOf" => {
"type" => "number",
"minimum" => 0,
"exclusiveMinimum" => true
},
"maximum" => {
"type" => "number"
},
"exclusiveMaximum" => {
"type" => "boolean",
"default" => false
},
"minimum" => {
"type" => "number"
},
"exclusiveMinimum" => {
"type" => "boolean",
"default" => false
},
"maxLength" => positive_integer,
"minLength" => positive_integer_default_0,
"pattern" => {
"type" => "string",
"format" => "regex"
},
"additionalItems" => {
"anyOf" => [
{ "type" => "boolean" },
expected
],
"default" => {}
},
"items" => {
"anyOf" => [
expected,
schema_array
],
"default" => {}
},
"maxItems" => positive_integer,
"minItems" => positive_integer_default_0,
"uniqueItems" => {
"type" => "boolean",
"default" => false
},
"maxProperties" => positive_integer,
"minProperties" => positive_integer_default_0,
"required" => string_array,
"additionalProperties" => {
"anyOf" => [
{ "type" => "boolean" },
expected
],
"default" => {}
},
"definitions" => {
"type" => "object",
"additionalProperties" => expected,
"default" => {}
},
"properties" => {
"type" => "object",
"additionalProperties" => expected,
"default" => {}
},
"patternProperties" => {
"type" => "object",
"additionalProperties" => expected,
"default" => {}
},
"dependencies" => {
"type" => "object",
"additionalProperties" => {
"anyOf" => [
expected,
string_array
]
}
},
"enum" => {
"type" => "array",
"minItems" => 1,
"uniqueItems" => true
},
"type" => {
"anyOf" => [
simple_types,
{
"type" => "array",
"items" => simple_types,
"minItems" => 1,
"uniqueItems" => true
}
]
},
"allOf" => schema_array,
"anyOf" => schema_array,
"oneOf" => schema_array,
"not" => expected
},
"dependencies" => {
"exclusiveMaximum" => [ "maximum" ],
"exclusiveMinimum" => [ "minimum" ]
},
"default" => {}
})
schema_array['items'] = expected

resolved = JSON::Validator.fully_resolve(schema)
assert_equal(resolved, expected, "should resolve the draft 4 spec")

assert_valid(schema, schema)
end
end