Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: schema validation #1

Merged

Conversation

csadorf
Copy link

@csadorf csadorf commented Dec 15, 2022

No description provided.

@csadorf csadorf changed the title Refactor schema validation refactor: schema validation Dec 15, 2022
@csadorf csadorf force-pushed the refactor-schema-validation branch 2 times, most recently from 3af4407 to 6f099b5 Compare December 16, 2022 13:23
The previous version would actually require an argument.
@csadorf csadorf marked this pull request as ready for review December 16, 2022 13:44
@csadorf
Copy link
Author

csadorf commented Dec 16, 2022

@vyasr This is ready for review.

Copy link
Owner

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Simon! This is a definite improvement. The use of references is very helpful. I do worry a little bit about some of the reference names being too generic and overlapping (for instance, Requirement vs Requirements given that they're used in different package lists, or that Matrices is not an array of Matrix). It might be better to use longer, more precise names at the expense of brevity. I'm also OK with waiting to rename until we put the PR into the main branch up for review, though, so that we can get a fresh set of eyes on it.

pyproject.toml Outdated Show resolved Hide resolved
@@ -0,0 +1,160 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://raw.githubusercontent.com/rapidsai/dependency-file-generator/v1.0.0/src/rapids_dependency_file_generator/schema.json",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, due to the use of semantic-release in maintaining this package, I think the merge of this PR will immediately make this link out of date since the sha won't be v1.0.0 but instead v1.1.0 (assuming we treat this PR as a new feature). We'll need to update this URL accordingly when we merge my PR into the main branch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we automatically update this URL to match the release version?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajschmidt8 you have more familiarity with what we can do with semantic-release, do you have any recommendations on how we could handle this? Specifically, what we want is for the version number in the "$id" URL to be updated to match the current version during a release. My concern was that the PR to update the version number would itself trigger a new version, thus immediately rendering the version in the URL out of date.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vyasr, @csadorf, yes. we can do that with https://github.com/semantic-release/git.

We'll have to figure out how to configure it, but it will commit "release assets" back to the repository (with a [skip ci] tag) after a release has occurred.

In this case, schema.json would be categorized as a "release asset" since it depends on the release version.

.pre-commit-config.yaml Show resolved Hide resolved
src/rapids_dependency_file_generator/schema.json Outdated Show resolved Hide resolved
src/rapids_dependency_file_generator/schema.json Outdated Show resolved Hide resolved
src/rapids_dependency_file_generator/schema.json Outdated Show resolved Hide resolved
@vyasr
Copy link
Owner

vyasr commented Jan 21, 2023

@csadorf I have one other request with this PR. Could you modify some of the test dependencies.yaml files to invalidate them and show some examples of the output? I am not sure how the refactoring of the schema into components affects the generated error output, and I want to make sure that it's still readable and user-friendly for RAPIDS devs who need to modify dependencies.yaml files without having any knowledge of jsonschema. It's fine for developers of dfg to need to know how this works, but I wouldn't want to reduce the readability of the error output for others. I assume this isn't an issue (or if it is, it's one we can mitigate), just want to confirm.

@csadorf
Copy link
Author

csadorf commented Jan 26, 2023

@csadorf I have one other request with this PR. Could you modify some of the test dependencies.yaml files to invalidate them and show some examples of the output? I am not sure how the refactoring of the schema into components affects the generated error output, and I want to make sure that it's still readable and user-friendly for RAPIDS devs who need to modify dependencies.yaml files without having any knowledge of jsonschema. It's fine for developers of dfg to need to know how this works, but I wouldn't want to reduce the readability of the error output for others. I assume this isn't an issue (or if it is, it's one we can mitigate), just want to confirm.

I ran some tests and here is an example for the new error message:

jsonschema.exceptions.ValidationError: 'abc' is not of type 'object'

Failed validating 'type' in schema['properties']['dependencies']['patternProperties']['.*']['properties']['specific']['items']['properties']['matrices']['items']:
    {'additionalProperties': False,
     'properties': {'matrix': {'oneOf': [{'patternProperties': {'.*': {'type': 'string'}},
                                          'type': 'object'},
                                         {'type': 'null'}]},
                    'packages': {'oneOf': [{'$ref': '#/$defs/requirements'},
                                           {'type': 'null'}]}},
     'requiredProperties': ['matrix', 'packages'],
     'type': 'object'}

On instance['dependencies']['build']['specific'][0]['matrices'][0]:
    'abc'

And the way it would look like previously:

jsonschema.exceptions.ValidationError: 'abc' is not of type 'object'

Failed validating 'type' in schema['properties']['dependencies']['patternProperties']['.*']['properties']['specific']['items']['properties']['matrices']['items']:
    {'properties': {'matrix': {'type': ['null', 'object']},
                    'packages': {'if': {'type': 'array'},
                                 'then': {'if': {'type': 'object'},
                                          'items': {'type': ['string',
                                                             'object']},
                                          'then': {'patternProperties': {'.*': {'if': {'type': 'array'},
                                                                                'then': {'items': {'type': 'string'}},
                                                                                'type': ['array',
                                                                                         'string']}}}},
                                 'type': ['null', 'array', 'string']}},
     'type': 'object'}

On instance['dependencies']['build']['specific'][0]['matrices'][0]:
    'abc'

The header message is the same and while the context is maybe slightly more obfuscated, I think it is actually more readable and it is easier to understand the context of the error.

@vyasr
Copy link
Owner

vyasr commented Jan 30, 2023

I agree, in this case it seems more readable. If there error was one level deeper in the packages key, would the {'$ref': '#/$defs/requirements'} be expanded into the definition?

@csadorf
Copy link
Author

csadorf commented Feb 3, 2023

I agree, in this case it seems more readable. If there error was one level deeper in the packages key, would the {'$ref': '#/$defs/requirements'} be expanded into the definition?

I replaced one of the dependencies with a number (should be a string) and it is not:

jsonschema.exceptions.ValidationError: [1234, 'cuda-python>=11.5,<11.7.1'] is not valid under any of the given schemas

Failed validating 'oneOf' in schema['properties']['dependencies']['patternProperties']['.*']['properties']['specific']['items']['properties']['matrices']['items']['properties']['packages']:
    {'oneOf': [{'$ref': '#/$defs/requirements'}, {'type': 'null'}]}

On instance['dependencies']['build']['specific'][0]['matrices'][0]['packages']:
    [1234, 'cuda-python>=11.5,<11.7.1']

But I am not convinced that simply following all $refs makes the error message inherently more readable. If we are concerned about providing highly intuitive error feedback, I would instead suggest to actually use jsonschema's tooling to identify the most relevant error. In this example using

 validator = jsonschema.Draft7Validator(SCHEMA)
 print(best_match(validator.iter_errors(dependencies)))

will yield

1234 is not of type 'string'

Failed validating 'type' in schema[0]['items']:
    {'type': 'string'}

On instance[0]:
    1234

The best_match function actually accepts a relevance key so we could even tune this to our needs.

@csadorf csadorf requested a review from vyasr February 3, 2023 09:16
@csadorf
Copy link
Author

csadorf commented Feb 3, 2023

@vyasr This should be ready for another review iteration.

@vyasr
Copy link
Owner

vyasr commented Feb 3, 2023

I wasn't aware of best_match, that is helpful. I'm happy with that result.

Copy link
Owner

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look great! I'm going to go ahead and merge this now, and we can handle any follow-ups in the PR merging my branch into the main repo.

@vyasr vyasr merged commit 28eb7bf into vyasr:feat/jsonschema_validation Feb 3, 2023
@csadorf csadorf deleted the refactor-schema-validation branch February 6, 2023 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants