Validation of DCAT-US record #4256

Closed
2 tasks done
jbrown-xentity opened this issue Mar 23, 2023 · 7 comments

@jbrown-xentity
Contributor

jbrown-xentity commented Mar 23, 2023

User Story

In order to validate a DCAT-US record, data.gov admins (and the community) want a well-defined JSON Schema file.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN a valid DCAT-US record and the validation file exist
    WHEN the record is processed via the validation file
    THEN the record passes
    AND a test replicates this process

  • GIVEN an invalid DCAT-US record and the validation file exist
    WHEN the invalid record is processed via the validation file
    THEN the record fails with a meaningful response
    AND a test replicates this process

Background

We currently have a custom configuration file that plugs into some custom logic. In theory we already validate records this way, except that we use Draft4Validator (the jsonschema library's validator class for the draft 4 spec) instead of the standard implementation.

Security Considerations (required)

None

Sketch

This might be as simple as copying https://github.com/GSA/ckanext-datajson/tree/main/ckanext/datajson/pod_schema/federal-v1.1 to the repo created from #4255, installing https://github.com/python-jsonschema/jsonschema#jsonschema, and confirming that validation works as expected. We would like to test against our current test files here. There might be new features or broken items lingering, so consider implementing some of the types of tests here (though not the actual code, as much of it does not produce the output we want or expect).
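
A minimal sketch of what the sketch above could look like in practice, assuming the jsonschema package is installed. The schema here is a hypothetical, heavily trimmed stand-in for the federal-v1.1 dataset schema (the real file defines many more fields), and `validate_record` is an illustrative name, not an existing function in any of the linked repos.

```python
from jsonschema import Draft4Validator, ValidationError, validate

# Hypothetical, trimmed stand-in for the federal-v1.1 dataset schema.
# The real schema file would be loaded from disk with json.load().
DATASET_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "required": ["title", "description", "identifier"],
    "properties": {
        "title": {"type": "string", "minLength": 1},
        "description": {"type": "string", "minLength": 1},
        "identifier": {"type": "string", "minLength": 1},
    },
}


def validate_record(record, schema=DATASET_SCHEMA):
    """Return (is_valid, message) for a single record.

    The message is empty for a valid record and carries the
    library's error text for an invalid one.
    """
    try:
        # cls pins the validator to draft 4, matching the schema's $schema.
        validate(instance=record, schema=schema, cls=Draft4Validator)
    except ValidationError as error:
        return False, error.message
    return True, ""


if __name__ == "__main__":
    good = {"title": "Sample", "description": "A sample dataset", "identifier": "abc-123"}
    bad = {"title": "Sample", "description": "A sample dataset"}
    print(validate_record(good))
    print(validate_record(bad))
```

Returning a tuple rather than raising keeps the two acceptance criteria (valid record passes, invalid record fails with a meaningful response) easy to assert in a test; raising a custom exception would be an equally reasonable design.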

@nickumia-reisys
Contributor

There is a code architecture question of how best to abstract and organize this code so that it makes sense. What other functions will need to call the validation function? Are we going for a functional or an object-oriented design? What high-level abstractions, other than "E", "T", and "L", do we want to have (if any)?

@hkdctol hkdctol moved this to 📔 Product Backlog in data.gov team board Mar 30, 2023
@hkdctol hkdctol moved this from 📔 Product Backlog to 📟 Sprint Backlog [7] in data.gov team board Mar 30, 2023
@hkdctol hkdctol moved this from 📟 Sprint Backlog [7] to 📔 Product Backlog in data.gov team board Mar 30, 2023
@hkdctol hkdctol added the H2.0/Harvest-General General Harvesting 2.0 Issues label Apr 11, 2023
@jbrown-xentity
Contributor Author

jbrown-xentity commented Apr 14, 2023

As part of cleanup of the repo, also look to tackle:

@rshewitt rshewitt self-assigned this Apr 16, 2023
@rshewitt rshewitt moved this from 📔 Product Backlog to 🏗 In Progress [8] in data.gov team board Apr 16, 2023
@rshewitt rshewitt moved this from 🏗 In Progress [8] to 📔 Product Backlog in data.gov team board Apr 16, 2023
@rshewitt rshewitt moved this from 📔 Product Backlog to 🏗 In Progress [8] in data.gov team board Apr 17, 2023
@rshewitt
Contributor

rshewitt commented Apr 17, 2023

Ruff flags unused imports. The problem came from an imported fixture function being referenced only by name, as a string, within @pytest.mark.parametrize( "args...", [ ( "fixture_function", file_to_validate.json, is_valid ) ] ), so the import itself looked unused. This has been resolved. I learned that fixtures can be stored in a conftest.py file, which makes them accessible to other test files without importing them.
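
For illustration, here is one common way to combine string fixture names with parametrize, using pytest's `request.getfixturevalue`. This is a sketch, not the code from the repo: the fixture name `dataset_schema`, the helper `is_valid_record`, and the toy required-keys check are all hypothetical. In a real project the fixture would sit in conftest.py, where pytest discovers it automatically, so the test module needs no import for Ruff to flag.

```python
import pytest


def is_valid_record(schema, record):
    """Toy stand-in for real validation: every required key must be present."""
    return all(key in record for key in schema["required"])


# In a real project this fixture would live in conftest.py.
# It is defined inline here only to keep the example self-contained.
@pytest.fixture
def dataset_schema():
    return {"required": ["title", "identifier"]}


@pytest.mark.parametrize(
    "schema_fixture, record, expected",
    [
        # The fixture is named as a string and resolved at run time.
        ("dataset_schema", {"title": "t", "identifier": "i"}, True),
        ("dataset_schema", {"title": "t"}, False),
    ],
)
def test_record_validation(request, schema_fixture, record, expected):
    # getfixturevalue looks the fixture up by name, so no import is needed.
    schema = request.getfixturevalue(schema_fixture)
    assert is_valid_record(schema, record) is expected
```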

@rshewitt
Contributor

@rshewitt
Contributor

pytest-depends plugin

@jbrown-xentity
Contributor Author

Discovered that JSON Schema publishes versioned releases (drafts) of its spec. Since our schema uses draft 4, we need to follow the upgrade path from the official documentation to get onto the most recent, best-supported version of the spec.
This will also let us get a response covering all validation errors (not just the first one found), so that we can give the data provider complete feedback on the changes needed.
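
A rough sketch of collecting every validation error in one pass with the jsonschema library's `iter_errors`, assuming the schema has already been migrated to a newer draft. The schema is a hypothetical trimmed example declaring the 2020-12 draft, and `collect_errors` is an illustrative name. `validator_for` picks the validator class that matches the schema's declared `$schema`.

```python
from jsonschema.validators import validator_for

# Hypothetical trimmed schema, assumed to be already migrated from draft 4.
SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["title", "identifier"],
    "properties": {
        "title": {"type": "string"},
        "identifier": {"type": "string"},
    },
}


def collect_errors(record, schema=SCHEMA):
    """Return one human-readable line per validation error (empty if valid)."""
    validator_cls = validator_for(schema)
    # Fail early if the schema itself is malformed.
    validator_cls.check_schema(schema)
    validator = validator_cls(schema)

    # iter_errors yields every error lazily instead of stopping at the first.
    errors = sorted(
        validator.iter_errors(record),
        key=lambda error: [str(part) for part in error.absolute_path],
    )

    report = []
    for error in errors:
        # absolute_path is empty when the error applies to the whole record.
        location = "/".join(str(part) for part in error.absolute_path) or "<record>"
        report.append(f"{location}: {error.message}")
    return report


if __name__ == "__main__":
    for line in collect_errors({"title": 5}):
        print(line)
```

A list of "location: message" strings is only one possible shape for the feedback; a list of dicts would be easier to serialize if the report is sent back to the data provider as JSON.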

@rshewitt
Contributor

@rshewitt rshewitt moved this from 🏗 In Progress [8] to ✔ Done in data.gov team board Apr 26, 2023
@jbrown-xentity jbrown-xentity added H2.0/Validate and removed H2.0/Harvest-General General Harvesting 2.0 Issues labels Apr 28, 2023
@hkdctol hkdctol moved this from ✔ Done to Closed in data.gov team board May 1, 2023
@jbrown-xentity jbrown-xentity added the H2.0/Harvest-General General Harvesting 2.0 Issues label May 3, 2023
@btylerburton btylerburton removed the H2.0/Harvest-General General Harvesting 2.0 Issues label Dec 13, 2023
@github-project-automation github-project-automation bot moved this from 🗄 Closed to ✔ Done in data.gov team board Dec 14, 2023
@btylerburton btylerburton moved this from ✔ Done to 🗄 Closed in data.gov team board Dec 14, 2023