Tests for validation of DCAT-US records #5
Conversation
Coverage Report

Spent some time better understanding the pyproject.toml file. Mistakenly deleted the example package and replaced the
```python
def open_json(file_path):
    with open(file_path) as fp:
        return json.load(fp)
```
Is this function really necessary? What's the long-term benefit of it? It seems like a frivolous abstraction...
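To illustrate the reviewer's point, here is a minimal sketch of what loading without the wrapper could look like. The `load_dataset` name is hypothetical, introduced only for this example:

```python
import json
from pathlib import Path

# Hypothetical inline alternative: each caller loads its file directly
# with the standard library instead of going through open_json().
def load_dataset(file_path):
    # Path.read_text() + json.loads() covers the same ground as the helper
    return json.loads(Path(file_path).read_text())
```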
```python
def parse_errors(errors):
    error_message = ""

    for error in errors:
        error_message += (
            f"error: {error.message}. offending element: {error.json_path} \n"
        )

    return error_message
```
Is this the best consolidation of errors that makes sense? A single string of an unknown number of errors? Isn't a list of strings more flexible?
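A sketch of the list-based alternative the comment suggests. It reuses the PR's `parse_errors` name and assumes the error objects expose `.message` and `.json_path`, as in the original loop:

```python
# Alternative to the string-concatenating version: return one string per
# validation error, leaving joining (or counting, filtering, etc.) to the caller.
def parse_errors(errors):
    return [
        f"error: {error.message}. offending element: {error.json_path}"
        for error in errors
    ]
```

A caller that still wants a single blob can `"\n".join(...)` the result, while test assertions can check individual entries.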
```python
BASE_DIR = Path(__file__).parents[3]
DATA_DIR = BASE_DIR / "data" / "dcatus"
SCHEMA_DIR = DATA_DIR / "schemas"
JSON_DIR = DATA_DIR / "jsons"
```
This is really cool 👍
```python
@pytest.fixture
def open_dataset_schema():
    dataset_schema = SCHEMA_DIR / "dataset.json"
    return open_json(dataset_schema)


@pytest.fixture
def open_catalog_schema():
    catalog_schema = SCHEMA_DIR / "catalog.json"
    return open_json(catalog_schema)


# invalid
@pytest.fixture
def open_numerical_title_json():
    json_file = JSON_DIR / "numerical-title.data.json"
    return open_json(json_file)


# valid
@pytest.fixture
def open_collection_1_parent_2_children_json():
    json_file = JSON_DIR / "collection-1-parent-2-children.data.json"
    return open_json(json_file)


# invalid
@pytest.fixture
def open_missing_catalog_json():
    json_file = JSON_DIR / "missing-catalog.data.json"
    return open_json(json_file)


# invalid
@pytest.fixture
def open_ny_json():
    json_file = JSON_DIR / "ny.data.json"
    return open_json(json_file)


# invalid
@pytest.fixture
def open_missing_identifier_title_json():
    json_file = JSON_DIR / "missing-identifier-title.data.json"
    return open_json(json_file)


# invalid
@pytest.fixture
def open_missing_dataset_fields_json():
    json_file = JSON_DIR / "missing-dataset-fields.data.json"
    return open_json(json_file)


# valid
@pytest.fixture
def open_usda_gov_json():
    json_file = JSON_DIR / "usda.gov.data.json"
    return open_json(json_file)


# valid
@pytest.fixture
def open_arm_json():
    json_file = JSON_DIR / "arm.data.json"
    return open_json(json_file)


# valid
@pytest.fixture
def open_large_spatial_json():
    json_file = JSON_DIR / "large-spatial.data.json"
    return open_json(json_file)


# valid
@pytest.fixture
def open_reserved_title_json():
    json_file = JSON_DIR / "reserved-title.data.json"
    return open_json(json_file)


# valid
@pytest.fixture
def open_collection_2_parent_4_children_json():
    json_file = JSON_DIR / "collection-2-parent-4-children.data.json"
    return open_json(json_file)


# valid
@pytest.fixture
def open_geospatial_json():
    json_file = JSON_DIR / "geospatial.data.json"
    return open_json(json_file)


# valid
@pytest.fixture
def open_null_spatial_json():
    json_file = JSON_DIR / "null-spatial.data.json"
    return open_json(json_file)
```
These tests seem a bit wasteful... Can we have a datajson creator that we pass different parameters to generate the datasets instead of hardcoding the files and having a growing number of open functions? It is a different design choice, but I think it'll be more sustainable than this historical way of running tests.
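One possible shape for the suggested "datajson creator": a factory function that builds catalog dicts from parameters instead of one hardcoded `.data.json` file per case. The function name and its parameters (`num_datasets`, `drop_fields`) are hypothetical, not part of the PR; field names follow the DCAT-US catalog layout:

```python
# Hypothetical factory: generate valid or deliberately broken catalogs
# on demand, rather than maintaining a growing set of fixture files.
def make_catalog(num_datasets=1, drop_fields=()):
    datasets = []
    for i in range(num_datasets):
        dataset = {
            "title": f"Dataset {i}",
            "identifier": f"https://example.gov/id-{i}",
        }
        for field in drop_fields:
            dataset.pop(field, None)  # omit fields to produce invalid records
        datasets.append(dataset)
    return {
        "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
        "dataset": datasets,
    }
```

Wrapped in a single pytest fixture, this would let each test request exactly the catalog shape it needs (e.g. `make_catalog(drop_fields=("title",))` for a missing-title case) without a new `open_*` function per file.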
```python
def validate_json_schema(json_data, dataset_schema):
    success = None
    error_message = ""

    validator = Draft202012Validator(dataset_schema)

    try:
        validator.validate(json_data)
        success = True
        error_message = "no errors"
    except ValidationError:
        success = False
        errors = validator.iter_errors(json_data)
        error_message = parse_errors(errors)

    return success, error_message
```
Can we somehow highlight that this is the only real core code that was added? Add docstrings and a feature list in the README?
Pull Request
Related to ticket
About
Tests for validation of DCAT-US records. All JSON files in this directory are used.
PR TASKS