Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests for validation of DCAT-US records #5

Merged
merged 39 commits into from
Apr 26, 2023
Merged

Conversation

rshewitt
Copy link
Contributor

Pull Request

Related to ticket

About

Tests for validation of DCAT-US records. All JSON files in this directory are used.

PR TASKS

  • The actual code changes.
  • Tests written and passed.
  • Any changes to docs?
  • Bumped version number in setup.py (also checked on PyPi).

@rshewitt rshewitt self-assigned this Apr 18, 2023
@github-actions
Copy link

github-actions bot commented Apr 18, 2023

Coverage

Coverage Report
FileStmtsMissCover
datagovharvester/utils
   __init__.py00100%
   json_utilities.py40100%
datagovharvester/validate
   __init__.py00100%
   dcat_us.py200100%
TOTAL240100%

Tests Skipped Failures Errors Time
12 0 💤 0 ❌ 0 🔥 13.492s ⏱️

@rshewitt rshewitt requested review from Jin-Sun-tts and jbrown-xentity and removed request for Jin-Sun-tts April 18, 2023 19:13
tests/test_demo.py Outdated Show resolved Hide resolved
tests/test_demo.py Outdated Show resolved Hide resolved
@rshewitt rshewitt marked this pull request as draft April 19, 2023 15:23
@rshewitt
Copy link
Contributor Author

Spent some time better understanding the pyproject.toml file. Mistakenly deleted the example package and replaced the --cov value in the action yml.

@rshewitt rshewitt marked this pull request as ready for review April 20, 2023 14:36
@rshewitt rshewitt merged commit 8bed0e3 into main Apr 26, 2023
@rshewitt rshewitt deleted the test-validate-dcat-us branch April 26, 2023 14:04
Copy link
Contributor

@nickumia-reisys nickumia-reisys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦯

Comment on lines +4 to +6
def open_json(file_path):
with open(file_path) as fp:
return json.load(fp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function really necessary? What's the long-term benefit of it? It seems like a frivolous abstraction...

Comment on lines +5 to +13
def parse_errors(errors):
error_message = ""

for error in errors:
error_message += (
f"error: {error.message}. offending element: {error.json_path} \n"
)

return error_message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the best consolidation of errors that makes sense? A single string of an unknown number of errors? Isn't a list of strings more flexible?

Comment on lines +5 to +8
BASE_DIR = Path(__file__).parents[3]
DATA_DIR = BASE_DIR / "data" / "dcatus"
SCHEMA_DIR = DATA_DIR / "schemas"
JSON_DIR = DATA_DIR / "jsons"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool 👍

Comment on lines +11 to +111
def open_dataset_schema():
dataset_schema = SCHEMA_DIR / "dataset.json"
return open_json(dataset_schema)


@pytest.fixture
def open_catalog_schema():
catalog_schema = SCHEMA_DIR / "catalog.json"
return open_json(catalog_schema)


# invalid
@pytest.fixture
def open_numerical_title_json():
json_file = JSON_DIR / "numerical-title.data.json"
return open_json(json_file)


# valid
@pytest.fixture
def open_collection_1_parent_2_children_json():
json_file = JSON_DIR / "collection-1-parent-2-children.data.json"
return open_json(json_file)


# invalid
@pytest.fixture
def open_missing_catalog_json():
json_file = JSON_DIR / "missing-catalog.data.json"
return open_json(json_file)


# invalid
@pytest.fixture
def open_ny_json():
json_file = JSON_DIR / "ny.data.json"
return open_json(json_file)


# invalid
@pytest.fixture
def open_missing_identifier_title_json():
json_file = JSON_DIR / "missing-identifier-title.data.json"
return open_json(json_file)


# invalid
@pytest.fixture
def open_missing_dataset_fields_json():
json_file = JSON_DIR / "missing-dataset-fields.data.json"
return open_json(json_file)


# valid
@pytest.fixture
def open_usda_gov_json():
json_file = JSON_DIR / "usda.gov.data.json"
return open_json(json_file)


# valid
@pytest.fixture
def open_arm_json():
json_file = JSON_DIR / "arm.data.json"
return open_json(json_file)


# valid
@pytest.fixture
def open_large_spatial_json():
json_file = JSON_DIR / "large-spatial.data.json"
return open_json(json_file)


# valid
@pytest.fixture
def open_reserved_title_json():
json_file = JSON_DIR / "reserved-title.data.json"
return open_json(json_file)


# valid
@pytest.fixture
def open_collection_2_parent_4_children_json():
json_file = JSON_DIR / "collection-2-parent-4-children.data.json"
return open_json(json_file)


# valid
@pytest.fixture
def open_geospatial_json():
json_file = JSON_DIR / "geospatial.data.json"
return open_json(json_file)


# valid
@pytest.fixture
def open_null_spatial_json():
json_file = JSON_DIR / "null-spatial.data.json"
return open_json(json_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests seem a bit wasteful... Can we have a datajson creator that we pass different parameters to generate the datasets instead of hardcoding the files and having a growing number of open functions? It is a different design choice, but I think it'll be more sustainable than this historical way of running tests.

Comment on lines +16 to +31
def validate_json_schema(json_data, dataset_schema):
success = None
error_message = ""

validator = Draft202012Validator(dataset_schema)

try:
validator.validate(json_data)
success = True
error_message = "no errors"
except ValidationError:
success = False
errors = validator.iter_errors(json_data)
error_message = parse_errors(errors)

return success, error_message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we somehow highlight that this is the only real core code that was added? Add docstrings and a feature list in the README?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🗄 Closed
Development

Successfully merging this pull request may close these issues.

4 participants