Tests for validation of DCAT-US records #5

rshewitt · 2023-04-18T19:12:47Z

Pull Request

Related to ticket

About

Tests for validation of DCAT-US records. All JSON files in this directory are used.

PR TASKS

The actual code changes.
Tests written and passed.
Any changes to docs?
Bumped version number in setup.py (also checked on PyPI).

github-actions · 2023-04-18T19:13:37Z

Coverage Report

File	Stmts	Miss	Cover
datagovharvester/utils
__init__.py	0	0	100%
json_utilities.py	4	0	100%
datagovharvester/validate
__init__.py	0	0	100%
dcat_us.py	20	0	100%
TOTAL	24	0	100%

Tests	Skipped	Failures	Errors	Time
12	0 💤	0 ❌	0 🔥	13.492s ⏱️

tests/test_demo.py

rshewitt · 2023-04-19T22:19:20Z

Spent some time better understanding the pyproject.toml file. Mistakenly deleted the example package and replaced the --cov value in the action yml.

nickumia-reisys

🦯

nickumia-reisys · 2023-04-28T16:05:00Z

datagovharvester/utils/json_utilities.py

+def open_json(file_path):
+    with open(file_path) as fp:
+        return json.load(fp)


Is this function really necessary? What's the long-term benefit of it? It seems like a frivolous abstraction...

nickumia-reisys · 2023-05-01T12:52:34Z

datagovharvester/validate/dcat_us.py

+def parse_errors(errors):
+    error_message = ""
+
+    for error in errors:
+        error_message += (
+            f"error: {error.message}. offending element: {error.json_path} \n"
+        )
+
+    return error_message


Is a single string really the best way to consolidate an unknown number of errors? Wouldn't a list of strings be more flexible?
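A minimal sketch of the list-based alternative raised in this comment, assuming each error exposes `message` and `json_path` attributes as in the diff above. The name `parse_errors_as_list` and the `SimpleNamespace` stand-ins are hypothetical; the stand-ins only mimic the two attributes the original function reads and are not real validator errors.

```python
from types import SimpleNamespace


def parse_errors_as_list(errors):
    """Return one formatted string per validation error (hypothetical variant)."""
    return [
        f"error: {error.message}. offending element: {error.json_path}"
        for error in errors
    ]


# Stand-in objects that mimic the two attributes read above; not real jsonschema errors.
sample_errors = [
    SimpleNamespace(message="1 is not of type 'string'", json_path="$.dataset[0].title"),
    SimpleNamespace(message="'identifier' is a required property", json_path="$.dataset[1]"),
]

messages = parse_errors_as_list(sample_errors)

# Callers that still want a single string can join the list themselves.
combined = "\n".join(messages)
```

Returning a list leaves the choice of separator, truncation, or per-error logging to the caller, and the single-string form remains one `join` away.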

nickumia-reisys · 2023-05-01T12:54:15Z

tests/validate/dcatus/conftest.py

+BASE_DIR = Path(__file__).parents[3]
+DATA_DIR = BASE_DIR / "data" / "dcatus"
+SCHEMA_DIR = DATA_DIR / "schemas"
+JSON_DIR = DATA_DIR / "jsons"


This is really cool 👍
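For readers unfamiliar with the pattern, here is a small sketch of how `parents[3]` resolves. The `/repo/...` path is a hypothetical stand-in for wherever the repository is checked out, and `PurePosixPath` is used only so the example does not touch the filesystem.

```python
from pathlib import PurePosixPath

# Hypothetical location of the conftest module; only the directory depth matters.
conftest_path = PurePosixPath("/repo/tests/validate/dcatus/conftest.py")

# parents[0] is the containing directory; each higher index climbs one more level.
base_dir = conftest_path.parents[3]

# The "/" operator joins path segments, mirroring the constants in the diff.
data_dir = base_dir / "data" / "dcatus"
schema_dir = data_dir / "schemas"
json_dir = data_dir / "jsons"
```

Because the index is tied to the directory depth of `conftest.py`, moving that file up or down a level would require changing the index as well.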

nickumia-reisys · 2023-05-01T12:58:04Z

tests/validate/dcatus/conftest.py

+def open_dataset_schema():
+    dataset_schema = SCHEMA_DIR / "dataset.json"
+    return open_json(dataset_schema)
+
+
+@pytest.fixture
+def open_catalog_schema():
+    catalog_schema = SCHEMA_DIR / "catalog.json"
+    return open_json(catalog_schema)
+
+
+# invalid
+@pytest.fixture
+def open_numerical_title_json():
+    json_file = JSON_DIR / "numerical-title.data.json"
+    return open_json(json_file)
+
+
+# valid
+@pytest.fixture
+def open_collection_1_parent_2_children_json():
+    json_file = JSON_DIR / "collection-1-parent-2-children.data.json"
+    return open_json(json_file)
+
+
+# invalid
+@pytest.fixture
+def open_missing_catalog_json():
+    json_file = JSON_DIR / "missing-catalog.data.json"
+    return open_json(json_file)
+
+
+# invalid
+@pytest.fixture
+def open_ny_json():
+    json_file = JSON_DIR / "ny.data.json"
+    return open_json(json_file)
+
+
+# invalid
+@pytest.fixture
+def open_missing_identifier_title_json():
+    json_file = JSON_DIR / "missing-identifier-title.data.json"
+    return open_json(json_file)
+
+
+# invalid
+@pytest.fixture
+def open_missing_dataset_fields_json():
+    json_file = JSON_DIR / "missing-dataset-fields.data.json"
+    return open_json(json_file)
+
+
+# valid
+@pytest.fixture
+def open_usda_gov_json():
+    json_file = JSON_DIR / "usda.gov.data.json"
+    return open_json(json_file)
+
+
+# valid
+@pytest.fixture
+def open_arm_json():
+    json_file = JSON_DIR / "arm.data.json"
+    return open_json(json_file)
+
+
+# valid
+@pytest.fixture
+def open_large_spatial_json():
+    json_file = JSON_DIR / "large-spatial.data.json"
+    return open_json(json_file)
+
+
+# valid
+@pytest.fixture
+def open_reserved_title_json():
+    json_file = JSON_DIR / "reserved-title.data.json"
+    return open_json(json_file)
+
+
+# valid
+@pytest.fixture
+def open_collection_2_parent_4_children_json():
+    json_file = JSON_DIR / "collection-2-parent-4-children.data.json"
+    return open_json(json_file)
+
+
+# valid
+@pytest.fixture
+def open_geospatial_json():
+    json_file = JSON_DIR / "geospatial.data.json"
+    return open_json(json_file)
+
+
+# valid
+@pytest.fixture
+def open_null_spatial_json():
+    json_file = JSON_DIR / "null-spatial.data.json"
+    return open_json(json_file)


These fixtures seem a bit wasteful... Could we have a datajson creator that takes parameters and generates the datasets, instead of hardcoding the files and accumulating a growing number of open functions? It is a different design choice, but I think it would be more sustainable than this historical way of running tests.
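One way the suggested creator could look, as a rough sketch only. `BASE_DATASET`, `make_dataset`, and `make_catalog` are hypothetical names, and the field names and values are illustrative assumptions rather than copies of the repository's test data. The two example cases are named after existing fixture files, but the real contents of those files are not shown in this PR.

```python
import copy

# Illustrative baseline record; the fields and values are assumptions,
# not taken from the repository's JSON files.
BASE_DATASET = {
    "title": "Example dataset",
    "description": "An example record.",
    "identifier": "example-1",
    "accessLevel": "public",
    "modified": "2023-01-01",
    "keyword": ["example"],
}


def make_dataset(drop=(), **overrides):
    """Build one dataset dict, removing the `drop` keys and applying `overrides`."""
    dataset = copy.deepcopy(BASE_DATASET)
    for key in drop:
        dataset.pop(key, None)
    dataset.update(overrides)
    return dataset


def make_catalog(datasets):
    """Wrap datasets in a minimal catalog-shaped dict (hypothetical layout)."""
    return {"dataset": list(datasets)}


# Cases like these could then be expressed as parameters instead of files.
numerical_title = make_catalog([make_dataset(title=1)])
missing_identifier_title = make_catalog(
    [make_dataset(drop=("identifier", "title"))]
)
```

A generator like this trades real-world sample files for small, explicit variations, so a few recorded files (for example a large harvested catalog) may still be worth keeping alongside it.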

nickumia-reisys · 2023-05-01T12:59:07Z

datagovharvester/validate/dcat_us.py

+def validate_json_schema(json_data, dataset_schema):
+    success = None
+    error_message = ""
+
+    validator = Draft202012Validator(dataset_schema)
+
+    try:
+        validator.validate(json_data)
+        success = True
+        error_message = "no errors"
+    except ValidationError:
+        success = False
+        errors = validator.iter_errors(json_data)
+        error_message = parse_errors(errors)
+
+    return success, error_message


Can we somehow highlight that this is the only real core code that was added? Add docstrings and a feature list in the README?

rshewitt added 21 commits April 17, 2023 14:10

add pycache folders.

41fa8e5

set pyenv local version.

342fb3f

updated.

11d391e

add jsonschema dependency.

8678f22

add testing data.

dfc1f33

add schemas

af4cfe2

add dcat-us validation test.

21b5e32

housekeeping.

69c6145

add schema fixtures.

bc5e0cc

housekeeping.

a5e8913

create utilities for json-types.

39517b1

removed unused import.

c723896

removed unused fixture.

1cdabda

deleted uneeded files.

de9bcfb

added fixtures to module to fix ruff unused import issue.

a3ee199

caught validation error exception instead of bare.

bc57ebc

reorganized fixures.

c0fd918

reorganized fixtures.

caf2da3

reorganize dcat-us validation parametrize tests.

feef811

add error message. not currently being used.

a1f2f5a

add print for error message.

a3c0959

rshewitt self-assigned this Apr 18, 2023

rshewitt requested review from Jin-Sun-tts and jbrown-xentity and removed request for Jin-Sun-tts April 18, 2023 19:13

jbrown-xentity approved these changes Apr 19, 2023

rshewitt marked this pull request as draft April 19, 2023 15:23

rshewitt added 2 commits April 19, 2023 13:51

reorganized.

eb27bfb

update coverate dir.

e96ef05

rshewitt added 3 commits April 19, 2023 14:47

reoganize and format.

1985c5f

reorganize.

226c099

reverse value.

7791bd4

rshewitt marked this pull request as ready for review April 20, 2023 14:36

robert-bryson and others added 13 commits April 20, 2023 11:34

validate_json_schema function to utils, refactored implementation

199aeaa

add valid flag.

59239d1

add valid flags.

118a3eb

reorganized.

a4c6338

refactor output.

545fe0a

refactor assertions.

6d95501

removed unneeded file.

fabf4a0

remove previous version of schema.

e155312

update from draft 4 to 2020-12.

218ff9e

fix boolean comparison for ruff.

b437cf1

replaced validate function with Draft202012Validator. Add error parser.

a355fd9

removed unused imports.

20ba0c8

refactored validator.

80b1826

jbrown-xentity approved these changes Apr 26, 2023

rshewitt merged commit 8bed0e3 into main Apr 26, 2023

rshewitt deleted the test-validate-dcat-us branch April 26, 2023 14:04

nickumia-reisys mentioned this pull request Apr 28, 2023

Minor Optimizations #6

Merged

nickumia-reisys reviewed May 1, 2023
