Improve readme (#19)
* Update README.md

* Update README.md

* Update README.md

* Update df_checker.py

* Update input_helpers.py

* Update pyproject.toml
ArthurKordes authored Jul 31, 2024
1 parent 4d86ccb commit 070d106
Showing 4 changed files with 16 additions and 3 deletions.
13 changes: 13 additions & 0 deletions README.md
@@ -31,6 +31,18 @@ dfs = [df]
results, brontabel_df, bronattribute_df, dqRegel_df = dq_suite.df_check(dfs, dq_rules, "showcase")
```

+
+# Export the schema from Unity Catalog to the Input Form
+In order to output the schema from Unity Catalog, use the following commands (using the required schema name):
+
+```
+schema_output = dq_suite.export_schema('schema_name', spark)
+print(schema_output)
+```
+
+Copy the string to the Input Form to quickly ingest the schema in Excel.
+
+
# Validate the schema of a table
It is possible to validate the schema of an entire table against a schema definition from Amsterdam Schema in one go. This is done by adding two fields to the "dq_rules" JSON when describing the table (see: https://github.com/Amsterdam/dq-suite-amsterdam/blob/main/dq_rules_example.json).

@@ -40,6 +52,7 @@ You will need:

The schema definition is converted into column-level expectations (expect_column_values_to_be_of_type) at runtime.
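
As a rough illustration of that conversion, the sketch below turns a column-to-type mapping into one `expect_column_values_to_be_of_type` rule per column. The input shape, the function name `schema_to_type_rules`, and the `rule_name` key are assumptions made for this example and may not match the package's internals; only the expectation name comes from the text above, and `column` and `type_` are that expectation's argument names in Great Expectations.

```python
from typing import Any, Dict, List


def schema_to_type_rules(schema: Dict[str, str]) -> List[Dict[str, Any]]:
    """Build one type-check rule per column.

    `schema` is a hypothetical {column_name: type_name} mapping; the real
    Amsterdam Schema definition is richer than this.
    """
    return [
        {
            # Expectation named in the README text.
            "rule_name": "expect_column_values_to_be_of_type",
            # `column` and `type_` are the arguments this expectation takes.
            "parameters": [{"column": column, "type_": type_name}],
        }
        for column, type_name in schema.items()
    ]


# Hypothetical two-column schema, for illustration only.
example_schema = {"id": "IntegerType", "name": "StringType"}
rules = schema_to_type_rules(example_schema)
```

Each resulting rule can then sit next to hand-written rules for the same table, which is why a single schema reference is enough to cover every column.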

+
# Known exceptions
The functions can run on Databricks using a Personal Compute Cluster or using a Job Cluster. Using a Shared Compute Cluster will result in an error, as it does not have the permissions that Great Expectations requires.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "dq-suite-amsterdam"
-version = "0.5.1"
+version = "0.5.2"
authors = [
{ name="Arthur Kordes", email="a.kordes@amsterdam.nl" },
{ name="Aysegul Cayir Aydar", email="a.cayiraydar@amsterdam.nl" }
2 changes: 1 addition & 1 deletion src/dq_suite/df_checker.py
@@ -7,7 +7,7 @@
import great_expectations as gx
from great_expectations.checkpoint import Checkpoint

-from dq_suite.input_validator import validate_dqrules, expand_input, generate_dq_rules_from_schema, fetch_schema_from_github
+from dq_suite.input_helpers import validate_dqrules, expand_input, export_schema, generate_dq_rules_from_schema, fetch_schema_from_github
from dq_suite.output_transformations import extract_dq_validatie_data, extract_dq_afwijking_data, create_brontabel, create_bronattribute, create_dqRegel

def df_check(dfs: list, dq_rules: str, check_name: str) -> Tuple[Dict[str, Any], Dict[str, Tuple[Any, Any]], pd.DataFrame, pd.DataFrame, pd.DataFrame]:
2 changes: 1 addition & 1 deletion src/dq_suite/input_helpers.py
@@ -41,7 +41,7 @@ def expand_input(rule_json):
     :rtype: dict
     """

-    for table in rule_json["dataframe_parameters"]:
+    for table in rule_json["tables"]:
         for rule in table["rules"]:
             for parameter in rule["parameters"]:
                 if "row_condition" in parameter:
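
The hunk above only renames the top-level key that `expand_input` iterates over. As a minimal sketch of the nesting this implies (tables, then rules, then parameters), the snippet below walks a rules dictionary keyed by `"tables"` and collects every `row_condition` it finds. The helper name `find_row_conditions` and the example dictionary are hypothetical, and the sketch does not reproduce what `expand_input` does with a condition once it finds one, since that part of the function is not shown in the diff.

```python
from typing import Any, Dict, List


def find_row_conditions(rule_json: Dict[str, Any]) -> List[str]:
    """Collect all row_condition values, using the new "tables" key."""
    conditions: List[str] = []
    for table in rule_json["tables"]:
        for rule in table["rules"]:
            for parameter in rule["parameters"]:
                if "row_condition" in parameter:
                    conditions.append(parameter["row_condition"])
    return conditions


# Hypothetical input, shaped after the loop structure in the hunk above.
example_rules = {
    "tables": [
        {
            "rules": [
                {
                    "parameters": [
                        {"column": "id", "row_condition": "id > 0"},
                        {"column": "name"},
                    ]
                }
            ]
        }
    ]
}
found = find_row_conditions(example_rules)
```

An input that still uses the old `"dataframe_parameters"` key would raise a `KeyError` in this sketch, so rule files and code need to agree on the key name.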
