Implement strictly for between tests #74

Merged
merged 13 commits into from
May 29, 2021
22 changes: 22 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -4,6 +4,28 @@

* Added support for optional `min_value` and `max_value` parameters to all `*_between_*` tests. ([#70](https://github.com/calogica/dbt-expectations/pull/70))

* Added support for a `strictly` parameter to `between` tests. If set to `True`, `strictly` changes the operators `>=` and `<=` to `>` and `<`.

For example, while

```yaml
dbt_expectations.expect_column_stdev_to_be_between:
min_value: 0
```

evaluates to `>= 0`,

```yaml
dbt_expectations.expect_column_stdev_to_be_between:
min_value: 0
strictly: True
```

evaluates to `> 0`.
([#72](https://github.com/calogica/dbt-expectations/issues/72), [#74](https://github.com/calogica/dbt-expectations/pull/74))
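
As a further sketch, `strictly` applies to both bounds when `min_value` and `max_value` are given together. The model name `my_model` and the column name `col_numeric_a` below are placeholders chosen for illustration, not names taken from this change.

```yaml
# Hypothetical schema.yml entry; model and column names are placeholders.
version: 2

models:
  - name: my_model
    columns:
      - name: col_numeric_a
        tests:
          # strictly: true -> both bounds are exclusive:
          # stddev(col_numeric_a) > 0 and stddev(col_numeric_a) < 2
          - dbt_expectations.expect_column_stdev_to_be_between:
              min_value: 0
              max_value: 2
              strictly: true
          # strictly omitted (defaults to False) -> both bounds are inclusive:
          # stddev(col_numeric_a) >= 0 and stddev(col_numeric_a) <= 2
          - dbt_expectations.expect_column_stdev_to_be_between:
              min_value: 0
              max_value: 2
```

With `strictly: true`, a column whose standard deviation is exactly `0` or exactly `2` fails the first test but still passes the second.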



## Fixes

* Corrected a typo in the README ([#67](https://github.com/calogica/dbt-expectations/pull/67))
5 changes: 4 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
# dbt-expectations

Extension package for [**dbt**](https://github.com/fishtown-analytics/dbt) inspired by the [Great Expectations package for Python](https://greatexpectations.io/). The intent is to allow dbt users to deploy GE-like tests in their data warehouse directly from dbt, vs having to add another integration with their data warehouse.
<img src="expectations.gif"/>

**dbt-expectations** is an extension package for [**dbt**](https://github.com/fishtown-analytics/dbt), inspired by the [Great Expectations package for Python](https://greatexpectations.io/). The intent is to allow dbt users to deploy GE-like tests in their data warehouse directly from dbt, vs having to add another integration with their data warehouse.


## Install

Binary file added expectations.gif
4 changes: 4 additions & 0 deletions integration_tests/models/schema_tests/schema.yml
Original file line number Diff line number Diff line change
@@ -230,6 +230,9 @@ models:
- dbt_expectations.expect_column_stdev_to_be_between:
min_value: 0
max_value: 2
- dbt_expectations.expect_column_stdev_to_be_between:
min_value: 0
strictly: true
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
value_set: [0.5]
top_n: 1
@@ -289,6 +292,7 @@ models:
- dbt_expectations.expect_table_row_count_to_be_between:
max_value: 10000
group_by: [group_id]
strictly: True
- dbt_expectations.expect_grouped_row_values_to_have_recent_data:
group_by: [group_id]
timestamp_column: date_day
15 changes: 10 additions & 5 deletions macros/schema_tests/_generalized/expression_between.sql
Original file line number Diff line number Diff line change
@@ -3,10 +3,11 @@
min_value=None,
max_value=None,
group_by_columns=None,
row_condition=None
row_condition=None,
strictly=False
) %}

{{ dbt_expectations.expression_between(model, expression, min_value, max_value, group_by_columns, row_condition) }}
{{ dbt_expectations.expression_between(model, expression, min_value, max_value, group_by_columns, row_condition, strictly) }}

{% endmacro %}

@@ -15,18 +16,22 @@
min_value,
max_value,
group_by_columns,
row_condition
row_condition,
strictly
) %}

{%- if min_value is none and max_value is none -%}
{{ exceptions.raise_compiler_error(
"You have to provide either a min_value, max_value or both."
) }}
{%- endif -%}

{%- set strict_operator = "" if strictly else "=" -%}

{% set expression_min_max %}
( 1=1
{%- if min_value is not none %} and {{ expression }} >= {{ min_value }}{% endif %}
{%- if max_value is not none %} and {{ expression }} <= {{ max_value }}{% endif %}
{%- if min_value is not none %} and {{ expression | trim }} >{{ strict_operator }} {{ min_value }}{% endif %}
{%- if max_value is not none %} and {{ expression | trim }} <{{ strict_operator }} {{ max_value }}{% endif %}
)
{% endset %}

13 changes: 11 additions & 2 deletions macros/schema_tests/_generalized/expression_is_true.sql
Original file line number Diff line number Diff line change
@@ -9,6 +9,14 @@

{% endmacro %}

{% macro truth_expression(expression) %}
{{ adapter.dispatch('truth_expression', packages = dbt_expectations._get_namespaces()) (expression) }}
{% endmacro %}

{% macro default__truth_expression(expression) %}
{{ expression }} as expression
{% endmacro %}

{% macro expression_is_true(model,
expression,
test_condition="= true",
@@ -26,7 +34,7 @@ with grouped_expression as (
{{ group_by_column }} as col_{{ loop.index }},
{% endfor -%}
{% endif %}
{{ expression }} as expression
{{ dbt_expectations.truth_expression(expression) }}
from {{ model }}
{%- if row_condition %}
where
@@ -54,4 +62,5 @@ validation_errors as (
select count(*)
from validation_errors

{% endmacro -%}

{% endmacro -%}
Original file line number Diff line number Diff line change
@@ -2,7 +2,8 @@
min_value=None,
max_value=None,
group_by=None,
row_condition=None
row_condition=None,
strictly=False
) %}
{% set expression %}
max({{ column_name }})
@@ -12,6 +13,7 @@ max({{ column_name }})
min_value=min_value,
max_value=max_value,
group_by_columns=group_by,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}
{% endmacro %}
Original file line number Diff line number Diff line change
@@ -2,7 +2,8 @@
min_value=None,
max_value=None,
group_by=None,
row_condition=None
row_condition=None,
strictly=False
) %}
{% set expression %}
avg({{ column_name }})
@@ -12,6 +13,7 @@ avg({{ column_name }})
min_value=min_value,
max_value=max_value,
group_by_columns=group_by,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}
{% endmacro %}
Original file line number Diff line number Diff line change
@@ -2,7 +2,8 @@
min_value=None,
max_value=None,
group_by=None,
row_condition=None
row_condition=None,
strictly=False
) %}

{% set expression %}
@@ -13,6 +14,7 @@
min_value=min_value,
max_value=max_value,
group_by_columns=group_by,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}
{% endmacro %}
Original file line number Diff line number Diff line change
@@ -2,7 +2,8 @@
min_value=None,
max_value=None,
group_by=None,
row_condition=None
row_condition=None,
strictly=False
) %}
{% set expression %}
min({{ column_name }})
@@ -12,7 +13,8 @@ min({{ column_name }})
min_value=min_value,
max_value=max_value,
group_by_columns=group_by,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}

{% endmacro %}
Original file line number Diff line number Diff line change
@@ -30,7 +30,12 @@ with value_counts as (
{% if row_condition %}
where {{ row_condition }}
{% endif %}
group by 1

group by {% if quote_values -%}
{{ column_name }}
{%- else -%}
cast({{ column_name }} as {{ data_type }})
{%- endif %}

),
value_counts_ranked as (
@@ -86,4 +91,4 @@ validation_errors as (
select count(*) as validation_errors
from validation_errors

{% endmacro %}
{% endmacro %}
Original file line number Diff line number Diff line change
@@ -2,7 +2,8 @@
min_value=None,
max_value=None,
group_by=None,
row_condition=None
row_condition=None,
strictly=False
) %}
{% set expression %}
count(distinct {{ column_name }})/count({{ column_name }})
@@ -12,7 +13,8 @@ count(distinct {{ column_name }})/count({{ column_name }})
min_value=min_value,
max_value=max_value,
group_by_columns=group_by,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}

{% endmacro %}
Original file line number Diff line number Diff line change
@@ -3,7 +3,8 @@
min_value=None,
max_value=None,
group_by=None,
row_condition=None
row_condition=None,
strictly=False
) %}

{% set expression %}
@@ -14,6 +15,7 @@
min_value=min_value,
max_value=max_value,
group_by_columns=group_by,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}
{% endmacro %}
Original file line number Diff line number Diff line change
@@ -2,21 +2,26 @@
min_value=None,
max_value=None,
group_by=None,
row_condition=None
row_condition=None,
strictly=False
) -%}
{{ adapter.dispatch('test_expect_column_stdev_to_be_between', packages = dbt_expectations._get_namespaces()) (model, column_name,
{{ adapter.dispatch('test_expect_column_stdev_to_be_between', packages = dbt_expectations._get_namespaces()) (
model, column_name,
min_value,
max_value,
group_by,
row_condition
row_condition,
strictly
) }}
{%- endmacro %}

{% macro default__test_expect_column_stdev_to_be_between(model, column_name,
{% macro default__test_expect_column_stdev_to_be_between(
model, column_name,
min_value,
max_value,
group_by,
row_condition
row_condition,
strictly
) %}

{% set expression %}
@@ -27,6 +32,7 @@ stddev({{ column_name }})
min_value=min_value,
max_value=max_value,
group_by_columns=group_by,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}
{% endmacro %}
Original file line number Diff line number Diff line change
@@ -2,7 +2,8 @@
min_value=None,
max_value=None,
group_by=None,
row_condition=None
row_condition=None,
strictly=False
) %}
{% set expression %}
sum({{ column_name }})
@@ -12,6 +13,7 @@ sum({{ column_name }})
min_value=min_value,
max_value=max_value,
group_by_columns=group_by,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}
{% endmacro %}
Original file line number Diff line number Diff line change
@@ -2,7 +2,8 @@
min_value=None,
max_value=None,
group_by=None,
row_condition=None
row_condition=None,
strictly=False
) %}
{% set expression %}
count(distinct {{ column_name }})
@@ -12,6 +13,7 @@ count(distinct {{ column_name }})
min_value=min_value,
max_value=max_value,
group_by_columns=group_by,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}
{% endmacro %}
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
{% macro test_expect_column_values_to_be_between(model, column_name,
min_value=None,
max_value=None,
row_condition=None
row_condition=None,
strictly=False
) %}

{% set expression %}
@@ -13,7 +14,8 @@
min_value=min_value,
max_value=max_value,
group_by_columns=None,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}


Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
{% macro test_expect_column_value_lengths_to_be_between(model, column_name,
min_value=None,
max_value=None,
row_condition=None
row_condition=None,
strictly=False
) %}
{% set expression %}
{{ dbt_utils.length(column_name) }}
@@ -12,7 +13,8 @@
min_value=min_value,
max_value=max_value,
group_by_columns=None,
row_condition=row_condition
row_condition=row_condition,
strictly=strictly
) }}

{% endmacro %}
Original file line number Diff line number Diff line change
@@ -1,4 +1,10 @@
{% macro test_expect_grouped_row_values_to_have_recent_data(model, group_by, timestamp_column, datepart, interval) %}

{{ adapter.dispatch('test_expect_grouped_row_values_to_have_recent_data', packages = dbt_expectations._get_namespaces()) (model, group_by, timestamp_column, datepart, interval) }}

{% endmacro %}

{% macro default__test_expect_grouped_row_values_to_have_recent_data(model, group_by, timestamp_column, datepart, interval) %}
with latest_grouped_timestamps as (

select