Add new macros for diff calculation, and unit tests (#99) (#101)
* Add new macros for diff calculation, and unit tests (#99)

* Add macro for new hash-based comparison strategy

* split out SF-focused version of macro

* Fix change to complex object

* Fix overuse of star

* switch from compare rels to compare queries

* provide wrapping parens

* switch to array of columns for PK

* split unit tests into own files, change unit tests to array pk

* tidy up get_comp_bounds

* fix arg rename

* add quick_are_queries_identical and unit tests

* Move data tests into own directory

* Add test for multiple PKs

* fix incorrect unit test configs

* make data types for id and id_2 big enough nums

* Mock event_time response

* fix hardcoded value in quick_are_qs_identical

* Add unit tests for null handling (still broken)

* Rename columns to be more unique

* Steal surrogate key macro from utils

* Use generated surrogate key across the board in place of PK

* rm my profile reference

* Update quick_are_queries_identical.sql

* Add diagram explaining comparison bounds

* Add comments explaining warehouse-specific optimisations

* cross-db support

* subq

* no postgres or redshift for a sec

* add default var values for compare wrappers

* avoid lateral alias reference for BQ

* BQ doesn't support count(arg1, arg2)

* re-enable redshift

* Alias subq for redshift

* remove extra comma

* add row status of nonunique_pk

* remove redundant test and wrapper model

* Create json-y tests for snowflake

* Add workaround for redshift to support count num rows in status

* skip incompatible tests

* Fix redshift lack of bool_or support in window funcs

* add skip exclusions for everything else

* fix incorrect skip tag application

* Move user configs to project.yml from profiles

* Temporarily disable unpassable redshift tests

* add temp skip to circle's config.yml

* forgot tag: method

* Temporarily skip reworked_compare_all_statuses_different_column_set

* Skip another test redshift

* disable unsupported tests BQ

* postgres too?

* Fixes for postgres

* namespace macros

* It's a postgres problem, not a redshift problem

* Handle postgres 63 char limit

* Add databricks

* Rename tests to data_tests

* Found a better workaround for missing count distinct window

* actually call the macro

* disable syntax-failing tests on dbx

* try to install core from main to get sorting fix

* Revert "try to install core from main to get sorting fix"

This reverts commit d28f3e1.

* Audit helper code review changes

* add BQ support for quick are queries identical

* explain why using dense_rank

* remove the compile step to avoid compilation error

* Don't throw incompatible quick compare error during parse

* add where clause to check we're not assuming its absence

* enable first basic struct tests

* Skip raising exception during parsing

* json_build_object doesn't work on rs

* changed behaviour redshift

* skip complex structs on rs for now

* temp disable all complex structs

* skip some currently failing bq tests

* Properly exclude tests to skip, add comments

* dbx too

* rename reworked_compare to compare_and_classify_query_results

* Rename files

* rename macro file

* Add relation_focused macros

* Add BQ-specific generate_set_results for hashes, enable json tests

* Implement hash comparisons for BQ and DBX (#103)

* disable tests for unrelated adapters

* Avoid lateral column aliasing

* First cross-db complex struct fixture

* Add final fixtures

* Initial work on dbx compatibility

* remove lateral column alias dbx

* cast everything as string before hashing

* add comment, enable all tests again

* rename to dbt_audit_in_a instead of in_a

* Protect against missing PK columns

* gitignore package-lock.yml

* add dbx variant of simple structs

* Rename private macros to have _ prefix

* Fix get comparison bounds (#104)

* change to getting comparison bounds for queries not relations

* add test for introspective queries

* Make compare query columns multi pk (#105)

* rm packagelock.yml
joellabes authored Jun 13, 2024
1 parent 8473293 commit d10124a
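
The commit message above describes a hash-based comparison that builds a surrogate key from the primary-key columns, casts everything to string before hashing, and reports a `nonunique_pk` row status. A minimal Python sketch of that idea follows; it is not the package's SQL. The function names, the `NULL_PLACEHOLDER` value, the choice of MD5, and every status label other than `nonunique_pk` are assumptions made for illustration.

```python
import hashlib
from collections import Counter

# Hypothetical placeholder so that NULL and the empty string hash differently.
NULL_PLACEHOLDER = "<null>"


def _hash_columns(row, columns):
    """Cast each value to a string, join, and hash the result."""
    parts = [NULL_PLACEHOLDER if row[c] is None else str(row[c]) for c in columns]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def surrogate_key(row, pk_columns):
    """One key per row, built from an array of primary-key columns."""
    return _hash_columns(row, pk_columns)


def classify(a_rows, b_rows, pk_columns, columns):
    """Return {surrogate_key: status} for two lists of dict rows."""

    def index(rows):
        hashes, counts = {}, Counter()
        for row in rows:
            key = surrogate_key(row, pk_columns)
            counts[key] += 1
            hashes[key] = _hash_columns(row, columns)
        return hashes, counts

    a_hashes, a_counts = index(a_rows)
    b_hashes, b_counts = index(b_rows)

    statuses = {}
    for key in a_hashes.keys() | b_hashes.keys():
        if a_counts[key] > 1 or b_counts[key] > 1:
            statuses[key] = "nonunique_pk"  # key repeated on either side
        elif key not in b_hashes:
            statuses[key] = "removed"       # assumed label: only in A
        elif key not in a_hashes:
            statuses[key] = "added"         # assumed label: only in B
        elif a_hashes[key] == b_hashes[key]:
            statuses[key] = "identical"     # assumed label: same row hash
        else:
            statuses[key] = "modified"      # assumed label: different row hash
    return statuses
```

Comparing one hash per row rather than every column pair keeps the join narrow, which is presumably the motivation for the hash-based strategy.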
Showing 64 changed files with 1,752 additions and 106 deletions.
33 changes: 20 additions & 13 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
@@ -33,7 +33,7 @@ jobs:
. dbt_venv/bin/activate
python -m pip install --upgrade pip setuptools
python -m pip install --pre dbt-core dbt-postgres dbt-redshift dbt-snowflake dbt-bigquery
python -m pip install --pre dbt-core dbt-postgres dbt-redshift dbt-snowflake dbt-bigquery dbt-databricks
mkdir -p ~/.dbt
cp integration_tests/ci/sample.profiles.yml ~/.dbt/profiles.yml
@@ -51,9 +51,8 @@ jobs:
cd integration_tests
dbt deps --target postgres
dbt seed --target postgres --full-refresh
dbt compile --target postgres
dbt run --target postgres
dbt test --target postgres
dbt run --target postgres --exclude tag:skip+ tag:temporary_skip+
dbt test --target postgres --exclude tag:skip+ tag:temporary_skip+
- run:
name: "Run Tests - Redshift"
@@ -63,9 +62,8 @@ jobs:
cd integration_tests
dbt deps --target redshift
dbt seed --target redshift --full-refresh
dbt compile --target redshift
dbt run --target redshift
dbt test --target redshift
dbt run --target redshift --exclude tag:skip+ tag:temporary_skip+
dbt test --target redshift --exclude tag:skip+ tag:temporary_skip+
- run:
name: "Run Tests - Snowflake"
@@ -75,9 +73,8 @@ jobs:
cd integration_tests
dbt deps --target snowflake
dbt seed --target snowflake --full-refresh
dbt compile --target snowflake
dbt run --target snowflake
dbt test --target snowflake
dbt run --target snowflake --exclude tag:skip+ tag:temporary_skip+
dbt test --target snowflake --exclude tag:skip+ tag:temporary_skip+
- run:
name: "Run Tests - BigQuery"
@@ -90,10 +87,19 @@ jobs:
cd integration_tests
dbt deps --target bigquery
dbt seed --target bigquery --full-refresh
dbt compile --target bigquery
dbt run --target bigquery --full-refresh
dbt test --target bigquery
dbt run --target bigquery --full-refresh --exclude tag:skip+ tag:temporary_skip+
dbt test --target bigquery --exclude tag:skip+ tag:temporary_skip+
- run:
name: "Run Tests - Databricks"
command: |
. dbt_venv/bin/activate
echo `pwd`
cd integration_tests
dbt deps --target databricks
dbt seed --target databricks --full-refresh
dbt run --target databricks --exclude tag:skip+ tag:temporary_skip+
dbt test --target databricks --exclude tag:skip+ tag:temporary_skip+
- save_cache:
key: deps1-{{ .Branch }}
@@ -115,3 +121,4 @@ workflows:
- profile-redshift
- profile-snowflake
- profile-bigquery
- profile-databricks
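
The CI commands in this file now pass `--exclude tag:skip+ tag:temporary_skip+`. In dbt's selector syntax a trailing `+` also selects the descendants of the matched nodes, and space-separated selectors are unioned. The sketch below mimics that behaviour on a toy graph, assuming a hypothetical `graph` (node to children) and `tags` mapping; it is an illustration, not dbt's selector implementation.

```python
from collections import deque


def descendants(graph, start):
    """All nodes reachable from `start` by following child edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def excluded_nodes(graph, tags, exclude_tags):
    """Union of every node carrying an excluded tag plus its descendants."""
    excluded = set()
    for node, node_tags in tags.items():
        if node_tags & exclude_tags:
            excluded.add(node)
            excluded |= descendants(graph, node)
    return excluded


# Hypothetical project: one skipped model with a test hanging off it.
graph = {"model_a": ["test_a"], "model_skipped": ["test_skipped"]}
tags = {
    "model_a": set(),
    "test_a": set(),
    "model_skipped": {"skip"},
    "test_skipped": set(),
}
to_run = set(tags) - excluded_nodes(graph, tags, {"skip", "temporary_skip"})
```

Here `test_skipped` carries no tag of its own but is still excluded because it sits downstream of `model_skipped`.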
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,4 +1,7 @@
target/
dbt_packages/
logs/
logfile
logfile
.DS_Store
package-lock.yml
integration_tests/package-lock.yml
21 changes: 21 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,21 @@
{
"yaml.schemas": {
"https://raw.githubusercontent.com/dbt-labs/dbt-jsonschema/main/schemas/latest/dbt_yml_files-latest.json": [
"/**/*.yml",
"!profiles.yml",
"!dbt_project.yml",
"!packages.yml",
"!selectors.yml",
"!profile_template.yml"
],
"https://raw.githubusercontent.com/dbt-labs/dbt-jsonschema/main/schemas/latest/dbt_project-latest.json": [
"dbt_project.yml"
],
"https://raw.githubusercontent.com/dbt-labs/dbt-jsonschema/main/schemas/latest/selectors-latest.json": [
"selectors.yml"
],
"https://raw.githubusercontent.com/dbt-labs/dbt-jsonschema/main/schemas/latest/packages-latest.json": [
"packages.yml"
]
},
}
18 changes: 11 additions & 7 deletions integration_tests/ci/sample.profiles.yml
@@ -2,10 +2,6 @@
# HEY! This file is used in the dbt-audit-helper integrations tests with CircleCI.
# You should __NEVER__ check credentials into version control. Thanks for reading :)

config:
send_anonymous_usage_stats: False
use_colors: True

integration_tests:
target: postgres
outputs:
@@ -27,15 +23,15 @@ integration_tests:
dbname: "{{ env_var('REDSHIFT_TEST_DBNAME') }}"
port: "{{ env_var('REDSHIFT_TEST_PORT') | as_number }}"
schema: audit_helper_integration_tests_redshift
threads: 1
threads: 8

bigquery:
type: bigquery
method: service-account
keyfile: "{{ env_var('BIGQUERY_SERVICE_KEY_PATH') }}"
project: "{{ env_var('BIGQUERY_TEST_DATABASE') }}"
schema: audit_helper_integration_tests_bigquery
threads: 1
threads: 8

snowflake:
type: snowflake
@@ -46,4 +42,12 @@ integration_tests:
database: "{{ env_var('SNOWFLAKE_TEST_DATABASE') }}"
warehouse: "{{ env_var('SNOWFLAKE_TEST_WAREHOUSE') }}"
schema: audit_helper_integration_tests_snowflake
threads: 1
threads: 8

databricks:
type: databricks
schema: dbt_project_evaluator_integration_tests_databricks
host: "{{ env_var('DATABRICKS_TEST_HOST') }}"
http_path: "{{ env_var('DATABRICKS_TEST_HTTP_PATH') }}"
token: "{{ env_var('DATABRICKS_TEST_ACCESS_TOKEN') }}"
threads: 10
11 changes: 11 additions & 0 deletions integration_tests/dbt_project.yml
@@ -17,3 +17,14 @@ clean-targets: # directories to be removed by `dbt clean`

seeds:
+quote_columns: false

vars:
compare_queries_summarize: true
primary_key_columns_var: ['col1']
columns_var: ['col1']
event_time_var:
quick_are_queries_identical_cols: ['col1']

flags:
send_anonymous_usage_stats: False
use_colors: True
26 changes: 26 additions & 0 deletions integration_tests/macros/unit_tests/struct_generation_macros.sql
@@ -0,0 +1,26 @@
{%- macro _basic_json_function() -%}
{%- if target.type == 'snowflake' -%}
object_construct
{%- elif target.type == 'bigquery' -%}
json_object
{%- elif target.type == 'databricks' -%}
map
{%- elif execute -%}
{# Only raise exception if it's actually being called, not during parsing #}
{%- do exceptions.raise_compiler_error("Unknown adapter '"~ target.type ~ "'") -%}
{%- endif -%}
{%- endmacro -%}

{% macro _complex_json_function(json) %}

{% if target.type == 'redshift' %}
json_parse({{ json }})
{% elif target.type == 'databricks' %}
from_json({{ json }}, schema_of_json({{ json }}))
{% elif target.type in ['snowflake', 'bigquery'] %}
parse_json({{ json }})
{% elif execute %}
{# Only raise exception if it's actually being called, not during parsing #}
{%- do exceptions.raise_compiler_error("Unknown adapter '"~ target.type ~ "'") -%}
{% endif %}
{% endmacro %}
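
The two macros above pick a warehouse-specific function name and only raise for an unknown adapter when `execute` is true, so parsing does not fail on targets that never call them. A Python sketch of the same control flow for `_basic_json_function` follows; the Python function name and the use of `ValueError` are illustrative assumptions, while the adapter-to-function mapping is copied from the macro.

```python
# Mapping copied from the _basic_json_function macro above.
BASIC_JSON_FUNCTIONS = {
    "snowflake": "object_construct",
    "bigquery": "json_object",
    "databricks": "map",
}


def basic_json_function(target_type, execute):
    """Return the function name for the adapter, or fail only at execution time."""
    if target_type in BASIC_JSON_FUNCTIONS:
        return BASIC_JSON_FUNCTIONS[target_type]
    if execute:
        # Mirrors raise_compiler_error: only reached when actually running.
        raise ValueError(f"Unknown adapter '{target_type}'")
    # During parsing the macro renders nothing for unknown adapters.
    return ""
```

Deferring the error keeps parsing quiet on adapters whose tests are excluded anyway.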

This file was deleted.

@@ -0,0 +1,11 @@
-- this has no tests, it's just making sure that the introspective queries for event_time actually run

{{
audit_helper.compare_and_classify_query_results(
a_query="select * from " ~ ref('unit_test_model_a') ~ " where 1=1",
b_query="select * from " ~ ref('unit_test_model_b') ~ " where 1=1",
primary_key_columns=['id'],
columns=['id', 'col1', 'col2'],
event_time='created_at'
)
}}
@@ -9,9 +9,9 @@ select
has_difference
from (

{{ audit_helper.compare_which_columns_differ(
{{ audit_helper.compare_which_relation_columns_differ(
a_relation=a_relation,
b_relation=b_relation,
primary_key="id"
primary_key_columns=["id"]
) }}
) as macro_output
@@ -0,0 +1,25 @@
{% set a_relation=ref('data_compare_which_columns_differ_a')%}

{% set b_relation=ref('data_compare_which_columns_differ_b') %}

{% set pk_cols = ['id'] %}
{% set cols = ['id','value_changes','becomes_not_null','does_not_change'] %}

{% if target.type == 'snowflake' %}
{% set pk_cols = pk_cols | map("upper") | list %}
{% set cols = cols | map("upper") | list %}
{% endif %}

select
lower(column_name) as column_name,
has_difference
from (

{{ audit_helper.compare_which_relation_columns_differ(
a_relation=a_relation,
b_relation=b_relation,
primary_key_columns=pk_cols,
columns=cols
) }}

) as macro_output
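
The model above upper-cases `pk_cols` and `cols` on Snowflake, which folds unquoted identifiers to upper case, and lower-cases `column_name` in the output so the expected results can be shared across warehouses. A small Python sketch of that normalisation, with hypothetical function names:

```python
def normalise_columns(columns, target_type):
    """Upper-case column names on Snowflake; leave them unchanged elsewhere."""
    if target_type == "snowflake":
        return [column.upper() for column in columns]
    return list(columns)


def normalise_output(column_names):
    """Lower-case reported column names so results compare equal on every target."""
    return [name.lower() for name in column_names]


cols = ["id", "value_changes", "becomes_not_null", "does_not_change"]
snowflake_cols = normalise_columns(cols, "snowflake")
```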
@@ -2,96 +2,96 @@ version: 2

models:
- name: compare_queries
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_relations_without_exclude')

- name: compare_queries_concat_pk_without_summary
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_without_summary')

- name: compare_queries_with_summary
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_with_summary')

- name: compare_queries_without_summary
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_without_summary')

- name: compare_relations_with_summary
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_with_summary')

- name: compare_relations_without_summary
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_without_summary')

- name: compare_relations_with_exclude
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_relations_with_exclude')

- name: compare_relations_without_exclude
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_relations_without_exclude')

- name: compare_all_columns_with_summary
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_all_columns_with_summary')

- name: compare_all_columns_without_summary
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_all_columns_without_summary')

- name: compare_all_columns_concat_pk_with_summary
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_all_columns_concat_pk_with_summary')

- name: compare_all_columns_concat_pk_without_summary
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_all_columns_concat_pk_without_summary')

- name: compare_all_columns_with_summary_and_exclude
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_all_columns_with_summary_and_exclude')

- name: compare_all_columns_where_clause
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_all_columns_where_clause')

- name: compare_relation_columns
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_relation_columns')

- name: compare_relations_concat_pk_without_summary
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_without_summary')

- name: compare_which_columns_differ
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_which_columns_differ')

- name: compare_which_columns_differ_exclude_cols
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_which_columns_differ_exclude_cols')

- name: compare_row_counts
tests:
data_tests:
- dbt_utils.equality:
compare_model: ref('expected_results__compare_row_counts')
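
The hunk above swaps every `tests:` key for `data_tests:`. For a schema file with many models, the rename could be scripted roughly as below. This is a sketch that assumes the key always sits alone on its line, as it does in this file; the function name is hypothetical and the regex is not a YAML parser.

```python
import re

# Matches a line consisting only of optional indentation and the `tests:` key.
# `unit_tests:` and `data_tests:` do not match because `tests:` must follow the indentation.
_TESTS_KEY = re.compile(r"^([ \t]*)tests:[ \t]*$", flags=re.MULTILINE)


def rename_tests_key(yaml_text):
    """Rewrite bare `tests:` keys to `data_tests:`, preserving indentation."""
    return _TESTS_KEY.sub(r"\1data_tests:", yaml_text)


before = (
    "models:\n"
    "  - name: compare_row_counts\n"
    "    tests:\n"
    "      - dbt_utils.equality:\n"
    "          compare_model: ref('expected_results__compare_row_counts')\n"
)
after = rename_tests_key(before)
```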
@@ -0,0 +1 @@
select 12 as id, 22 as id_2, 'xyz' as col1, 'tuv' as col2, 123 as col3, {{ dbt.current_timestamp() }} as created_at
@@ -0,0 +1 @@
select 12 as id, 22 as id_2, 'xyz' as col1, 'tuv' as col2, 123 as col3, {{ dbt.current_timestamp() }} as created_at