Add new macros for diff calculation, and unit tests #99
Merged: joellabes merged 61 commits into dbt-labs:joellabes-audit-helper-revamp from joellabes:master on May 27, 2024.
joellabes merged commit 9da3c51 into dbt-labs:joellabes-audit-helper-revamp on May 27, 2024 (1 check passed).
joellabes added a commit that referenced this pull request on Jun 13, 2024:
Add new macros for diff calculation, and unit tests (#99)

* Add macro for new hash-based comparison strategy
* split out SF-focused version of macro
* Fix change to complex object
* Fix overuse of star
* switch from compare rels to compare queries
* provide wrapping parens
* switch to array of columns for PK
* split unit tests into own files, change unit tests to array pk
* tidy up get_comp_bounds
* fix arg rename
* add quick_are_queries_identical and unit tests
* Move data tests into own directory
* Add test for multiple PKs
* fix incorrect unit test configs
* make data types for id and id_2 big enough nums
* Mock event_time response
* fix hardcoded value in quick_are_qs_identical
* Add unit tests for null handling (still broken)
* Rename columns to be more unique
* Steal surrogate key macro from utils
* Use generated surrogate key across the board in place of PK
* rm my profile reference
* Update quick_are_queries_identical.sql
* Add diagram explaining comparison bounds
* Add comments explaining warehouse-specific optimisations
* cross-db support
* subq
* no postgres or redshift for a sec
* add default var values for compare wrappers
* avoid lateral alias reference for BQ
* BQ doesn't support count(arg1, arg2)
* re-enable redshift
* Alias subq for redshift
* remove extra comma
* add row status of nonunique_pk
* remove redundant test and wrapper model
* Create json-y tests for snowflake
* Add workaround for redshift to support count num rows in status
* skip incompatible tests
* Fix redshift lack of bool_or support in window funcs
* add skip exclusions for everything else
* fix incorrect skip tag application
* Move user configs to project.yml from profiles
* Temporarily disable unpassable redshift tests
* add temp skip to circle's config.yml
* forgot tag: method
* Temporarily skip reworked_compare_all_statuses_different_column_set
* Skip another test redshift
* disable unsupported tests BQ
* postgres too?
* Fixes for postgres
* namespace macros
* It's a postgres problem, not a redshift problem
* Handle postgres 63 char limit
* Add databricks
* Rename tests to data_tests
* Found a better workaround for missing count distinct window
* actually call the macro
* disable syntax-failing tests on dbx
* try to install core from main to get sorting fix
* Revert "try to install core from main to get sorting fix" (this reverts commit d28f3e1)
* Audit helper code review changes
* add BQ support for quick are queries identical
* explain why using dense_rank
* remove the compile step to avoid compilation error
* Don't throw incompatible quick compare error during parse
* add where clause to check we're not assuming its absence
* enable first basic struct tests
* Skip raising exception during parsing
* json_build_object doesn't work on rs
* changed behaviour redshift
* skip complex structs on rs for now
* temp disable all complex structs
* skip some currently failing bq tests
* Properly exclude tests to skip, add comments
* dbx too
* rename reworked_compare to compare_and_classify_query_results
* Rename files
* rename macro file
* Add relation_focused macros
* Add BQ-specific generate_set_results for hashes, enable json tests
* Implement hash comparisons for BQ and DBX (#103)
* disable tests for unrelated adapters
* Avoid lateral column aliasing
* First cross-db complex struct fixture
* Add final fixtures
* Initial work on dbx compatibility
* remove lateral column alias dbx
* cast everything as string before hashing
* add comment, enable all tests again
* rename to dbt_audit_in_a instead of in_a
* Protect against missing PK columns
* gitignore package-lock.yml
* add dbx variant of simple structs
* Rename private macros to have _ prefix
* Fix get comparison bounds (#104)
* change to getting comparison bounds for queries not relations
* add test for introspective queries
* Make compare query columns multi pk (#105)
* rm packagelock.yml
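Several of the commits above ("switch to array of columns for PK", "Steal surrogate key macro from utils", "Use generated surrogate key across the board in place of PK", "Add unit tests for null handling") replace a single primary key with one hash built from an array of key columns. The sketch below approximates, in Python, how dbt_utils' `generate_surrogate_key` builds such a key: cast each column to a string, substitute a sentinel for NULL, join with `-`, and take the MD5. The `surrogate_key` helper is hypothetical and only illustrates the idea; it is not a macro from this PR.

```python
import hashlib

# Sentinel modelled on the default NULL replacement used by
# dbt_utils.generate_surrogate_key.
NULL_SENTINEL = "_dbt_utils_surrogate_key_null_"


def surrogate_key(row, pk_columns):
    """Hypothetical helper: hash several primary-key columns into one key."""
    # Cast every key value to a string; replace None with the sentinel so that
    # (1, None) and (1, "") produce different keys.
    parts = [NULL_SENTINEL if row[c] is None else str(row[c]) for c in pk_columns]
    # Join with "-" and hash the result, as the surrogate key macro does with md5.
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


print(surrogate_key({"id": 1, "id_2": None}, ["id", "id_2"]))
```

Two caveats apply to this approach. Separator-based concatenation can collide when values contain the separator (`("a-b", "c")` and `("a", "b-c")` both become `a-b-c`), and Python's `str()` will not always match a warehouse's string cast (for example, `True` versus `true`), so keys computed this way are not expected to match warehouse-generated keys byte for byte.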
Description & motivation
It's possible to calculate diffs faster by comparing row hashes instead of individual column values (as described by the IL team here). Additionally, by calculating aggregate results and outputting them alongside a subset of summary results, it's possible to skip running other queries entirely.
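A minimal in-memory sketch of the idea above, assuming each query result is a list of Python dicts. It hashes every row once, classifies each primary key as `identical`, `modified`, `added`, `removed` or `nonunique_pk`, and returns per-status counts together with only a few sample keys per status. The function names, the `sample_limit` parameter and the status names are assumptions for illustration (the last modelled on the `nonunique_pk` row status mentioned in the commit list); the PR's macros do this work in SQL inside the warehouse, not in Python.

```python
import hashlib
import json
from collections import Counter, defaultdict


def row_hash(row, columns):
    # Hash the compared columns of one row. json.dumps gives an unambiguous
    # encoding: None becomes null (distinct from ""), and separator characters
    # inside values are escaped. Non-JSON types fall back to str().
    payload = json.dumps([row.get(c) for c in columns], default=str)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def index_by_key(rows, pk, columns):
    # Map each primary-key tuple to the hashes of every row carrying that key.
    index = defaultdict(list)
    for row in rows:
        index[tuple(row[c] for c in pk)].append(row_hash(row, columns))
    return index


def compare_results(rows_a, rows_b, pk, columns, sample_limit=2):
    """Hypothetical helper: classify every key, return counts plus a sample."""
    a = index_by_key(rows_a, pk, columns)
    b = index_by_key(rows_b, pk, columns)
    statuses = {}
    for key in a.keys() | b.keys():
        hashes_a, hashes_b = a.get(key, []), b.get(key, [])
        if len(hashes_a) > 1 or len(hashes_b) > 1:
            statuses[key] = "nonunique_pk"
        elif not hashes_b:
            statuses[key] = "removed"
        elif not hashes_a:
            statuses[key] = "added"
        elif hashes_a == hashes_b:
            # A single hash comparison stands in for a per-column comparison.
            statuses[key] = "identical"
        else:
            statuses[key] = "modified"

    # The counts cover every key, even though only a few keys are sampled.
    summary = Counter(statuses.values())
    samples = defaultdict(list)
    for key in sorted(statuses, key=repr):  # deterministic sample order
        status = statuses[key]
        if len(samples[status]) < sample_limit:
            samples[status].append(key)
    return dict(summary), dict(samples)


rows_a = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": None}]
rows_b = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 4, "name": "d"}]
summary, samples = compare_results(rows_a, rows_b, pk=["id"], columns=["id", "name"])
print(summary)
print(samples)
```

Hashing means each pair of rows is compared with one string equality rather than one comparison per column. Because the summary counts cover every key while the sample stays small, a caller can report totals from a single result instead of issuing a separate query for them.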
I also added a bunch of unit tests! They're really good, and more people should be talking about them.
Checklist