add column_name to output of compare_column_values #47

Merged
merged 2 commits into from
Jul 25, 2022


@leoebfolsom leoebfolsom commented Jun 29, 2022

Description & motivation

This small change, the addition of a column_name column to the output of the compare_column_values macro, enables a user to write a relatively simple dbt test that compares values of all columns of a given table. Here is the dbt test I wrote, which is similar to an example that I found in the README.

This tests that a table called deal_facts with a primary key deal_id has the same column values in two different schemas ("prod" and whatever the target/dev schema is).

{%- set columns_to_compare=adapter.get_columns_in_relation(ref('deal_facts'))  -%}

{% set old_etl_relation_query %}
    select * from warehouse.deal_facts
{% endset %}

{% set new_etl_relation_query %}
    select * from {{ ref('deal_facts') }}
{% endset %}

{% if execute %}
    
    {% for column in columns_to_compare %}

        {{ log('Comparing column "' ~ column.name ~'"', info=True) }}

        {% set audit_query = audit_helper.compare_column_values(
            a_query=old_etl_relation_query,
            b_query=new_etl_relation_query,
            primary_key="deal_id",
            column_to_compare=column.name
        ) %}

        {% set audit_results = run_query(audit_query) %}
        {% do audit_results.print_table() %}
        {{ log("", info=True) }}

        /*
        Create a query combining results from all columns so that the user, or the 
        test suite, can examine all at once.
        */
        {% if loop.first %}
        /*
        Create a CTE that wraps all the
        unioned subqueries that are created
        in this for loop
        */
        with main as ( 
        {% endif %}
        /*
        There will be one audit_query subquery for each column
        */
        ( {{ audit_query }} )
        {% if not loop.last %}
          union
        {% else %}
        ) select * from main 
        /* Identify records that are not perfect matches. 
        These are dbt test failures.
        */
        where match_status != '\u2705: perfect match' and count_records > 0 
        
        {% endif %}

    {% endfor %}

{% endif %}

The test fails if any column has rows in either table that aren't a ✅ perfect match.

The test also writes output to the logs, just like the example in the README.

This approach could be extended to enable testing across multiple/all tables within a single test, but I thought starting small made sense.

If the team is interested in merging this change, I'd be happy to update my PR with some guidance in the README (although I don't think this update would break anything for current use cases).

A future enhancement could include a macro that specifically accomplishes this based on the test I've mocked up, with the addition of arguments for columns to exclude.
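A rough sketch of the column-exclusion part of that idea, assuming a hypothetical `exclude_columns` list (the column names in it are placeholders, not taken from this PR). It filters the result of `adapter.get_columns_in_relation` before the loop in the test above runs:

```sql
{# Hypothetical list of columns to skip; lowercase so the comparison is case-insensitive #}
{%- set exclude_columns = ['updated_at', 'loaded_at'] -%}

{%- set all_columns = adapter.get_columns_in_relation(ref('deal_facts')) -%}

{# Keep only the columns whose lowercased name is not in the exclusion list #}
{%- set columns_to_compare = [] -%}
{%- for column in all_columns -%}
    {%- if (column.name | lower) not in exclude_columns -%}
        {%- do columns_to_compare.append(column) -%}
    {%- endif -%}
{%- endfor -%}
```

The rest of the test would stay as it is, since it only iterates over `columns_to_compare`.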

My opinion is that even this simple change in this PR would unlock a lot, and a macro to the same end is a "nice to have," since anyone could write a version of this test to meet their specific use case once they have column_name at their disposal.

Apologies if I'm missing something obvious that renders this idea moot/unnecessary!

Issue: #46

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)

@leoebfolsom leoebfolsom changed the title compare all columns add column_name to output of compare_column_values Jun 29, 2022

@joellabes joellabes left a comment

Yep I like it! Very nice very good.

Given your test case, there's an argument to be made for an "is_successful" column or something so that you can filter on a bool instead of a specific emoji (but it's cool that that works!). Something for another day though :shipit:
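For illustration only, here is one way the boolean could be derived in the test itself rather than in the macro. This is a sketch, not the package's API: `is_successful` is the hypothetical column name from the comment above, the `main` CTE is a stand-in with made-up rows for the unioned `compare_column_values` output, and the emoji literal assumes the warehouse stores the status as the `✅` character.

```sql
with main as (
    -- Stand-in rows for the unioned compare_column_values output (made-up data)
    select 'deal_name' as column_name, '✅: perfect match' as match_status, 120 as count_records
    union all
    select 'deal_stage' as column_name, 'placeholder non-matching status' as match_status, 3 as count_records
),

flagged as (
    select
        main.*,
        -- Hypothetical boolean: true only when the status is the perfect-match value
        case when match_status = '✅: perfect match' then true else false end as is_successful
    from main
)

-- Rows returned here would count as dbt test failures
select *
from flagged
where not is_successful
  and count_records > 0
```

Deriving the flag in one place means the emoji string appears once, so downstream filters only depend on the boolean.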

@joellabes joellabes merged commit 56b7ed6 into dbt-labs:main Jul 25, 2022
@leoebfolsom leoebfolsom deleted the lf/compare-all-columns branch July 26, 2022 17:22
@leoebfolsom

Thanks @joellabes ! Noted on the boolean column (or possibly multiple boolean columns)--I know you've commented further on that in #50 🙌
