add column_name to output of compare_column_values #47

leoebfolsom · 2022-06-29T21:09:26Z

Description & motivation

This small change, the addition of a column_name column to the output of the compare_column_values macro, enables a user to write a relatively simple dbt test that compares values of all columns of a given table. Here is the dbt test I wrote, which is similar to an example that I found in the README.

This tests that a table called deal_facts with a primary key deal_id has the same column values in two different schemas ("prod" and whatever the target/dev schema is).

{%- set columns_to_compare=adapter.get_columns_in_relation(ref('deal_facts'))  -%}

{% set old_etl_relation_query %}
    select * from warehouse.deal_facts
{% endset %}

{% set new_etl_relation_query %}
    select * from {{ ref('deal_facts') }}
{% endset %}

{% if execute %}
    
    {% for column in columns_to_compare %}

        {{ log('Comparing column "' ~ column.name ~'"', info=True) }}

        {% set audit_query = audit_helper.compare_column_values(
            a_query=old_etl_relation_query,
            b_query=new_etl_relation_query,
            primary_key="deal_id",
            column_to_compare=column.name
        ) %}

        {% set audit_results = run_query(audit_query) %}
        {% do audit_results.print_table() %}
        {{ log("", info=True) }}

        /*
        Create a query combining results from all columns so that the user, or the 
        test suite, can examine all at once.
        */
        {% if loop.first %}
        /*
        Create a CTE that wraps all the
        unioned subqueries that are created
        in this for loop
        */
        with main as ( 
        {% endif %}
        /*
        There will be one audit_query subquery for each column
        */
        ( {{ audit_query }} )
        {% if not loop.last %}
          union
        {% else %}
        ) select * from main 
        /* Identify records that are not perfect matches. 
        These are dbt test failures.
        */
        where match_status != '\u2705: perfect match' and count_records > 0 
        
        {% endif %}

    {% endfor %}

{% endif %}

The test fails if any column has rows in either table that aren't a ✅ perfect match.

The test also writes output to the logs, just like the example in the README.

This approach could be extended to enable testing across multiple/all tables within a single test, but I thought starting small made sense.

If the team is interested in merging this change, I'd be happy to update my PR with some guidance in the README (although I don't think this update would break anything for current use cases).

A future enhancement could include a macro that specifically accomplishes this based on the test I've mocked up, with the addition of arguments for columns to exclude.
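
As a rough sketch of the "columns to exclude" idea, the column list in the test above could be filtered before the loop runs. The `exclude_columns` list and the column names in it are hypothetical placeholders for illustration only; they are not an existing audit_helper argument.

```sql
{# Hypothetical list of columns to skip; not part of audit_helper #}
{%- set exclude_columns = ['updated_at', 'loaded_at'] -%}

{# Build the list of columns to compare, dropping the excluded ones #}
{%- set columns_to_compare = [] -%}
{%- for column in adapter.get_columns_in_relation(ref('deal_facts')) -%}
    {%- if column.name | lower not in exclude_columns -%}
        {%- do columns_to_compare.append(column) -%}
    {%- endif -%}
{%- endfor -%}
```

Because the for loop in the test would then iterate over the already-filtered list, loop.first and loop.last should still open and close the CTE in the right places.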

My opinion is that even this simple change in this PR would unlock a lot, and a macro to the same end is a "nice to have," since anyone could write a version of this test to meet their specific use case once they have column_name at their disposal.

Apologies if I'm missing something obvious that renders this idea moot/unnecessary!

Issue: #46

Checklist

I have verified that these changes work locally
I have updated the README.md (if applicable)
I have added tests & descriptions to my models (and macros if applicable)

joellabes

Yep I like it! Very nice very good.

Given your test case, there's an argument to be made for an "is_successful" column or something so that you can filter on a bool instead of a specific emoji (but it's cool that that works!). Something for another day though.

leoebfolsom · 2022-07-26T17:24:43Z

Thanks @joellabes ! Noted on the boolean column (or possibly multiple boolean columns)--I know you've commented further on that in #50 🙌

Leo Folsom added 2 commits June 29, 2022 12:24

add a column column to the compare_column_values macro output

9cc99dc

resolve issues with reserved word (column), add quotation marks around column name as string value

36de39a

leoebfolsom changed the title ~~compare all columns~~ add column_name to output of compare_column_values Jun 29, 2022

joellabes approved these changes Jul 25, 2022

joellabes merged commit 56b7ed6 into dbt-labs:main Jul 25, 2022

This was referenced Jul 25, 2022

add column name to compare_column_values output #46

Closed

lf/issue-49 compare all columns macro for testing #50

Merged

leoebfolsom deleted the lf/compare-all-columns branch July 26, 2022 17:22
