Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-707] Expose check_relations_equal + consolidate with dbt-utils generic tests #5318

Open
jtcohen6 opened this issue Jun 1, 2022 · 0 comments
Labels
Team:Adapters Issues designated for the adapter area of the code tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality utils Cross-database building blocks

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jun 1, 2022

To assert that two relations are identical, we have similar logic expressed in a few places today:

  1. Custom generic tests in dbt-utils: the equality, cardinality_equality, and (to a lesser extent) equal_rowcount
  2. COLUMNS_EQUAL_SQL + get_rows_different_sql, which are used extensively (but exclusively) in our functional testing framework today:
    def get_rows_different_sql(
    self,
    relation_a: BaseRelation,
    relation_b: BaseRelation,
    column_names: Optional[List[str]] = None,
    except_operator: str = "EXCEPT",
    ) -> str:
    """Generate SQL for a query that returns a single row with a two
    columns: the number of rows that are different between the two
    relations and the number of mismatched rows.
    """
    # This method only really exists for test reasons.
    names: List[str]
    if column_names is None:
    columns = self.get_columns_in_relation(relation_a)
    names = sorted((self.quote(c.name) for c in columns))
    else:
    names = sorted((self.quote(n) for n in column_names))
    columns_csv = ", ".join(names)
    sql = COLUMNS_EQUAL_SQL.format(
    columns=columns_csv,
    relation_a=str(relation_a),
    relation_b=str(relation_b),
    except_op=except_operator,
    )
    return sql
    COLUMNS_EQUAL_SQL = """
    with diff_count as (
    SELECT
    1 as id,
    COUNT(*) as num_missing FROM (
    (SELECT {columns} FROM {relation_a} {except_op}
    SELECT {columns} FROM {relation_b})
    UNION ALL
    (SELECT {columns} FROM {relation_b} {except_op}
    SELECT {columns} FROM {relation_a})
    ) as a
    ), table_a as (
    SELECT COUNT(*) as num_rows FROM {relation_a}
    ), table_b as (
    SELECT COUNT(*) as num_rows FROM {relation_b}
    ), row_count_diff as (
    select
    1 as id,
    table_a.num_rows - table_b.num_rows as difference
    from table_a, table_b
    )
    select
    row_count_diff.difference as row_count_difference,
    diff_count.num_missing as num_mismatched
    from row_count_diff
    join diff_count using (id)
    """.strip()

Straightforward change

Could we consolidate the logic in both those places, so that we're not duplicating the same SQL (and requiring adapter maintainers to do the same)?

Related improvements

Some good feedback on the latter, in #4455 (reply in thread) — trying to answer the question, "Could/should the functional testing framework also enable unit testing models/macros for end users?" — that the output of check_relations_equal assertion statements could be more useful:

When a test does fail we get this message:
AssertionError: Got 1 different rows between DEV_CEREBRO.test16539256090777326094_test_complex_model.actual and DEV_CEREBRO.test16539256090777326094_test_complex_model.expected which isn't informative enough to workout what's different between the expected and actual

@jtcohen6 jtcohen6 added tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality Team:Adapters Issues designated for the adapter area of the code labels Jun 1, 2022
@github-actions github-actions bot changed the title Expose check_relations_equal + consolidate with dbt-utils generic tests [CT-707] Expose check_relations_equal + consolidate with dbt-utils generic tests Jun 1, 2022
@jtcohen6 jtcohen6 added the utils Cross-database building blocks label Jul 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Adapters Issues designated for the adapter area of the code tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality utils Cross-database building blocks
Projects
None yet
Development

No branches or pull requests

1 participant