Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove TableComparison and convert existing calls to use dbt.tests.util #4986

Merged
merged 11 commits into from
Apr 7, 2022

Conversation

gshank
Copy link
Contributor

@gshank gshank commented Apr 1, 2022

resolves #4778

Description

Remove TableComparison and the dbt.test.tables file. Replace existing calls to TableComparison to functions in dbt.tests.util which were converted from dbt-adapter-tests. Add some comments.

I fixed a commented out assert in the dbt_run function, and had to update a bunch of tests because of that.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have added information about my change to be included in the CHANGELOG.

dbt.tests.util. Also update tests for accidentally commented out
assert in dbt_run.
@gshank gshank requested a review from a team as a code owner April 1, 2022 19:46
@cla-bot cla-bot bot added the cla:yes label Apr 1, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Apr 1, 2022

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@gshank
Copy link
Contributor Author

gshank commented Apr 1, 2022

This pr is not complete--I'm still looking at the run_sql piece--but I wanted to get it out for comment.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gshank Very impressive work! I'm glad to see this code rightfully retiring. It served us well enough, for a long time, and need serve no longer :)

I used this branch to run our "basic" functional testing suite against dbt-redshift, dbt-snowflake, dbt-bigquery, and dbt-databricks. All passed. This does make me yearn for an easy way to test changes in a dbt-core branch by triggering GHAs in plugins: #4988

Comment on lines 18 to 42
# Test utilities
# run_dbt
# run_dbt_and_capture
# get_manifest
# copy_file
# rm_file
# write_file
# read_file
# get_artifact
# update_config_file
# get_unique_ids_in_results
# check_results_nodes_by_name
# check_result_nodes_by_unique_id

# SQL related utilities
# run_sql_with_adapter
# relation_from_name
# check_relation_types (table/view)
# check_relations_equal
# check_relations_equal_with_relations
# check_table_does_exist
# check_table_does_not_exist
# get_relation_columns
# update_rows
# generate_update_clause
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which of these do you feel comfortable documenting, as "public" methods? Are there any we want to keep "private"? (The current draft docs mention only run_dbt)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 'get_artifact' should be okay to document. I'm not totally sure about 'get_manifest', mainly because it will load an internal version of the manifest that I'm not sure we want people to rely on. Most of the rest of the test utilities are pretty innocuous. I'm not totally clear on how useful they would be to users, because my imagination is that they'd be testing their own project, and most of those are helpful for modifying/updating a project.

in SQL related utilities, I think update_rows is still kind of funky, i.e. the interface is very specific to a couple of particular tests. And I'd like to look a bit more at run_sql_with_adapter (which is mostly designed to be used by the project.run_sql method). I think check_relations_equal, relation_from_name, check_relation_types, check_table_does_(not)_exist, and get_relation_columns are okay. We will be expecting adapter contributors to be dealing with those utilities. I'm not totally clear how much uptake they'd get with general users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The principal intended user here is very much the adapter plugin maintainer, and I think that should inform how we publish/document the first cut of this interface. If it proves valuable for package/project maintainers as well, all the better.

  • Makes sense that get_manifest wants to be private, given that it's largely the output of parsing, and ought not vary across adapters
  • Agree that update_rows is not my favorite. I'd like to see all these methods refactored to use SQL wrapped in Jinja macros, rather than in Python f-strings. We can make that change independently of the test utilities, though.

# Uses:
# adapter.config.credentials
# adapter.quote
# adapter.run_sql_for_tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for annotating these so clearly! Did you find a need for any totally new adapter methods?

Below I see we make use of some standard attributes (credentials, quote, Relation) and standard methods (get_colums_in_relation), as well as some slightly less-standard methods (run_sql_for_tests, get_rows_different_sql).

Many adapters won't need to reimplement the default versions, but we'll definitely want to enumerated all of them in our documentation.

core/dbt/tests/util.py Show resolved Hide resolved
core/dbt/tests/util.py Show resolved Hide resolved
core/dbt/tests/util.py Outdated Show resolved Hide resolved
core/dbt/tests/util.py Show resolved Hide resolved
tests/functional/basic/test_copy_uppercase.py Show resolved Hide resolved
@gshank gshank force-pushed the ct-281-table_comparison branch 5 times, most recently from ed7b2c1 to e0cccd7 Compare April 5, 2022 01:00
@gshank gshank requested a review from a team as a code owner April 5, 2022 01:00
@gshank gshank force-pushed the ct-281-table_comparison branch from e0cccd7 to ebff78f Compare April 5, 2022 01:40
@gshank gshank force-pushed the ct-281-table_comparison branch from ebff78f to 0770ed2 Compare April 5, 2022 01:47
@gshank gshank requested a review from a team as a code owner April 5, 2022 21:06
@gshank gshank requested a review from VersusFacit April 5, 2022 21:06
Copy link
Contributor

@ChenyuLInx ChenyuLInx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

core/dbt/tests/fixtures/project.py Outdated Show resolved Hide resolved
@@ -182,6 +228,7 @@ def adapter(unique_schema, project_root, profiles_root, profiles_yml, dbt_projec
runtime_config = RuntimeConfig.from_args(args)
register_adapter(runtime_config)
adapter = get_adapter(runtime_config)
adapter.load_macro_manifest(base_macros_only=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we only load base macros here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular adapter is only used until the first dbt command is executed, when a new adapter is built from the full project. For the initial create_schema call and the run_sql commands that might be run in a test prior to a dbt command, the base macros should be enough. I'm trying to limit the pieces of the project that are loaded here because everything used here has to actually work, so if you want to test bad project files or profiles or macros, they will have to be loaded later instead of when the project is initially constructed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some comments about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes total sense!! Thanks for explaining it!

relation = self.adapter.Relation.create(
database=self.database, schema=self.test_schema
)
self.adapter.create_schema(relation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the fact that we are using more adapter function here!

@@ -28,10 +56,11 @@ def run_dbt(args: List[str] = None, expect_pass=True):

print("\n\nInvoking dbt with {}".format(args))
res, success = handle_and_check(args)
# assert success == expect_pass, "dbt exit state did not match expected"
assert success == expect_pass, "dbt exit state did not match expected"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super helpful for prevent unknown downstream check error!

check_relations_equal(project.adapter, ["summary_expected", "materialized_summary"])
check_relations_equal(project.adapter, ["summary_expected", "view_summary"])
check_relations_equal(project.adapter, ["summary_expected", "ephemeral_summary"])
check_relations_equal(project.adapter, ["summary_expected", "view_using_ref"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like we can merge all these checks into one check? Also okay to leave like this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've updated those tests. No point in leaving bad examples out there.

@ChenyuLInx
Copy link
Contributor

I think we need to wait for some update finish in spark and then update here before this can be merged right?

Comment on lines 22 to 23
# The main functional test fixture is the 'project' fixture, which combines
# other fixtures to write out a dbt_project in a temporary directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The main functional test fixture is the 'project' fixture, which combines
# other fixtures to write out a dbt_project in a temporary directory.
# The main functional test fixture is the 'project' fixture, which combines
# other fixtures, write out a temp dbt_project in a directory, create a temp
# schema in the testing database, and return a `TestProjInfo` object that
# contains information about that temp dbt_project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the comment.

@gshank gshank merged commit 899b0ef into main Apr 7, 2022
@gshank gshank deleted the ct-281-table_comparison branch April 7, 2022 17:04
agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this pull request May 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-281] Make obsolete the test compare tables code now in core/dbt/tests/tables.py
3 participants