Unit Testing SQL in dbt #4455
-
This indeed would be useful to have for all orgs where a focus on data quality is of the utmost importance. AFAIK this is a hard problem to solve on the people / processes side of things (as you mention) and not something that has been done before for DATA unit testing.
-
Happy to find this issue! This is also an enhancement that would be useful to my team. I think this type of testing falls outside of the two existing kinds of dbt tests: schema tests and data tests. I've implemented a form of unit testing in my company's codebase. It currently executes via pytest to test pl/pgsql transformations on Postgres, but I think the technique could be adapted to other databases and dbt. Implementation sketch: my test suite folder looks like this:
The algorithm looks as follows:
The approach is probably similar to what @MichelleArk has reported with CSVs. These tests are cumbersome to set up, and I haven't been able to convince my team to do this kind of testing yet. :-) I see that Dataform has unit testing. I guess one advantage of their implementation is that they are generating the test dataset in the database. Since I am defining the data in YAML, there could be issues translating data types from YAML into the database under test.
-
Hello! I would love to see this feature as part of dbt core. I created a small proof of concept: https://discourse.getdbt.com/t/dbt-model-think-unit-tests-poc/2160 One of the core design constraints was the ability to exercise models one at a time. This means that the framework needed to provide some mechanism for stubbing out ref/source. The approach I took with the MVP listed above was to namespace the stubbed tables with a prefix, which is set as an environment variable. The following describes the logical steps the MVP test harness takes to stub out ref/source and provide test-defined data:
This allows very focused model ("unit") tests. Tests configure a couple of rows of stub data, exercise the model, and then assert on the output using a Python dataframe. This allows for targeted, fast testing of model transformation code. I'm most likely going to move forward with this approach at dayjob. If anyone is interested, it should be relatively easy to convert this Python approach to a "configuration" YAML approach. I would love to hear your thoughts.
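For illustration, the core of that ref override could look something like this (a sketch only; the UNIT_TEST_PREFIX variable name and the relation construction are my assumptions, not necessarily what the MVP does):

-- macros/ref.sql (sketch)
{% macro ref(model_name) %}
    {# when a test prefix is set, point at the stub table the harness created #}
    {% set prefix = env_var('UNIT_TEST_PREFIX', '') %}
    {% if prefix %}
        {{ return(api.Relation.create(
            database=target.database,
            schema=target.schema,
            identifier=prefix ~ '_' ~ model_name)) }}
    {% else %}
        {{ return(builtins.ref(model_name)) }}
    {% endif %}
{% endmacro %}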
-
Hi! I wanted to continue the conversation from #2740. I have been playing around with a way to automate this and I have a working concept here: https://github.com/jmriego/dbt-bdd The tests are run with behave, a BDD testing library that is in my opinion a great fit for dbt, as it makes the tests easy for analysts to understand, the same way it already does for ELT.

Scenario: run a sample unit test

Then, it will replace all refs to calendar with abcd124_calendar. This is really the main concept and I didn't find a better solution, but it does so by passing to dbt a var with the following key and value: {calendar: abcd124}. The code that detects the reference is here: https://github.com/jmriego/dbt-bdd/blob/master/macros/ref.sql I see @dm03514 also created something similar, but with pytest.
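Conceptually, the overridden ref does something like the sketch below (simplified; the real implementation is in the macros/ref.sql linked above, and this assumes the prefixed stub exists as a seed or model):

-- macros/ref.sql (sketch)
{% macro ref(model_name) %}
    {# if the test passed e.g. --vars '{calendar: abcd124}', rewrite the ref #}
    {% set stub_prefix = var(model_name, none) %}
    {% if stub_prefix %}
        {{ return(builtins.ref(stub_prefix ~ '_' ~ model_name)) }}
    {% else %}
        {{ return(builtins.ref(model_name)) }}
    {% endif %}
{% endmacro %}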
-
We've been experimenting with unit tests. We've decided to (probably) use SQL mocks rather than seeds because they're faster. For any given model we have __source.sql file(s) and a __expected.sql. It took me a while to realise that there's a fundamental paradox: either I have to deploy the model and change its sources... or I have to have different "versions" of the model itself pointing at different sources... because the source needs to be instantiated and the model deployed before it can be tested. Ideally, though, this would be easier to control with config.
-
Hi @reubster! How are you creating those SQL mocks? Do you mean that people writing the tests need to create fake source models and expected values with a SQL query similar to this?
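For example, something like this (file name and columns invented for illustration):

-- models/unit_tests/my_model__source.sql
select 1 as id, 'foo' as name, cast('2021-01-01' as date) as created_at
union all
select 2 as id, 'bar' as name, cast('2021-01-02' as date) as created_at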
-
Getting (static) test data into the database is something that will be test dependent; sometimes you want small test data, and then writing SQL that mocks the data is doable (select ... union all select ...); medium data sets fit well inside YAML files; and large files can be provisioned with regular dbt tooling practices. Since all of them have different approaches / tooling, I would rather see a solution to unit testing where we can get a model's SQL code (parsed) but where refs and sources (potentially variables as well) can be overridden in a test-local scope. Assume such a macro exists (one where we can get the compiled SQL code). Contrived example:

SELECT
*,
a+b AS sum
FROM {{ ref('some_other_model') }}
This example doesn't have ref/source overrides but rather does a simple string replace, but you get the general idea. How the data is sourced is up to each test. (This is mostly a copy from this thread, where I've tried to get some feedback on this approach: https://discourse.getdbt.com/t/testing-with-fixed-data-set/564/9)
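To make that concrete, a singular test could look roughly like this (get_compiled_sql is a placeholder name for the assumed macro; the string being replaced depends on how refs render in your warehouse):

-- tests/some_model_unit_test.sql (sketch)
{% set model_sql = get_compiled_sql('some_model') %}
{% set test_sql = model_sql | replace(ref('some_other_model') | string, '(select 1 as a, 2 as b) as t') %}

with actual as (
    {{ test_sql }}
),
expected as (
    select 1 as a, 2 as b, 3 as sum
)
-- a singular test passes when it returns zero rows
select * from actual except select * from expected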
-
Hi @Zatte, I really like that approach. It definitely feels more dbt-onic than what I was proposing. As you say, there might be multiple ways of filling data for testing depending on the size of the tests. Nothing stops the YAML I was proposing from generating these test SQLs automatically, so it's not even like these two approaches are exclusive.
-
I see the preference is to have the mock data in some file in the repo. I am curious why not have a different database / schema for the mock data, e.g. raw_mock_data, and replace sources with it, maybe using a var, adding a tag to these tests so you can include/exclude them on a given run.
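A sketch of what that source swap could look like (the var name is assumed; raw_mock_data as in the comment above):

-- macros/source.sql (sketch)
{% macro source(source_name, table_name) %}
    {% if var('use_mock_data', false) %}
        {# read from the mock schema instead of the real source #}
        {{ return(api.Relation.create(
            database=target.database,
            schema='raw_mock_data',
            identifier=table_name)) }}
    {% else %}
        {{ return(builtins.source(source_name, table_name)) }}
    {% endif %}
{% endmacro %}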
-
I personally would like all tests to be able to run using just dbt test.
I think this approach can work in many situations but not all. If you can only swap out the schema then you are limited to swapping 1:1 between production/mock data. What if you want to test a model using different mocks, and/or one which depends on two or more tables (let's call them A and B)? Testing with mocks A1, A2, B1, B2, B3 and combinations of these would be difficult.
-
Makes sense, thanks for clarifying.
-
Hi, I'm doing unit tests in dbt with a couple of custom macro helpers, with a couple of trade-offs and impractical aspects.
It looks good:
But it has a couple of flaws:
So it's far from a perfect setup. I was having a look at Dataform, and they have the concept of unit tests as a feature. In Dataform, for each unit test, we need to define the model that we want to test (as in my approach) and we always need to provide the input data for each model used by the model_to_test. Rewriting my initial test in Dataform looks like this:

That's neat, IMO. I was impressed with the Dataform approach, and I think an approach like that is the way to go for dbt. I think this is slightly hard to put in a PR by an 'outsider'. Could someone from the dbt team please share what the road-map for unit tests is?
-
Just chiming in to link a solid Slack thread, prompted by the comment above. This is a topic I'm very interested in — and would be interested in revisiting, in earnest, next year.
-
Hi.
The unit test is composed of four separate parts:
Under the hood, the unit_test macro constructs a big SQL query which doesn't depend on models, just on the inputs. That means we don't need to run dbt run to refresh the models each time we want to test a new change, so the feedback loop is seconds. We solved our main problem, which was mocking the sources of a model, and we also improved the feedback loop. We still have a couple of ideas for improvement, based on our customers' feedback:
We'll share the custom macros in the Equal Experts GitHub.
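To give a feel for it, the constructed query is shaped roughly like this (columns invented, simplified; on BigQuery the comparison would be except distinct):

with mocked_input as (
    select cast('2021-05-05' as date) as day, 0 as new_cases
    union all
    select cast('2021-05-06' as date) as day, 20 as new_cases
),
model_under_test as (
    -- the model's SQL, with its source()/ref() calls rewritten to the CTE above
    select day, sum(new_cases) as cases from mocked_input group by day
),
expected as (
    select cast('2021-05-05' as date) as day, 0 as cases
    union all
    select cast('2021-05-06' as date) as day, 20 as cases
)
select * from model_under_test except select * from expected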
-
Very glad to see the activity in this discussion continue! I've been thinking about this on and off for a while now. Thanks to @tommyh for taking the initiative by pulling some more thoughts out of me. This started as a DM; I figure it's better here. Update on current state:
Premises:
I sense that a trade-off exists today between:
I've seen three basic approaches presented, here and elsewhere:
I find myself drawn to option 3, even though it has some serious shortcomings. This approach optimizes for reusability, modularity, and speed. Given the need for seeds, it loses out on compactness—maybe we can find a way to stick both a seed + a model in the same file? It's also not very flexible, if it requires you to define fixture expectations for every single model—not to mention the significant work of maintaining those fixtures going forward—that is, unless we could support a cleverer custom ref that knows when to swap in fixtures.

Those approaches are all (or mostly) possible with dbt as it exists today. Even if the implementation isn't ideal, I think it's enough to try them out, and decide which is worthy of standardization when it's time to build this into dbt-core.

One last thing: To reiterate what I said in #4707 (comment), I think unit testing model logic (SQL) and macro logic (DDL/DML/functional behavior) ought to be two different endeavors. (That's not a perfect division: there are many macros that do little more than DRY up "model logic," and fall under the purview of this initiative. I'm talking about the macros you look at and say, "This is Jinja?") I believe folks building products on top of dbt, such as package + plugin maintainers, need access to a more rigorous testing framework. I also believe that framework could be the very same as the one we're (re)building for ourselves, to better test dbt-core and adapter plugins.
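As a strawman, that cleverer ref might look something like this (the var name and shape are pure illustration):

-- macros/ref.sql (strawman)
{% macro ref(model_name) %}
    {# only in a testing context, swap refs for fixtures passed via --vars #}
    {% set fixtures = var('unit_test_fixtures', {}) %}
    {{ return(builtins.ref(fixtures.get(model_name, model_name))) }}
{% endmacro %}

-- e.g.: dbt build --vars '{unit_test_fixtures: {stg_orders: stg_orders_fixture}}'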
-
Hi everyone! We've been improving our previously mentioned approach and we've transformed our sample project into a dbt package that can be reused across multiple projects: https://github.com/EqualExperts/dbt-unit-testing We've received feedback on the dbt Slack and we introduced a couple of changes:
{% call dbt_unit_testing.test('covid19_cases_per_day') %}
{% call dbt_unit_testing.mock_source('dbt_unit_testing_staging', 'covid19_stg', {"input_format": "csv"}) %}
day::Date, payload
'2021-05-05', '[{}]'
'2021-05-06', '[{"newCases": 20}]'
{% endcall %}
{% call dbt_unit_testing.expect({"input_format": "csv"}) %}
day::Date, cases
'2021-05-05', 0
'2021-05-06', 20
{% endcall %}
{% endcall %}

@jtcohen6 Thank you for your insightful feedback.
We are definitely in this space. I understand your concern about reusability. That said, since we are doing unit tests, even if you are testing adjacent models you will probably be testing different behaviours, and for different behaviours the test setup might be different.
Absolutely, really appreciate it!
-
I have to agree with @cdiniz here... Approach 1 from above seems to be the option that will provide the best unit testing capabilities. Again, in agreement with @cdiniz, the other two approaches aren't really unit tests - I think the clue is in the fact that approach 1 is the only one that doesn't need the DAG to run, i.e. you are testing one model as if it were a function in isolation. If I understand correctly, approach 3 will result in you only being allowed one mock output per model to act as an input to downstream tests? This means that all of our tests that share an upstream model become coupled together. Thinking about this conversely, it means you have to maintain one master test dataset for each model that will cover all tests for all children of this model. This feels brittle. For example... I've always been told that when you discover a bug you should write a test to elicit the buggy behaviour and then fix the source code. In this scenario, if I find a bug in model A's logic, I now have to tweak the input fixtures to model A in such a way as to elicit the buggy behaviour. This could then break any number of tests of models downstream of model A's parents (when the changes were to elicit behaviour in model A, which the other breaking tests may not even be dependent on). This sounds like a very high blast radius on my test suite for a bug fix. Furthermore, it's very common to write many self-contained tests for the same class/function with slightly different inputs to isolate the behaviour for each individual test case. This to me is a fundamental requirement for writing properly focused/isolated unit tests and only seems possible with approach 1. Perhaps some sort of hybrid of approaches 1 and 3 could be thought about? Make it straightforward to write proper unit tests with fully isolated inputs and the ability to define many inputs/expectations in a single file, but also have a clean and easy mechanism to share fixtures between test files (e.g. like pytest's conftest.py).
-
We released our new functional testing framework, with accompanying docs. You may notice, in reading those docs, that the entire first section is just about mocking dbt projects and sequences; it isn't specific to adapter plugins at all. It's totally possible to use this framework outside the context of developing an adapter. Really, you can use it anywhere you can install dbt-core and pytest. The appeal of the framework is that it's very easy to define multiple fixtures (mocked project resources) in one file, to reuse them across files / test cases, and to run test cases against multiple database adapters. We just did this for macros over in dbt-utils.

So, what about unit testing models? A dbt SQL model is a kind of function: its inputs are refs + sources, its output is a tabular dataset. If all three of those can be mocked, then we can write a "unit" test. (It still requires running dbt, and a database connection—but if you've written your model to be cross-database compatible, you could conceivably run its tests against Postgres / DuckDB / SQLite instead of your actual database.)

Example code: imagine I've got a "complex" model, selecting from another model and a source table, and then unioning them together:

-- models/complex_model.sql
select count(*) as num from {{ source('population', 'persons') }}
union all
select count(*) as num from {{ ref('stg_persons') }}

I want to, given some mocked versions of the inputs, assert that the model produces an expected output. The first two files below are just one-time setup, reusable by each unit test case. The final file is where I mock and run the model under test.

### tests/conftest.py
import pytest
# Import the standard functional fixtures as a plugin
pytest_plugins = ["dbt.tests.fixtures.project"]
# The profile dictionary, used to write out profiles.yml
@pytest.fixture(scope="class")
def dbt_profile_target():
return {
'type': 'postgres',
'threads': 1,
'host': "localhost",
'port': 5432,
'user': ...,
'pass': ...,
'dbname': ...,
} ### tests/base_unit.py
import pytest
from dbt.tests.util import run_dbt, check_relations_equal
# This is pretty tricky Jinja, but the idea is just to "override" ref/source by repointing
# to the mocked seeds/models defined in the test case. The mapping is handled by
# 'mock_ref()' and 'mock_source()' methods defined on the test case
mock_ref_source = """
{{% macro ref(ref_name) %}}
{{% set mock_ref = {} %}}
{{% set mock_name = mock_ref.get(ref_name, ref_name) %}}
{{% do return(builtins.ref(mock_name)) %}}
{{% endmacro %}}
{{% macro source(source_name, table_name) %}}
{{% set lookup_name = source_name ~ '__' ~ table_name %}}
{{% set mock_src = {} %}}
{{% set mock_name = mock_src[lookup_name] %}}
{{% do return(builtins.ref(mock_name)) %}}
{{% endmacro %}}
"""
# this isn't a test itself, it's just the "base case" for actual tests to inherit
class BaseUnitTestModel:
def actual(self):
return "actual"
def expected(self):
return "expected"
def mock_ref(self):
return {}
def mock_source(self):
return {}
@pytest.fixture(scope="class")
def macros(self):
return {
"overrides.sql": mock_ref_source.format(str(self.mock_ref()), str(self.mock_source()))
}
# The actual sequence of dbt commands and assertions
# pytest will take care of all "setup" + "teardown"
def test_mock_run_and_check(self, project):
run_dbt(["build"])
# this runs a pretty fancy query to validate: same columns, same types, same row values
check_relations_equal(project.adapter, [self.actual(), self.expected()])

Now that the setup is done, I can define just the fixtures I need for unit-testing my "complex" model:
### tests/test_unit_test_complex_model.py
import pytest
from dbt.tests.util import read_file
from tests.base_unit import BaseUnitTestModel
# Define mocks via CSV (seeds) or SQL (models)
mock_stg_persons_csv = """id,name,some_date
1,Easton,1981-05-20T06:46:51
2,Lillian,1978-09-03T18:10:33
""".lstrip()
mock_source_population_persons = """
select 1 as id, 'Easton' as name, '1981-05-20T06:46:51' as some_date
union all
select 2 as id, 'Lillian' as name, '1978-09-03T18:10:33' as some_date
"""
expected_csv = """num
2
2
""".lstrip()
actual = read_file('models/complex_model.sql')
class TestUnitTestComplexModel(BaseUnitTestModel):
# everything that goes in the "seeds" directory (= CSV format)
@pytest.fixture(scope="class")
def seeds(self):
return {
"stg_persons.csv": mock_stg_persons_csv,
"expected.csv": expected_csv,
}
# everything that goes in the "models" directory (= SQL)
@pytest.fixture(scope="class")
def models(self):
return {
"source_population_persons.sql": mock_source_population_persons,
"actual.sql": actual,
}
# repoint 'source()' calls to mocks (seeds or models)
def mock_source(self):
return {
"population__persons": "source_population_persons",
}
# not necessary, since the mocked model has the same name, but here for illustration
def mock_ref(self):
return {
"stg_persons": "stg_persons",
}

Is this the thing? I don't know. It feels close, but the need to run dbt end-to-end against a real database means it's still heavier than a classic unit test. I wrote a bit about this in the Core roadmap a few days ago:
-
Using @Zatte's approach as a start and then extending it, this is how I could see unit tests being implemented, @jtcohen6. IMHO, by basically 'borrowing' functionality from dbt macros, seeds, and singular tests, we should be able to get quite far. My modified version of his example then boils down to something like this:
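(Fixture names invented; get_model_sql() is a hypothetical macro returning a model's compiled SQL.)

-- models/foobar.sql
select *, a + b as sum
from {{ ref('some_other_model') }}

-- tests/foobar_unit_test.sql (a singular test; fixtures are plain seeds)
{% set sql = get_model_sql('foobar')
    | replace(ref('some_other_model') | string, ref('foobar_input') | string) %}

with actual as ({{ sql }}),
expected as (select * from {{ ref('foobar_expected') }})
select * from actual except select * from expected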
-
Hi all 👋 , I released a pytest plugin to unit test dbt macros. It's on PyPI. Example:

import pytest
from dbt.clients.jinja import MacroGenerator
from pyspark.sql import SparkSession
@pytest.mark.parametrize(
"macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_create_table(
spark_session: SparkSession, macro_generator: MacroGenerator
) -> None:
expected_table = "default.example"
spark_session.sql(f"CREATE TABLE {expected_table} (id int) USING parquet")
tables = macro_generator()
assert tables == [expected_table]

And:

import pytest
from dbt.clients.jinja import MacroGenerator
from pyspark.sql import SparkSession
@pytest.mark.parametrize(
"macro_generator",
["macro.my_project.to_cents"],
indirect=True,
)
def test_dollar_to_cents(
spark_session: SparkSession, macro_generator: MacroGenerator
) -> None:
expected = spark_session.createDataFrame([{"cents": 1000}])
to_cents = macro_generator("price")
out = spark_session.sql(
"with data AS (SELECT 10 AS price) "
f"SELECT cast({to_cents} AS bigint) AS cents FROM data"
)
assert out.collect() == expected.collect()
Some useful links:

This approach expects you to know Python and pytest; as mentioned above, this is a limitation. I would like to see how more technical users would like to unit test their dbt SQL. Maybe some patterns become apparent that could be used for designing a SQL + Jinja approach to unit testing. To limit the scope of the project, you can only unit test macros for now. I felt that macros were the simplest to start with. There is an issue on unit testing models. If you have input on this, please add it to that issue! I am curious to see if some of you find this package useful! If you have feedback, you can create an issue - or PR - on the project. Also, you can find me in the dbt Slack as "Cor (GoDataDriven)".
-
@jtcohen6 I'm just checking in to ask if there have been any developments in this area, and whether adding unit-testing capabilities is on the roadmap?
-
Hey everyone, I've been trying to integrate dbt with pytest but I'm struggling to get it working. Have we got any new material or examples on how to do this now? Thanks,
-
I've been using EqualExperts/dbt-unit-testing (thanks! @cdiniz 🏅) to great delight. I find it to be a very good approach for testing models. (Sadly) I've been writing a lot of business logic in Jinja macros. The desire to unit test macros and enable TDD took me down this path, resulting in a proposal to add macro unit testing support. Writing unit tests for macros could feel something like this, completely within dbt, no Python, doable from dbt Cloud:
Working examples and implementation ideas in EqualExperts/dbt-unit-testing#122 |
-
We decided to work based on the Shopify example: https://www.youtube.com/watch?v=dlFYP7EJiUU&t=29s. Previously I was also just doing a select from some dummy data while developing and, when finished, replacing it with the actual ref, so that made the most sense. In the transition from mainly C# to dbt I kind of missed the way simple software unit tests work, where you can test cases that don't exist in the data yet. And in essence a SQL query is nothing more than a function that can be mocked. For me, going the Python way outside of dbt is the closest thing to readable tests. If we were to implement a system with macros or a thousand CSV files just for testing, it would get out of hand and convoluted real quick. Also, the whole adapter thingy felt a little bit too much. A starting point of this is here (implemented for Snowflake, using pytest). It might need some additions for other databases or specific use cases I have not encountered yet. The main goals are readability of tests and the option to use things like Faker to create fake data easily.
-
Hello, it's me again! It may have taken a few years, but I'm so excited to be revisiting this discussion and sharing some updated thinking on the problem on behalf of the dbt-core team. We're going to start tackling this problem in the next month (!), and have opened a new discussion here in order to keep feedback more organized and digestible. The new discussion outlines our proposal for a unit testing framework native to dbt, and we'd love to hear feedback or suggestions on it from all the testing enthusiasts who have participated in this discussion, and in the broader discussions about unit testing in the modern data stack over the past few years.
-
Describe the feature
In addition to the existing data test support dbt provides, it would be great if users had the capability to write unit tests to assert model behaviour generally and in edge cases. These would validate expected behaviour of models for data that isn't yet observed in production.
To do this, dbt would need to provide the ability to run models on a set of static inputs which could either be created at query-time or ahead of time.
We prototyped a solution where users encode static input in CSV files and configure a 'tests.yml' file that provides mappings between source/ref models and CSV files, as well as specifying an expected output (also encoded as a CSV file). Our framework then generated a query that created a CTE for each static input, a CTE that represented the model being tested (replacing source/ref macros with the static input CTE names), and lastly ran a diff between the expected model and the model generated using static inputs. This generated query was then fed to dbt test - if the diff returned 0 results, the test would pass.

Feedback from data scientists was that encoding static inputs in CSV files was cumbersome, readability of tests was poor because of the many disparate files representing a test case, and flexibility to programmatically encode static inputs and write custom expectations beyond equality was also desired.
Wondering if other dbt users have tried to achieve something similar, and how the community feels it's best to approach unit testing in dbt.
Describe alternatives you've considered
We have considered taking dbt's built-in data tests and running them on a small sample of production data locally. However, creating a representative sample of data covering all edge cases for all downstream models is a challenging task and also bad practice - unit tests should have a single reason to fail. Creating many small tables representing individual test cases could counter this, but our main concern was where/how these static datasets would be encoded - if they are in separate (let's say CSV) files, this creates a readability issue where reviewers / users have to jump between multiple files to understand a test case.
Another more general issue with this approach is that writing assertions for unit tests feels quite unnatural in SQL - it's tricky even to get the right semantics for an equality check.
Additional context
There are definitely aspects of this that are database-specific. For example, in BigQuery, we can create static inputs as CTEs using ARRAYs of STRUCT types; other databases may call for a different syntax or preferred method of creating static data for testing. In addition, to create static inputs in BigQuery as ARRAYs of STRUCTs, the data type of each column needs to be specified.
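For example, a BigQuery static input CTE might look like this (illustrative columns):

with persons as (
    select * from unnest([
        struct(1 as id, 'Easton' as name, date '1981-05-20' as signup_date),
        struct(2 as id, 'Lillian' as name, date '1978-09-03' as signup_date)
    ])
)
select count(*) as num from persons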
Who will this benefit?
I think all dbt users would benefit, especially large organizations where there will be frequent updates and many collaborators for a single model. Unit testing will give users more confidence that the changes they are making will not break existing behaviour.