To everyone we saw at Coalesce last month: thank you for joining us! We got up on stage and shared the next chapters from this year’s featured stories: about collaborating across multiple teams and projects at scale; about relaunching the dbt Semantic Layer; about more flexibility in development; and about more mature CI/CD. To anyone who missed us live, catch us on the replays!
These are stories that span both dbt Core and dbt Cloud. Our aim is to push forward the open source standard for analytics engineering, and also the platform that makes it possible for more teams to adopt & deploy dbt at scale.
In his keynote presentation, Tristan talked about these two priorities for dbt Labs. We remain committed to dbt Core, as a standard for the industry, and an open source project under an Apache 2 license. We are also committed to creating a business around dbt Cloud that is sustainable over the long term, to enable us to continue to invest in driving dbt forward.
Those two goals are inseparable. To make them both happen, we need to strike an important balance. What has it looked like over the last six months, and what will it look like for the six months ahead?
*Also, hi! I’m Grace Goheen, or @graciegoheen. Longtime dbt user, new to the dbt Core product team. I joined the Professional Services team at dbt Labs back in 2021, where I’ve since had the opportunity to work hands-on in dozens of dbt projects - leading legacy migrations, consulting on architecture, optimizing project performance, and more. I lived through the joy (lineage! testing! documentation!) and pain (spaghetti DAGs! model bottlenecks! debugging code!) of being an analytics engineer, and realized I wanted to be a part of shaping the tool at the center of it all. So here I am, the newest Product Manager of dbt Core! I am so grateful to be building this industry-defining tool with all of you.*
Version | When | Namesake | Stuff |
---|---|---|---|
v1.5 | April | Dawn Staley | Revamped CLI. Programmatic invocations. Model governance features (contracts, access, groups, versions). |
v1.6 | July | Quiara Alegría Hudes | New Semantic Layer spec. More on model governance (deprecations). Saving time and $$ with retry + clone. Initial rollout of materialized views. |
v1.7 | November | Questlove | More flexible access to "applied state" in docs generate and source freshness. Improvements to model governance & semantic layer features (driven by user feedback). |
We added a lot of stuff this year! Over the past three dbt Core minor versions, we’ve managed to knock out a litany of the most popular issues and discussions gathered over the past several years:
- CLI preview (1.5)
- Invoking dbt as a Python module (1.5), as sketched below
- Materialized views (1.6)
- Namespacing for dbt resources (1.6), in support of multi-project collaboration
- `docs generate --select` for slimmer catalog queries (1.7)
- And… we’re finally taking aim at unit testing for dbt-SQL models (!), coming in 1.8, which you should read more about in the section below.
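For the programmatic invocation item above, here is a minimal sketch using the `dbtRunner` entry point added in v1.5 — the selector and project path are placeholders:

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

# Create a runner for programmatic invocations (dbt Core v1.5+)
dbt = dbtRunner()

# Pass CLI arguments as a list of strings, exactly as you would on the command line
cli_args = ["run", "--select", "tag:nightly", "--project-dir", "path/to/project"]
res: dbtRunnerResult = dbt.invoke(cli_args)

# Inspect structured results for each executed node
if res.success:
    for node_result in res.result:
        print(f"{node_result.node.name}: {node_result.status}")
```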
Thank you for providing your upvotes, comments, and feedback. One of the best things about building dbt Core in the open is that we are all pushing forward the analytics engineering standard together. We’re able to prioritize these features and paper cuts because of your participation.
We’ve got lots more to build - there are some highly upvoted issues and discussions that remain, and gaps in the analytics engineering workflow that we want to close. But before we keep building, we must ensure our foundation is a stable one.
Version | When | Stuff | Confidence |
---|---|---|---|
1.8 | Spring 2024 | Stable interfaces for adapters & artifacts. Built-in support for unit testing dbt models. | 80% |
Since the v1.0 release of dbt Core (almost two years ago), we’ve released a minor version of dbt Core every three months. The January release (post-Coalesce, post-holidays) tends to be an understated affair: tech debt, bug fixes, support for Python 3-dot-new.
We’ve been measuring the rate of adoption for new versions, and we’ve seen that it takes more than 3 months for the wider user base to really adopt them. The plurality of dbt projects in the world are using a dbt Core version released between 6 and 12 months ago. We think this speaks to two things: it’s harder to upgrade than it should be, and we can afford to take more time baking new releases.
Between now and next April (2024), we plan to instead prepare one minor release that prioritizes all-around interface stability. We want to make it easier for everyone to upgrade with confidence, regardless of their adapter or other integrated tooling. There is a lot of value locked up in the features we’ve already released in 2023, and we want to lower the barrier for tens of thousands of existing projects who are still on older versions. That work is important, it takes time, and it has long-lasting implications.
With the v1.0 release, we committed to minimizing breaking changes to project code, so that end users would be able to upgrade more easily. We haven’t always gotten this right; earlier this year, for instance, we did a full relaunch of the metrics spec for the Semantic Layer. We are committed to getting better here.
Even in v1.0, though, we intentionally carved out two less-stable interfaces, which would continue to evolve in minor releases: adapter plugins and metadata artifacts. At the time, these interfaces were newer and rapidly changing. Almost every minor version upgrade, from v1.0 through v1.7, has required some fast-follow compatibility changes for adapters and for tools that parse dbt manifests.
This has been particularly difficult for adapter maintainers. As of this writing, while the majority of third-party adapters support v1.4 (released in January), just over a third support v1.5 (April), and only a handful support v1.6 (July). It isn’t fair of us to keep releasing in a way that requires this reactive compatibility work every 3 months. Instead, we will be defining a stable interface for adapters, in a separate codebase and versioned separately from dbt Core. Starting in v1.8, it will be forward-compatible with future versions. If you want to use `dbt-core` v1.X with `dbt-duckdb` v1.Y, you will be able to.
For most people, we don’t want you to have to think about versions at all: just use latest & greatest dbt Core. For customers and users of dbt Cloud, this is the experience we want to provide: delivering dbt Core and dbt Cloud together, as one integrated and continuously delivered SaaS application — an experience where you don’t need to think about versions or upgrading, and where you get access to Cloud-enhanced & Cloud-only features as a matter of course.
An aside: This was the first year in which we delivered some functionality in that way, built so that it feels like Core while actually being powered by (and exclusive to) dbt Cloud. This has long been our pattern: Core defines the spec, and Cloud the scalable implementation, especially for Enterprise-geared functionality.
I (Jeremy) wish I had communicated this delineation more clearly, and from the start. We are going to continue telling unified stories, across mature capabilities in Core and newer ones in Cloud, and we want all of you — open source community members, Cloud customers, longtime data practitioners and more-recent arrivals — to know that you are along for this journey with us.
Over the next 6-12 months, we will be spending less time on totally new constructs in dbt Core, and more time on the fundamentals that are already there: stabilizing, maintaining, iterating, improving.
dbt Cloud customers will see enhancements and under-the-hood improvements delivered continuously, as we move towards this model of increased stability. Features that fit inside dbt Core’s traditional scope will also land in a subsequent minor version of dbt Core.
This is an important part of our evolving story: a compelling commercial offering that makes it possible for us to keep developing, maintaining, and distributing dbt Core as Apache 2 software.
dbt Core is as it has always been: an open source standard. It’s a framework, a coherent set of ideas, and a fully functional standalone tool that anyone can take for a spin — adopt, extend, integrate, imitate — without ever needing to ask us for permission. Adapters will keep moving at the pace of innovation for their respective data warehouse. dbt Docs remains a great "single-player" experience for getting hooked on data documentation. (The aesthetic isn’t dated, it’s retro.) dbt Core remains the industry-defining way to author analytical models and ensure their quality in production.
But wait!
As many of you have voiced, there’s been no good way to ensure your SQL logic is correct without running expensive queries against your full production data. dbt does not have native unit testing functionality… yet. This gap in the standard is one we have been eager to work on, and we’re planning to land it in the next minor release of dbt Core.
For many years, dbt has supported "data" tests — testing your data outputs (dbt models, snapshots, seeds, etc.) based on that environment’s actual inputs (dbt sources in your warehouse), ensuring that the resulting datasets match your defined expectations.
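For instance, a couple of out-of-the-box data tests declared in a properties file (the model and column names here are illustrative):

```yaml
# models/schema.yml
version: 2

models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ["active", "churned"]
```

`dbt test` compiles each of these into a query that runs against the data actually present in your warehouse.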
Soon, we’re introducing "unit" tests — testing your modeling logic, using a small set of static inputs, to validate that your code is working as expected, faster and cheaper.
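To make the contrast concrete, here is a rough sketch of what a unit test definition could look like, based on the direction of the current proposal; the exact spec and keys may change before release, and the model and columns below are hypothetical:

```yaml
# A hypothetical unit test: fixed input rows in, expected output rows out.
# No production data is queried.
unit_tests:
  - name: test_is_valid_email_logic
    model: dim_customers
    given:
      - input: ref('stg_customers')
        rows:
          - {customer_id: 1, email: "ada@example.com"}
          - {customer_id: 2, email: "not-an-email"}
    expect:
      rows:
        - {customer_id: 1, is_valid_email: true}
        - {customer_id: 2, is_valid_email: false}
```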
Thank you to everyone who has already provided feedback and thoughts on our unit testing discussion — or, we should say our new unit testing discussion, since Michelle opened the original one back in 2020, before she joined dbt Labs :)
We truly appreciate the amount of insights and energy y’all have already poured into helping us make sure we build the right thing.
We are actively working on this feature and expect it to be ready for you all in our `1.8` release next year! If you have thoughts or opinions, please keep commenting in the discussion. We’re also planning a community feedback session for unit testing once we’ve released an initial beta of `1.8`, so keep an eye out.
We will continue to respond to your issues and review your PRs. We will continue to resolve regressions and high-priority bugs, fast as we’re able, and include those fixes in regular patch releases.
Along with fixing bugs and regressions, we’d also like to keep tackling some of the highly requested "paper cuts". Thank you to all those who have expressed their interest by upvoting and commenting.
We’re unlikely to tackle all of these things in v1.8 — they’re lower-priority than the interface stability work, which we must do — but they are all legitimate opportunities to solidify the existing, well-established Core functionality:
- Allow data tests to be documented
- Snapshot paper cuts
- Making external tables native to dbt-core
- Defining vars, folder-level configs outside `dbt_project.yml`
- Supporting additional formats for seeds (JSON)
Let us know which ones speak to you — in that list, not in that list, the ideas in your head — on GitHub, on Slack, or wherever you may find us.