Recipe for write-audit-publish / blue-green deploy #4648

Open
dbeatty10 opened this issue Dec 14, 2023 · 3 comments
Labels
  • content (Improvements or additions to content)
  • developer blog (This content fits on the developer blog.)
  • priority: medium (Fix or enhancement to existing information that's creating some requests from customers)
  • size: large (This change will take more than a week to address and might require more than one person)

Comments

@dbeatty10
Contributor

dbeatty10 commented Dec 14, 2023

Contact Details

dbeatty10

  • I have read the dbt Developer Blog contribution guidelines.

Which of these best describes you?

  • I am a dbt Community member or partner contributing to the Developer Blog
  • I work for dbt Labs and am creating this issue for a community or marketing approved piece.

What is the topic of your post?

We talked about blue/green deployments in this Developer Blog post:

But we haven't (yet!) provided any hands-on recipes for blue/green deployments that utilize dbt clone; we just linked here instead.

Link to an initial outline.

Wanna run your data tests before exposing your models to downstream consumers?

Then a blue/green deploy (also known as the write-audit-publish paradigm) is what you are looking for, and dbt clone is one way to accomplish it.

A blue/green deployment using dbt clone can build your models in a "blue" environment, run the data tests against it, and then surface the output in a "green" environment.

Recipes

See below for some (rough) recipes we can play around with, all variants on a similar theme.

Basically, we can use a regular dbt build that targets a (stand-alone and immutable) staging environment, and then use dbt clone to update production references to point back to it.

I don't know which is "best" -- these are just a handful of different approaches that I've tried with toy projects. They would need to be more fully assessed and put through their paces before being applied to production workloads.

Recipe 1

This recipe assumes that the name of your target directory appears within clean-targets (a minimal sketch of that entry follows). The --full-refresh flag is crucial to overwrite any pre-existing objects in the green environment.
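
For reference, a minimal sketch of the corresponding dbt_project.yml entry (assuming the default target path) would be:

clean-targets:
  - target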

dbt clean
dbt build --target blue && \
cp target/manifest.json . && \
dbt clone --target green --state . --full-refresh

Recipe 2

Second verse, similar to the first. The only real change is storing the state in an entire folder and using a different method to clean out that state directory between executions.

rm -rf state-blue || true
dbt build --target blue && \
cp -r target state-blue && \
dbt clone --target green --state state-blue --full-refresh

Recipe 3

A largely cosmetic change that just switches the order of some of the commands. (Note: rm -rf exits successfully even when the directory is missing, so no || true is needed here; a bare || true in the middle of the chain would swallow a failed build and let the clone proceed anyway.)

dbt build --target blue && \
rm -rf state-blue && \
cp -rf target state-blue && \
dbt clone --target green --state state-blue --full-refresh

Recipe 4

This time we've included a helpful error message when any of the data tests don't pass (or the build fails for some other reason). It also experiments with alternative bash syntax for forcing a successful command, no matter what.

dbt build --target blue || { echo "Halting due to failed build"; false; } && \
{ rm -rf state-blue; true; } && \
cp -r target state-blue && \
dbt clone --target green --state state-blue --full-refresh

Project files

The models act as the initial "Write".

models/table_1.sql

{{ config(materialized="table") }}

select
    null as id,
    {{ dbt.string_literal(target.type) }} as target_type,
    {{ dbt.string_literal(this.schema) }} as this_schema,
    {{ dbt.string_literal(this.name) }} as this_name

models/view_1.sql

{{ config(materialized="view") }}

select
    {{ dbt.string_literal(target.type) }} as target_type,
    {{ dbt.string_literal(this.schema) }} as this_schema,
    {{ dbt.string_literal(this.name) }} as this_name

from {{ ref("table_1") }}

models/view_2.sql

{{ config(materialized="view", schema="my_custom_schema") }}

select
    {{ dbt.string_literal(target.type) }} as target_type,
    {{ dbt.string_literal(this.schema) }} as this_schema,
    {{ dbt.string_literal(this.name) }} as this_name

from {{ ref("view_1") }}

The data tests act as the "Audit".

models/_models.yml

models:
  - name: table_1
    columns:
      - name: id
        tests:
          - not_null

The "Publish" piece is purely through the dbt clone command (which uses the target definition in profiles.yml).

profiles.yml

postgres:
  target: blue

  outputs:
    blue:
      type: postgres
      dbname: 'postgres'
      host: 'localhost'
      port: 5432
      schema: dbt_blue_{{ modules.datetime.datetime.now().timestamp() | int }}

    green:
      type: postgres
      dbname: 'postgres'
      host: 'localhost'
      port: 5432
      schema: dbt_green
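
As an aside: on adapters without zero-copy cloning (Postgres included), dbt clone falls back to creating "pointer" views in the green schema, so the publish step is roughly equivalent to statements like this (names illustrative, with a timestamp like the one generated above):

create view dbt_green.table_1 as (
    select * from dbt_blue_1702500000.table_1
);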

Generating the relevant database object names is handled by a custom generate_schema_name macro (and supporting macros).

macros/get_custom_schema.sql

{% macro check_if_production() -%}
    {%- set is_production = target.name == 'green' -%}
    {% do return(is_production) %}
{%- endmacro %}

{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}

    {%- if custom_schema_name is none -%}
        {{ generate_schema_name_not_custom(custom_schema_name, node, default_schema) }}
    {%- else -%}
        {{ generate_schema_name_custom(custom_schema_name, node, default_schema) }}
    {%- endif -%}

{%- endmacro %}


{% macro generate_schema_name_not_custom(custom_schema_name, node, default_schema) -%}
    {{ default_schema }}
{%- endmacro %}


{% macro generate_schema_name_custom(custom_schema_name, node, default_schema) -%}
    {%- set is_production = check_if_production() -%}

    {%- if is_production -%}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}

{%- endmacro %}
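
To make the macro behavior concrete, here is how the schemas resolve for each target with the profiles above (timestamp illustrative):

-- target: blue (default schema dbt_blue_1702500000)
table_1 -> dbt_blue_1702500000.table_1
view_1  -> dbt_blue_1702500000.view_1
view_2  -> dbt_blue_1702500000_my_custom_schema.view_2

-- target: green (default schema dbt_green)
table_1 -> dbt_green.table_1
view_1  -> dbt_green.view_1
view_2  -> my_custom_schema.view_2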

Outstanding Problems

  • After each deploy to green, we've got (at least) 2x the number of database objects versus the blue environment alone.
  • And every time we do a blue/green deployment, we're left with an old, unused schema that needs to be cleaned up, since each deploy generates brand-new objects rather than re-using the previous ones in place (a sketch of a cleanup helper follows this list).
  • I didn't test out incremental types of resources like snapshots, incremental models, or materialized views.
    • I'm guessing these could have trouble if we keep the non-repeatable, timestamp-based blue schema in profiles.yml, but they should work just fine with a repeatable, constant schema name like dbt_blue.
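
For the schema-cleanup problem, here's a minimal sketch of a helper macro (drop_old_blue_schemas is a hypothetical name; Postgres syntax; assumes the timestamped naming scheme from profiles.yml above). Run it after publishing, once green no longer points at the old blue schemas:

macros/drop_old_blue_schemas.sql

{% macro drop_old_blue_schemas() %}
    {# Hypothetical helper: drop stale timestamped blue schemas, sparing the current target schema #}
    {% set query %}
        select schema_name
        from information_schema.schemata
        where schema_name like 'dbt_blue_%'
          and schema_name != '{{ target.schema }}'
    {% endset %}
    {% if execute %}
        {% for row in run_query(query) %}
            {% do run_query('drop schema if exists ' ~ row[0] ~ ' cascade') %}
        {% endfor %}
    {% endif %}
{% endmacro %}

Invoked with something like: dbt run-operation drop_old_blue_schemas --target blue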

History

The general pattern has probably been practiced in various forms for longer than we'll ever know. But it's been popularized in the collective consciousness under the monikers "blue/green" and "W.A.P.".

Blue/green deployments

Write-audit-publish

dbt

Gilding the lily

Someone with the last name Blue who was an adherent of WAP at Netflix:

@dbeatty10 added the content and developer blog labels on Dec 14, 2023
@bbrewington
Contributor

This is awesome! Look forward to referencing the post once it's written, audited, and published

Would be cool to include how to do this with both dbt Core & dbt Cloud. That might involve explaining all the different places target.name and env_var('DBT_TARGET') can be defined in Cloud; if it makes sense to split this out as a separate post or docs page, I'm glad to write it up!

Locations resolving to env_var('DBT_TARGET')

  • Environment Variables
  • Job > Advanced Settings > Environment variables > Override

Locations resolving to target.name

  • Profile > Credentials (for project) > Target name
  • profiles.yml in repo
  • Job > Target name
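
On the Core side of that cross-walk, a minimal sketch of wiring the target through an environment variable in profiles.yml (DBT_TARGET is an assumed variable name here, with blue as the fallback default):

postgres:
  target: "{{ env_var('DBT_TARGET', 'blue') }}"
  outputs:
    ...

With that in place, something like DBT_TARGET=green dbt clone ... selects the green output without editing any files.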

@dbeatty10
Contributor Author

> This is awesome! Look forward to referencing the post once it's written, audited, and published

See what you did there ❤️

Your cross-walk for places to define target.name and env_var('DBT_TARGET') is golden 🤩

@graciegoheen added the size: large and priority: medium labels on Jan 2, 2024
@dbeatty10 changed the title from "Recipe for write-audit-publish / green/blue deploy" to "Recipe for write-audit-publish / blue-green deploy" on Feb 2, 2024
@segoldma

segoldma commented May 3, 2024

Curious if there is a preferred way to do this in dbt Cloud:

  • without overriding ref.sql
  • without chaining two separate jobs together (given the need to build to one target and clone to another)
