Recipe for write-audit-publish / blue-green deploy #4648

Open
dbeatty10 opened this issue Dec 14, 2023 · 3 comments
Labels
  • content (Improvements or additions to content)
  • developer blog (This content fits on the developer blog.)
  • priority: medium (Fix or enhancement to existing information that's creating some requests from customers)
  • size: large (This change will take more than a week to address and might require more than one person)

Comments

@dbeatty10
Contributor

dbeatty10 commented Dec 14, 2023

Contact Details

dbeatty10

  • I have read the dbt Developer Blog contribution guidelines.

Which of these best describes you?

  • I am a dbt Community member or partner contributing to the Developer Blog
  • I work for dbt Labs and am creating this issue for a community or marketing approved piece.

What is the topic of your post?

We talked about blue/green deployments in this Developer Blog post:

But we haven't (yet!) provided any hands-on recipes for blue/green deployments that utilize dbt clone; we just linked here instead.

Link to an initial outline.

Wanna run your data tests before exposing your models to downstream consumers?

Then a blue/green deploy (also known as the write-audit-publish paradigm) is what you are looking for, and dbt clone is one way to accomplish it.

A blue/green deployment using dbt clone can build your models in a "blue" environment, run the data tests against it, and then surface the output in a "green" environment.

Recipes

See below for some (rough) recipes we can play around with, all variants on a similar theme.

Basically, we can use a regular dbt build that targets a (stand-alone and immutable) staging environment, and then use dbt clone to update production references to point back to it.

I don't know which is "best" -- these are just a handful of different approaches that I've tried with toy projects. They would need to be more fully assessed and put through their paces before being applied to production workloads.

Recipe 1

This recipe assumes that the name of your target directory appears within clean-targets (a minimal sketch of that entry follows). The --full-refresh flag is crucial to overwrite any pre-existing objects in the green environment.
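
For reference, a minimal sketch of the corresponding dbt_project.yml entry (assuming the default target path) would be:

clean-targets:
  - target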

dbt clean
dbt build --target blue && \
cp target/manifest.json . && \
dbt clone --target green --state . --full-refresh

Recipe 2

Second verse, similar to the first. The only real change is storing the state in an entire folder and using a different method to clean out that state directory between executions.

rm -rf state-blue || true
dbt build --target blue && \
cp -r target state-blue && \
dbt clone --target green --state state-blue --full-refresh

Recipe 3

A largely cosmetic change that just switches the order of some of the commands. (Note: rm -rf exits successfully even when the directory is missing, so no || true is needed here; a bare || true in the middle of the chain would swallow a failed build and let the clone proceed anyway.)

dbt build --target blue && \
rm -rf state-blue && \
cp -rf target state-blue && \
dbt clone --target green --state state-blue --full-refresh

Recipe 4

This time we've included a helpful error message when any of the data tests don't pass (or the build fails for some other reason). It also experiments with alternative bash syntax for forcing a successful command, no matter what.

dbt build --target blue || { echo "Halting due to failed build"; false; } && \
{ rm -rf state-blue; true; } && \
cp -r target state-blue && \
dbt clone --target green --state state-blue --full-refresh

Project files

The models act as the initial "Write".

models/table_1.sql

{{ config(materialized="table") }}

select
    null as id,
    {{ dbt.string_literal(target.type) }} as target_type,
    {{ dbt.string_literal(this.schema) }} as this_schema,
    {{ dbt.string_literal(this.name) }} as this_name

models/view_1.sql

{{ config(materialized="view") }}

select
    {{ dbt.string_literal(target.type) }} as target_type,
    {{ dbt.string_literal(this.schema) }} as this_schema,
    {{ dbt.string_literal(this.name) }} as this_name

from {{ ref("table_1") }}

models/view_2.sql

{{ config(materialized="view", schema="my_custom_schema") }}

select
    {{ dbt.string_literal(target.type) }} as target_type,
    {{ dbt.string_literal(this.schema) }} as this_schema,
    {{ dbt.string_literal(this.name) }} as this_name

from {{ ref("view_1") }}

The data tests act as the "Audit".

models/_models.yml

models:
  - name: table_1
    columns:
      - name: id
        tests:
          - not_null

The "Publish" piece is purely through the dbt clone command (which uses the target definition in profiles.yml).

profiles.yml

postgres:
  target: blue

  outputs:
    blue:
      type: postgres
      dbname: 'postgres'
      host: 'localhost'
      port: 5432
      schema: dbt_blue_{{ modules.datetime.datetime.now().timestamp() | int }}

    green:
      type: postgres
      dbname: 'postgres'
      host: 'localhost'
      port: 5432
      schema: dbt_green
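
As an aside: on adapters without zero-copy cloning (Postgres included), dbt clone falls back to creating "pointer" views in the green schema, so the publish step is roughly equivalent to statements like this (names illustrative, with a timestamp like the one generated above):

create view dbt_green.table_1 as (
    select * from dbt_blue_1702500000.table_1
);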

Generating the relevant database object names is handled by a custom generate_schema_name macro (and supporting macros).

macros/get_custom_schema.sql

{% macro check_if_production() -%}
    {%- set is_production = target.name == 'green' -%}
    {% do return(is_production) %}
{%- endmacro %}

{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}

    {%- if custom_schema_name is none -%}
        {{ generate_schema_name_not_custom(custom_schema_name, node, default_schema) }}
    {%- else -%}
        {{ generate_schema_name_custom(custom_schema_name, node, default_schema) }}
    {%- endif -%}

{%- endmacro %}


{% macro generate_schema_name_not_custom(custom_schema_name, node, default_schema) -%}
    {{ default_schema }}
{%- endmacro %}


{% macro generate_schema_name_custom(custom_schema_name, node, default_schema) -%}
    {%- set is_production = check_if_production() -%}

    {%- if is_production -%}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}

{%- endmacro %}
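
To make the macro behavior concrete, here is how the schemas resolve for each target with the profiles above (timestamp illustrative):

-- target: blue (default schema dbt_blue_1702500000)
table_1 -> dbt_blue_1702500000.table_1
view_1  -> dbt_blue_1702500000.view_1
view_2  -> dbt_blue_1702500000_my_custom_schema.view_2

-- target: green (default schema dbt_green)
table_1 -> dbt_green.table_1
view_1  -> dbt_green.view_1
view_2  -> my_custom_schema.view_2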

Outstanding Problems

  • After each deploy to green, we've got (at least) 2x the number of database objects versus the blue environment alone.
  • And every time we do a blue/green deployment, we're left with an old, unused schema that needs to be cleaned up, since each deploy generates brand-new objects rather than re-using the previous ones in place (a sketch of a cleanup helper follows this list).
  • I didn't test out incremental types of resources like snapshots, incremental models, or materialized views.
    • I'm guessing these could have trouble if we keep the non-repeatable, timestamp-based blue schema in profiles.yml, but they should work just fine with a repeatable, constant schema name like dbt_blue.
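
For the schema-cleanup problem, here's a minimal sketch of a helper macro (drop_old_blue_schemas is a hypothetical name; Postgres syntax; assumes the timestamped naming scheme from profiles.yml above). Run it after publishing, once green no longer points at the old blue schemas:

macros/drop_old_blue_schemas.sql

{% macro drop_old_blue_schemas() %}
    {# Hypothetical helper: drop stale timestamped blue schemas, sparing the current target schema #}
    {% set query %}
        select schema_name
        from information_schema.schemata
        where schema_name like 'dbt_blue_%'
          and schema_name != '{{ target.schema }}'
    {% endset %}
    {% if execute %}
        {% for row in run_query(query) %}
            {% do run_query('drop schema if exists ' ~ row[0] ~ ' cascade') %}
        {% endfor %}
    {% endif %}
{% endmacro %}

Invoked with something like: dbt run-operation drop_old_blue_schemas --target blue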

History

The general pattern has probably been practiced in various forms for longer than we'll ever know. But it's been popularized in the collective consciousness under the monikers "blue/green" and "W.A.P.".

Blue/green deployments

Write-audit-publish

dbt

Gilding the lily

Someone with the last name Blue who was an adherent of WAP at Netflix:

@dbeatty10 added the content and developer blog labels on Dec 14, 2023
@bbrewington
Contributor

This is awesome! Look forward to referencing the post once it's written, audited, and published

Would be cool to include how to do this with both dbt Core & dbt Cloud. That might involve explaining all the different places target.name and env_var('DBT_TARGET') can be defined in Cloud; if it makes sense to split this out as a separate post or docs page, I'm glad to write it up!

Locations resolving to env_var('DBT_TARGET')

  • Environment Variables
  • Job > Advanced Settings > Environment variables > Override

Locations resolving to target.name

  • Profile > Credentials (for project) > Target name
  • profiles.yml in repo
  • Job > Target name
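
On the Core side of that cross-walk, a minimal sketch of wiring the target through an environment variable in profiles.yml (DBT_TARGET is an assumed variable name here, with blue as the fallback default):

postgres:
  target: "{{ env_var('DBT_TARGET', 'blue') }}"
  outputs:
    ...

With that in place, something like DBT_TARGET=green dbt clone ... selects the green output without editing any files.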

@dbeatty10
Contributor Author

> This is awesome! Look forward to referencing the post once it's written, audited, and published

See what you did there ❤️

Your cross-walk for places to define target.name and env_var('DBT_TARGET') is golden 🤩

@graciegoheen added the size: large and priority: medium labels on Jan 2, 2024
@dbeatty10 changed the title from "Recipe for write-audit-publish / green/blue deploy" to "Recipe for write-audit-publish / blue-green deploy" on Feb 2, 2024
@segoldma

segoldma commented May 3, 2024

Curious if there is a preferred way to do this in dbt Cloud:

  • without overriding ref.sql
  • without chaining two separate jobs together (given the need to build to one target and clone to another)
