Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support snapshot configs in dbt_project.yml #1613

Closed
clausherther opened this issue Jul 16, 2019 · 16 comments · Fixed by #1759
Closed

Support snapshot configs in dbt_project.yml #1613

clausherther opened this issue Jul 16, 2019 · 16 comments · Fixed by #1759
Labels
snapshots Issues related to dbt's snapshot functionality

Comments

@clausherther
Copy link
Contributor

Issue

Snapshots are transient tables by default on Snowflake

Issue description

When dbt creates the table for a snapshot model, it uses the global default of transient=true. Since the data collected via snapshot models cannot be recreated from source data, we want snapshot tables to be available for Snowflake's time travel and fail safe functionality. However, transient tables are excluded from that.

This can currently be configured per model

{% snapshot my_snapshot %}

    {{
        config(
            target_schema='snapshots',
            strategy='check',
            unique_key='id',
            check_cols=[col_1'],
            transient=false
        )
    }}
    
    select * from {{ source('my_source', 'my_table') }}
    
{% endsnapshot %}

Ideally, transient=false should be the default for snapshot models.

System information

dbt==0.14

Who will this benefit?

This will helpful for all dbt users on Snowflake

@clausherther
Copy link
Contributor Author

clausherther commented Jul 16, 2019

Existing, transient snapshots can be migrated by creating a copy and renaming the table, then dropping the old snapshot table.
E.g.

use schema snapshots;

create table my_snapshot_perm 
-- try with: copy grants
-- see: https://docs.snowflake.net/manuals/sql-reference/sql/create-table.html
as
select * from my_snapshot
;
alter table my_snapshot rename to my_snapshot_old;
alter table my_snapshot_perm rename to my_snapshot;

drop table my_snapshot_old;

Note: While in theory, you should be able to copy grants during a CTAS, this has not not been working for me, so you may also have to regrant permissions to the dbt role on this new table.

@krishbox
Copy link

Hi @clausherther, after reviewing this issue, I was oscillating between whether the default should for the transient option should be true or false. After some thinking, I would have to agree with you that the default should be false, because the default in Snowflake is false.

I would explicitly set transient to true if I know I don't want time travel or fail safe on a table, schema or database, but suppose I did want fail safe or time travel then I would not want the default behaviour of the Snowflake adapter in dbt to set transient to true.

@drewbanin
Copy link
Contributor

One approach here is to add a snapshots: config to dbt_project.yml whereby you could set something like:

snapshots:
  transient: false

This would bring snapshots in line with models, and seems like a reasonable approach to me. This would not make the default setting for transient be false, but we can document around that IMO

@clausherther
Copy link
Contributor Author

@drewbanin do you set this at the root of dbt_project.yml (since the snapshots folder is outside of models)?

E.g.

models:
    my_project:
        materialized: table

snapshots:
    transient: false

@krishbox
Copy link

Hello @drewbanin and @clausherther, that does sound reasonable, and I just reviewed the issue for introducing transient tables in the first place. It was good to get the context for the decision to make transient the default.
For reference here it is:
#946

@clausherther
Copy link
Contributor Author

I get an Invalid arguments passed to "Project" instance error at the top level and warnings that the path doesn't exist at any other level.

@clausherther
Copy link
Contributor Author

@krishbox this issue is limited to this setting for snapshots, not all tables

@beckjake
Copy link
Contributor

I think drew was saying it'd be good to add a snapshots: block - we don't support one currently.

@clausherther
Copy link
Contributor Author

Ah, makes sense, thanks!

@foundinblank
Copy link

foundinblank commented Jul 19, 2019

Pointing out here that if you have been using the older archive feature, these archive tables are not transient, and using the snapshot-migration script preserves that -- all my archive tables that I migrated to snapshots are not transient.

(caveat: I created those archive tables before Snowflake (or dbt) started supporting transient tables, though)

@johnoscott
Copy link

I actually think it would be confusing to have a new snapshots block in dbt_project.yml where this did not the default setting for transient be false,

snapshots:
  transient: false

why wouldn't it ?

@drewbanin
Copy link
Contributor

@johnoscott can you say a little bit more about that? I'm not sure I totally understand what you're saying here

@johnoscott
Copy link

Sure @drewbanin ; I commented because you said earlier (#1613 (comment)) it would NOT set the default. I don't understand why. To me, the presence of the snapshots setting transient to false would mean it is set off for all snapshots in the project (unless overridden by a specific snapshot config). Am i missing something ?

@drewbanin
Copy link
Contributor

ah! Sorry - I think I wrote that comment in the most confusing way possible:

This would bring snapshots in line with models, and seems like a reasonable approach to me. This would not make the default setting for transient be false, but we can document around that IMO

I meant that in the absence of the snapshot transience config, dbt would still create snapshots transiently (ie. the default transience config for all dbt resources would be transient: true). A user would need to explicitly set:

snapshots:
  transient: false

in order to make their Snapshot tables non-transient. Sounds like we're on the same page - sorry for the confusion!

@johnoscott
Copy link

yep, that makes sense then !

@drewbanin drewbanin added this to the Louisa May Alcott milestone Sep 5, 2019
@drewbanin drewbanin added the snapshots Issues related to dbt's snapshot functionality label Sep 11, 2019
@drewbanin drewbanin changed the title Snapshots are transient tables by default on Snowflake Support snapshot configs in dbt_project.yml Sep 11, 2019
@drewbanin
Copy link
Contributor

Feature spec:

dbt should support the specification of a snapshots: section in the dbt_project.yml file. This should implement the same semantics as models: and seeds: configs, but this config should only apply to snapshots. The immediate use case here is to set transient: false for Snapshots on Snowflake, but this config block is also useful for defining variables, snapshot strategies, check_cols, etc, for groups of snapshots at a time.

Example usage:

# dbt_project.yml

snapshots:
  transient: false
  my_project:
    source_tables:
      strategy: timestamp
      updated_at: updated_at
      target_database: raw
      target_schema: source_snapshots

NOTE: snapshot-specific configs like strategy and check_cols will not be automatically picked up in the snapshots: block. We should additionally add the following configs as "known" clobber-configs in the SourceConfig module:

  • strategy
  • unique_key
  • updated_at
  • check_cols
  • target_database
  • target_schema

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
snapshots Issues related to dbt's snapshot functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants