Handle case when an incremental table is empty #5326

Merged: 4 commits, Apr 24, 2024

dbeatty10 (Contributor) commented Apr 23, 2024

What are you changing in this pull request and why?

resolves #5321

To make sure the updated code works for a wide range of users, I tested the following example against these data platforms:

  • bigquery
  • databricks
  • duckdb
  • postgres
  • redshift
  • snowflake

[image]

☝️ Notice the table is empty, like the edge case scenario described in dbt-labs/dbt-core#9997

[image]

☝️ Notice it successfully added new data when it arrived.

Reprex

Create this file:

models/my_incremental.sql

{{ config(materialized="incremental") }}

with

non_empty_cte as (

    select 1 as id, cast('2024-01-01' as date) as event_time

),

empty_cte as (

    select 0 as id, cast('1999-12-31' as date) as event_time
    from non_empty_cte
    where 0=1

)

select *

{% if var("scenario", "empty") == "empty" %}

  from empty_cte

{% else %}

  from non_empty_cte

{% endif %}

{% if is_incremental() %}

  -- this filter will only be applied on an incremental run
  -- (uses >= to include records whose timestamp occurred since the last run of this model)
  where event_time >= (select coalesce(max(event_time), cast('1900-01-01' as date)) from {{ this }})

{% endif %}
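
Why the `coalesce` matters: on an empty table, `max(event_time)` returns NULL, and a comparison against NULL is never true, so without a fallback the filter discards every incoming row. The sketch below reproduces that behavior under stated assumptions: it uses SQLite through Python's built-in `sqlite3` module (not one of the adapters tested above), stores dates as ISO-8601 text because SQLite has no date type, and uses a hypothetical table named `target` in place of `{{ this }}`.

```python
import sqlite3


def incremental_rows(conn, use_coalesce):
    """Return the source rows that pass the incremental filter against `target`."""
    if use_coalesce:
        # Fallback date used when `target` has no rows yet.
        threshold = "select coalesce(max(event_time), '1900-01-01') from target"
    else:
        # No fallback: yields NULL when `target` is empty.
        threshold = "select max(event_time) from target"
    query = f"""
        with source as (select 1 as id, '2024-01-01' as event_time)
        select id, event_time
        from source
        where event_time >= ({threshold})
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
# `target` stands in for the already-built but empty incremental table.
conn.execute("create table target (id integer, event_time text)")

without_fallback = incremental_rows(conn, use_coalesce=False)
with_fallback = incremental_rows(conn, use_coalesce=True)

print(without_fallback)  # no rows: the comparison against NULL filters everything out
print(with_fallback)     # the new row passes the filter
```

The fallback date only needs to be earlier than any real `event_time`; `1900-01-01` is the value used in the model above.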

Assuming a profiles.yml with all the relevant profile names, run these commands:

dbt run  --profile duckdb -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile duckdb -s my_incremental --vars '{scenario: empty}'
dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile duckdb -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile postgres -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile postgres -s my_incremental --vars '{scenario: empty}'
dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile postgres -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile redshift -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile redshift -s my_incremental --vars '{scenario: empty}'
dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile redshift -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile databricks -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile databricks -s my_incremental --vars '{scenario: empty}'
dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile databricks -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile snowflake -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile snowflake -s my_incremental --vars '{scenario: empty}'
dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile snowflake -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile bigquery -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile bigquery -s my_incremental --vars '{scenario: empty}'
dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile bigquery -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}"
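
The list above repeats the same three run/show pairs for each profile. As a rough sketch, the same matrix could be generated rather than typed out. The helper name `build_commands` is hypothetical, the profile names are assumed to match those in `profiles.yml`, and the script only prints the commands; it does not execute dbt.

```python
PROFILES = ["duckdb", "postgres", "redshift", "databricks", "snowflake", "bigquery"]

# (scenario, full_refresh) for each of the three runs per profile.
STEPS = [("empty", True), ("empty", False), ("non_empty", False)]


def build_commands(profiles=PROFILES):
    """Build the run/show command pairs for every profile, in order."""
    commands = []
    for profile in profiles:
        for scenario, full_refresh in STEPS:
            run = (
                f"dbt run --profile {profile} -s my_incremental "
                f"--vars '{{scenario: {scenario}}}'"
            )
            if full_refresh:
                run += " --full-refresh"
            show = (
                "dbt show --profile " + profile
                + " --inline \"select * from {{ ref('my_incremental') }}\""
            )
            commands += [run, show]
    return commands


commands = build_commands()
print("\n".join(commands))
```

The printed lines can be pasted into a shell, or piped to one, once the profiles exist.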

Checklist

github-actions bot added the content and size: x-small labels Apr 23, 2024
dbeatty10 marked this pull request as ready for review April 23, 2024 13:14
dbeatty10 requested a review from a team as a code owner April 23, 2024 13:14
dbeatty10 merged commit e5d71be into current Apr 24, 2024 (11 checks passed)
dbeatty10 deleted the dbeatty10-patch-4 branch April 24, 2024 18:54
mirnawong1 pushed a commit that referenced this pull request Jul 18, 2024
[Preview](https://docs-getdbt-com-git-dbeatty10-patch-1-dbt-labs.vercel.app/docs/build/incremental-models#filtering-rows-on-an-incremental-run)

## What are you changing in this pull request and why?

The code change in #5326
was tested against a variety of dbt adapters. The update in
#5306 modified that code
example, and it appears to have database-specific cast syntax (`::`) as
well as a data type that means different things in different databases
(i.e., the `TIMESTAMP` data type has different semantics in BigQuery vs.
Snowflake).

So we should restore the code example in
#5326. If there is a
better example that has been tested across dbt adapters, then we can
consider making another update in the future.

## Checklist
- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
Linked issue: [Core] More robust example for incremental runs