support snapshot option 'invalidate_hard_deletes' #1

Merged (2 commits) on Jun 3, 2022

willi-mueller
Owner

This PR:

  1. adds the invalidate_hard_deletes option, which marks records that are missing from the source as deleted in the snapshot
  2. restructures snapshot_merge.sql so that it follows the same structure as the corresponding file in dbt-core
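
To illustrate what "marking missing records as deleted" means, here is a minimal sketch in Python with an in-memory SQLite database. It is not the SQL from this PR or from dbt-core. The table names source and snapshot, the columns valid_from and valid_to, and the helper invalidate_hard_deletes are assumptions made for the example. The idea is an anti-join: every currently valid snapshot row whose id no longer exists in the source gets its valid_to set.

```python
import sqlite3


def invalidate_hard_deletes(conn, now):
    """Close current snapshot rows whose id is missing from the source.

    Hypothetical helper for illustration only; returns the number of rows closed.
    """
    cur = conn.execute(
        """
        UPDATE snapshot
           SET valid_to = ?
         WHERE valid_to IS NULL
           AND NOT EXISTS (SELECT 1 FROM source WHERE source.id = snapshot.id)
        """,
        (now,),
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, val TEXT)")
conn.execute(
    "CREATE TABLE snapshot (id INTEGER, val TEXT, valid_from TEXT, valid_to TEXT)"
)

# The snapshot knows ids 1 and 2, but id 1 has been hard-deleted from the source.
conn.execute("INSERT INTO source VALUES (2, 'v1')")
conn.executemany(
    "INSERT INTO snapshot VALUES (?, ?, ?, NULL)",
    [(1, "v1", "t0"), (2, "v1", "t1")],
)

closed = invalidate_hard_deletes(conn, "t2")
rows = conn.execute("SELECT id, valid_to FROM snapshot ORDER BY id").fetchall()
print(closed, rows)
```

Only the row for id 1 is closed; the row for id 2 stays open because it still exists in the source.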

Tests

I tested snapshots manually both with and without the new configuration option invalidate_hard_deletes enabled.

{{
    config(
      unique_key='id',
      target_schema='test_schema',
      strategy='check',
      invalidate_hard_deletes=True,
      check_cols=[ 'val' ]
    )
}}

Step 0: One row (id = 1) is present in the source
[Screenshot: 2022-05-31, 3:09:38 PM]

Step 1: A second row (id = 2) is present in the source
[Screenshot: 2022-05-31, 3:11:03 PM]

Step 2: The first row (id = 1) is deleted in the source. The dbt snapshot run therefore sets valid_to on its snapshot record
[Screenshot: 2022-05-31, 5:38:56 PM]

Step 3: The first row (id = 1) reappears in the source. It is therefore added to the snapshot again as a new valid row
[Screenshot: 2022-05-31, 5:39:56 PM]

Step 4: The second row (id = 2) changes its value to val = 'v2'. The old record is therefore marked as no longer valid (its valid_to is set) and a new row with the current value is appended to the snapshot.
[Screenshot: 2022-05-31, 5:40:54 PM]

These steps show that the PR covers the following cases:

  1. The existing functionality of tracking changes of rows over time has not been changed
  2. Hard deletes in the source are tracked as well if the option is enabled. This means that if a row does not appear in the source data, it will be marked as deleted in the snapshot
  3. Once deleted, rows can re-appear in the source and their re-appearance will be tracked in the snapshot as well.
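
The five manual steps and the three cases above can be replayed with a small pure-Python simulation. This is a sketch of the expected snapshot semantics, not the adapter's implementation. The function run_snapshot, the integer timestamps, the column names valid_from and valid_to, and the initial value 'v1' are all assumptions made for illustration (the PR only states that id = 2 later changes to 'v2').

```python
def run_snapshot(snapshot, source, now, check_cols=("val",),
                 invalidate_hard_deletes=True):
    """Simulate one snapshot run with the 'check' strategy (hypothetical helper).

    snapshot: list of dicts with keys id, val, valid_from, valid_to (mutated in place)
    source:   list of dicts with keys id, val
    """
    # Currently valid snapshot rows, looked up by unique key.
    current = {r["id"]: r for r in snapshot if r["valid_to"] is None}
    source_by_id = {r["id"]: r for r in source}

    for key, src in source_by_id.items():
        cur = current.get(key)
        if cur is not None and all(cur[c] == src[c] for c in check_cols):
            continue  # unchanged row: nothing to do
        if cur is not None:
            cur["valid_to"] = now  # changed row: close the old version
        # New, changed, or reappearing row: append a fresh valid record.
        snapshot.append({**src, "valid_from": now, "valid_to": None})

    if invalidate_hard_deletes:
        # Hard deletes: valid rows whose key is no longer in the source.
        for key, cur in current.items():
            if key not in source_by_id:
                cur["valid_to"] = now
    return snapshot


snap = []
run_snapshot(snap, [{"id": 1, "val": "v1"}], now=0)                           # Step 0
run_snapshot(snap, [{"id": 1, "val": "v1"}, {"id": 2, "val": "v1"}], now=1)   # Step 1
run_snapshot(snap, [{"id": 2, "val": "v1"}], now=2)                           # Step 2
run_snapshot(snap, [{"id": 1, "val": "v1"}, {"id": 2, "val": "v1"}], now=3)   # Step 3
run_snapshot(snap, [{"id": 1, "val": "v1"}, {"id": 2, "val": "v2"}], now=4)   # Step 4

history = [(r["id"], r["val"], r["valid_from"], r["valid_to"]) for r in snap]
for row in history:
    print(row)
```

With invalidate_hard_deletes=False, step 2 would leave the record for id = 1 open, which corresponds to the behavior before this option existed.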

Background

This PR aims to follow the code from dbt-core as seen here and here

@willi-mueller willi-mueller merged commit 3bf0dbc into master Jun 3, 2022