From 941d01d95f87f61b97749d860aec1118a4893bf3 Mon Sep 17 00:00:00 2001
From: "Leona B. Campbell" <3880403+runleonarun@users.noreply.github.com>
Date: Mon, 5 Dec 2022 16:02:11 -0800
Subject: [PATCH 01/54] Update bad link

---
 website/docs/docs/collaborate/git/version-control-basics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/collaborate/git/version-control-basics.md b/website/docs/docs/collaborate/git/version-control-basics.md
index dc304d99ca6..b2aa6a1875a 100644
--- a/website/docs/docs/collaborate/git/version-control-basics.md
+++ b/website/docs/docs/collaborate/git/version-control-basics.md
@@ -53,7 +53,7 @@ You can perform git tasks with the git button in the Cloud IDE. The following ar
 Merge conflicts often occur when multiple users are concurrently making edits to the same section in the same file. This makes it difficult for Git to determine which change should be kept.
-Refer to [resolve merge conflicts](/docs/collaborate/git/resolve-merge-conflicts) to learn how to resolve merge conflicts.
+Refer to [resolve merge conflicts](/docs/collaborate/git/merge-conflicts) to learn how to resolve merge conflicts.
 ## The .gitignore file

From 071717de6c77bf70ca6512a2d504f17018827630 Mon Sep 17 00:00:00 2001
From: mirnawong1
Date: Wed, 27 Nov 2024 16:04:01 +0000
Subject: [PATCH 02/54] add hard deletes

---
 website/docs/docs/build/snapshots.md | 99 ++++++++++-----
 .../core-upgrade/06-upgrading-to-v1.9.md | 1 +
 .../docs/docs/dbt-versions/release-notes.md | 4 +
 .../resource-configs/hard-deletes.md | 113 ++++++++++++++++++
 .../invalidate_hard_deletes.md | 4 +
 .../snapshot_meta_column_names.md | 15 ++-
 website/docs/reference/snapshot-configs.md | 3 +-
 website/sidebars.js | 9 +-
 website/snippets/_hard-deletes.md | 13 ++
 9 files changed, 224 insertions(+), 37 deletions(-)
 create mode 100644 website/docs/reference/resource-configs/hard-deletes.md
 create mode 100644 website/snippets/_hard-deletes.md

diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md
index 3b21549a3c7..286f7450b4c 100644
--- a/website/docs/docs/build/snapshots.md
+++ b/website/docs/docs/build/snapshots.md
@@ -10,8 +10,7 @@ id: "snapshots"
 * [Snapshot properties](/reference/snapshot-properties)
 * [`snapshot` command](/reference/commands/snapshot)
-
-### What are snapshots?
+## What are snapshots?
 Analysts often need to "look back in time" at previous data states in their mutable tables. While some source data systems are built in a way that makes accessing historical data possible, this is not always the case. dbt provides a mechanism, **snapshots**, which records changes to a mutable over time.
 Snapshots implement [type-2 Slowly Changing Dimensions](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row) over mutable source tables. These Slowly Changing Dimensions (or SCDs) identify how a row in a table changes over time. Imagine you have an `orders` table where the `status` field can be overwritten as the order is processed.
@@ -66,6 +65,7 @@ snapshots:
  [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes): true | false
  [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): dictionary
  [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current): string
+ [hard_deletes](/reference/resource-configs/hard-deletes): string
 ```
@@ -84,6 +84,7 @@ The following table outlines the configurations available for snapshots:
 | [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True |
 | [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current) | Set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.| No | string |
 | [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary |
+| [hard_deletes](/reference/resource-configs/hard-deletes) | Track hard deletes by adding a new record when row become "deleted" in source | No | string |
 - In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build.
Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types.
@@ -215,10 +216,14 @@ When you run the [`dbt snapshot` command](/reference/commands/snapshot):
 - The `dbt_valid_to` column will be updated for any existing records that have changed.
 - The updated record and any new records will be inserted into the snapshot table. These records will now have `dbt_valid_to = null` or the value configured in `dbt_valid_to_current` (available in Versionless and 1.9 and higher).
+
+
 #### Note
 - These column names can be customized to your team or organizational conventions using the [snapshot_meta_column_names](#snapshot-meta-fields) config.
 - Use the `dbt_valid_to_current` config to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date such as `9999-12-31`). By default, this value is `NULL`. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.
-
+- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to track hard deletes by adding a new record when row become "deleted" in source. Supported options are `ignore`, `invalidate`, and `new_record`.
+
+
 Snapshots can be referenced in downstream models the same way as referencing models — by using the [ref](/reference/dbt-jinja-functions/ref) function.
 ## Detecting row changes
@@ -294,7 +299,7 @@ The `check` snapshot strategy can be configured to track changes to _all_ column
 :::
-**Example Usage**
+**Example usage**
@@ -344,15 +349,64 @@ snapshots:
 ### Hard deletes (opt-in)
+
+
+In dbt v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source.
The `hard_deletes` config is not a separate strategy but an additional opt-in feature that can be used with any snapshot strategy.
+
+The `hard_deletes` config has three options/fields:
+| Field | Description |
+| --------- | ----------- |
+| `ignore` (default) | No action for deleted records. |
+| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to`. |
+| `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` [meta field](#snapshot-meta-fields) when records are deleted.|
+
+import HardDeletes from '/snippets/_hard-deletes.md';
+
+
+
+#### Example usage
+
+
+
+```yaml
+snapshots:
+  - name: orders_snapshot_hard_delete
+    relation: source('jaffle_shop', 'orders')
+    config:
+      schema: snapshots
+      unique_key: id
+      strategy: timestamp
+      updated_at: updated_at
+      hard_deletes: new_record # options are: 'ignore', 'invalidate', or 'new_record'
+```
+
+
+
+In this example, the `hard_deletes: new_record` config will add a new row for deleted records woth the `dbt_is_deleted` column set to `True`.
+Any restored records are added as new rows with the `dbt_is_deleted` field set to `False`.
+
+The resulting table will look like this:
+
+| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_is_deleted |
+| -- | ------ | ---------- | -------------- | ------------ | -------------- |
+| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | False |
+| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | False |
+| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | True |
+| 1 | restored | 2024-01-01 12:00 | 2024-01-01 12:00 | | False |
+
+
+
+
+
 Rows that are deleted from the source query are not invalidated by default. With the config option `invalidate_hard_deletes`, dbt can track rows that no longer exist.
This is done by left joining the snapshot table with the source table, and filtering the rows that are still valid at that point, but no longer can be found in the source table. `dbt_valid_to` will be set to the current snapshot time.
 This configuration is not a different strategy as described above, but is an additional opt-in feature. It is not enabled by default since it alters the previous behavior.
 For this configuration to work with the `timestamp` strategy, the configured `updated_at` column must be of timestamp type. Otherwise, queries will fail due to mixing data types.
-**Example Usage**
+Note, in v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config for better control over how to handle deleted rows from the source.
-
+#### Example usage
@@ -378,26 +432,6 @@ For this configuration to work with the `timestamp` strategy, the configured `up
-
-
-
-
-```yaml
-snapshots:
-  - name: orders_snapshot_hard_delete
-    relation: source('jaffle_shop', 'orders')
-    config:
-      schema: snapshots
-      unique_key: id
-      strategy: timestamp
-      updated_at: updated_at
-      invalidate_hard_deletes: true
-```
-
-
-
-
-
 ## Snapshot meta-fields
 Snapshot tables will be created as a clone of your source dataset, plus some additional meta-fields*.
@@ -405,13 +439,15 @@ Snapshot tables will be created as a clone of your sourc
 Starting in 1.9 or with [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless):
 - These column names can be customized to your team or organizational conventions using the [`snapshot_meta_column_names`](/reference/resource-configs/snapshot_meta_column_names) config.
 - Use the [`dbt_valid_to_current` config](/reference/resource-configs/dbt_valid_to_current) to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date such as `9999-12-31`). By default, this value is `NULL`.
When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.
+- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to track deleted records as new rows with the `dbt_is_deleted` meta field when using the `hard_deletes='new_record'` field.
 | Field | Meaning | Usage |
-| -------------- | ------- | ----- |
+| -------------- | ------- | ----- |
 | dbt_valid_from | The timestamp when this snapshot row was first inserted | This column can be used to order the different "versions" of a record. |
 | dbt_valid_to | The timestamp when this row became invalidated.
For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`. | The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value. |
 | dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt |
 | dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt |
+| dbt_is_deleted | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. This is used internally by dbt |
 *The timestamps used for each column are subtly different depending on the strategy you use:
@@ -445,6 +481,15 @@ Snapshot results (note that `11:30` is not used anywhere):
 | 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 |
 | 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | | 2024-01-01 11:05 |
+Snapshot results with `hard_deletes='new_record'`:
+
+| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted |
+|----|---------|------------------|------------------|------------------|------------------|----------------|
+| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | False |
+| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | 2024-01-01 11:05 | False |
+| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True |
+
+
diff --git a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md
index 8b809877870..c1a2c88eda7 100644
--- a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md
+++ b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md
@@ -65,6 +65,7 @@ Beginning in dbt Core 1.9, we've streamlined snapshot configuration and added a
 - Standard `schema` and `database` configs supported: Snapshots will now be consistent with other dbt resource types. You can specify where environment-aware snapshots should be stored.
 - Warning for incorrect `updated_at` data type: To ensure data integrity, you'll see a warning if the `updated_at` field specified in the snapshot configuration is not the proper data type or timestamp.
 - Set a custom current indicator for the value of `dbt_valid_to`: Use the [`dbt_valid_to_current` config](/reference/resource-configs/dbt_valid_to_current) to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.
+- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) configuration to track hard deletes by adding a new record when row become "deleted" in source. This config replaces the `invalidate_hard_deletes` to give you more control on how to handle deleted rows from the source. Supported fields are `ignore`, `invalidate`, and `new_record`.
 Read more about [Snapshots meta fields](/docs/build/snapshots#snapshot-meta-fields).
diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md
index 55116db68ba..91e6ec0e8cb 100644
--- a/website/docs/docs/dbt-versions/release-notes.md
+++ b/website/docs/docs/dbt-versions/release-notes.md
@@ -18,6 +18,10 @@ Release notes are grouped by month for both multi-tenant and virtual private clo
 \* The official release date for this new format of release notes is May 15th, 2024. Historical release notes for prior dates may not reflect all available features released earlier this year or their tenancy availability.
+## December 2024
+
+- **New**: The [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source. Supported options are `ignore`, `invalidate`, and `new_record`.
+
 ## November 2024
 - **Fix**: Job environment variable overrides in credentials are now respected for Exports. Previously, they were ignored.
 - **Behavior change**: If you use a custom microbatch macro, set a [`require_batched_execution_for_custom_microbatch_strategy` behavior flag](/reference/global-configs/behavior-changes#custom-microbatch-strategy) in your `dbt_project.yml` to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag as dbt will handle microbatching automatically for any model using the [microbatch strategy](/docs/build/incremental-microbatch#how-microbatch-compares-to-other-incremental-strategies).
diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md
new file mode 100644
index 00000000000..812be4598c1
--- /dev/null
+++ b/website/docs/reference/resource-configs/hard-deletes.md
@@ -0,0 +1,113 @@
+---
+title: hard_deletes
+resource_types: [snapshots]
+description: "Use the `hard_deletes` config to control how deleted rows are tracked in your snapshot table."
+datatype: "{}"
+default_value: {ignore}
+id: "hard-deletes"
+sidebar_label: "hard_deletes"
+---
+
+Available from dbt v1.9 or with [Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) dbt Cloud.
+
+
+
+
+```yaml
+snapshots:
+  - name:
+    config:
+      hard_deletes: 'ignore', 'invalidate', or 'new_record'
+```
+
+
+
+
+```yml
+snapshots:
+  [](/reference/resource-configs/resource-path):
+    +hard_deletes: "ignore", "invalidate", or "new_record"
+```
+
+
+
+
+
+```sql
+{{
+  config(
+    unique_key='id',
+    strategy='timestamp',
+    updated_at='updated_at',
+    hard_deletes='ignore', 'invalidate', 'new_record'
+  )
+}}
+```
+
+
+
+
+## Description
+
+Use the `hard_deletes` configuration to track hard deletes by adding a new record when row become "deleted" in source.
+Replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source.
+
+import HardDeletes from '/snippets/_hard-deletes.md';
+
+
+
+:::warning
+
+If you're updating an existing snapshot to use the `hard_deletes` config, dbt _will not_ handle migrations automatically. We recommend either only using these settings for net-new snapshots, or arranging an update of pre-existing tables before enabling this setting.
+:::
+
+## Default
+
+By default, if you don’t specify `hard_deletes`, it'll automatically default to `ignore`. Deleted rows will not be tracked and their `dbt_valid_to` column remains `NULL`.
+
+The `hard_deletes` config has three options:
+
+| Field | Description |
+| --------- | ----------- |
+| `ignore` (default) | No action for deleted records. |
+| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to`.
|
+| `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` meta field when records are deleted.|
+
+## Impact on snapshot records
+
+- **Backward compatibility**: The `invalidate_hard_deletes` config is still supported for existing snapshots but can't be used alongside `hard_deletes`.
+- **New snapshots**: For new snapshots, we recommend using `hard_deletes` instead of `invalidate_hard_deletes`.
+- **Migration**: If you switch an existing snapshot to use `hard_deletes` without migrating your data, you may encounter inconsistent or incorrect results, such as a mix of old and new data formats.
+
+## Example
+
+
+
+```yaml
+snapshots:
+  - name: my_snapshot
+    config:
+      hard_deletes: new_record # options are: 'ignore', 'invalidate', or 'new_record'
+      strategy: timestamp
+      updated_at: updated_at
+    columns:
+      - name: dbt_valid_from
+        description: Timestamp when the record became valid.
+      - name: dbt_valid_to
+        description: Timestamp when the record stopped being valid.
+      - name: dbt_is_deleted
+        description: Indicates whether the record was deleted.
+```
+
+
+
+The resulting snapshot table contains the `hard_deletes: new_record` configuration. If a record is deleted and later restored, the resulting snapshot table might look like this:
+
+| id | dbt_scd_id | Status | dbt_updated_at | dbt_valid_from | dbt_valid_to | dbt_is_deleted |
+| -- | -------------------- | ----- | -------------------- | --------------------| -------------------- | ----------- |
+| 1 | 60a1f1dbdf899a4dd... | pending | 2024-10-02 ... | 2024-05-19... | 2024-05-20 ... | False |
+| 1 | b1885d098f8bcff51... | cancelled| 2024-10-02 ... | 2024-05-20 ... | 2024-06-03 ... | True |
+| 1 | b1885d098f8bcff53... | shipped | 2024-10-02 ... | 2024-06-03 ... | | False |
+| 2 | b1885d098f8bcff55... | active | 2024-10-02 ... | 2024-05-19 ... | | False |
+
+In this example, the `dbt_is_deleted` column is set to `True` when the record is deleted.
When the record is restored, the `dbt_is_deleted` column is set to `False`.
diff --git a/website/docs/reference/resource-configs/invalidate_hard_deletes.md b/website/docs/reference/resource-configs/invalidate_hard_deletes.md
index bdaec7e33a9..c3ca5bbe0d2 100644
--- a/website/docs/reference/resource-configs/invalidate_hard_deletes.md
+++ b/website/docs/reference/resource-configs/invalidate_hard_deletes.md
@@ -4,6 +4,10 @@ description: "Invalidate_hard_deletes - Read this in-depth guide to learn about
 datatype: column_name
 ---
+:::tip Use the hard_deletes config instead
+
+Note, in Versionless and dbt Core 1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config for better control over how to handle deleted rows from the source.
+:::
diff --git a/website/docs/reference/resource-configs/snapshot_meta_column_names.md b/website/docs/reference/resource-configs/snapshot_meta_column_names.md
index 46aba7886d0..47e3f711624 100644
--- a/website/docs/reference/resource-configs/snapshot_meta_column_names.md
+++ b/website/docs/reference/resource-configs/snapshot_meta_column_names.md
@@ -19,6 +19,7 @@ snapshots:
  dbt_valid_to:
  dbt_scd_id:
  dbt_updated_at:
+ dbt_is_deleted:
 ```
@@ -34,6 +35,7 @@ snapshots:
  "dbt_valid_to": "",
  "dbt_scd_id": "",
  "dbt_updated_at": "",
+ "dbt_is_deleted": "",
 }
 )
 }}
@@ -52,7 +54,7 @@ snapshots:
  dbt_valid_to:
  dbt_scd_id:
  dbt_updated_at:
-
+ dbt_is_deleted:
 ```
@@ -71,6 +73,7 @@ By default, dbt snapshots use the following column names to track change history
 | `dbt_valid_to` | The timestamp when this row is no longer valid. | |
 | `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. |
 | `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. |
+| `dbt_is_deleted` | A boolean value indicating if the record has been deleted.
`True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. This is used internally by dbt |
 However, these column names can be customized using the `snapshot_meta_column_names` config.
@@ -92,18 +95,20 @@ snapshots:
  unique_key: id
  strategy: check
  check_cols: all
+ hard_deletes: new_record
  snapshot_meta_column_names:
    dbt_valid_from: start_date
    dbt_valid_to: end_date
    dbt_scd_id: scd_id
    dbt_updated_at: modified_date
+   dbt_is_deleted: is_deleted
 ```
 The resulting snapshot table contains the configured meta column names:
-| id | scd_id | modified_date | start_date | end_date |
-| -- | -------------------- | -------------------- | -------------------- | -------------------- |
-| 1 | 60a1f1dbdf899a4dd... | 2024-10-02 ... | 2024-10-02 ... | 2024-10-02 ... |
-| 2 | b1885d098f8bcff51... | 2024-10-02 ... | 2024-10-02 ... | |
+| id | scd_id | modified_date | start_date | end_date | is_deleted |
+| -- | -------------------- | -------------------- | -------------------- | -------------------- | ---------- |
+| 1 | 60a1f1dbdf899a4dd... | 2024-10-02 ... | 2024-10-02 ... | 2024-10-02 ... | False |
+| 2 | b1885d098f8bcff51... | 2024-10-02 ... | 2024-10-02 ...
| | False |
diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md
index 7b3c0f8e5b1..361ca3871ed 100644
--- a/website/docs/reference/snapshot-configs.md
+++ b/website/docs/reference/snapshot-configs.md
@@ -80,6 +80,7 @@ snapshots:
  [+](/reference/resource-configs/plus-prefix)[check_cols](/reference/resource-configs/check_cols): [] | all
  [+](/reference/resource-configs/plus-prefix)[snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): {}
  [+](/reference/resource-configs/plus-prefix)[invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false
+ [+](/reference/resource-configs/plus-prefix)[hard_deletes](/reference/resource-configs/hard-deletes): string
 ```
@@ -114,6 +115,7 @@ snapshots:
  [check_cols](/reference/resource-configs/check_cols): [] | all
  [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): {}
  [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false
+ [hard_deletes](/reference/resource-configs/hard-deletes): string
 ```
@@ -150,7 +152,6 @@ Configurations can be applied to snapshots using the [YAML syntax](/docs/build/s
-
 ### General configurations
diff --git a/website/sidebars.js b/website/sidebars.js
index 04afb7c0c99..2796d4e7797 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -969,17 +969,18 @@ const sidebarSettings = {
  label: "For snapshots",
  items: [
    "reference/snapshot-properties",
-   "reference/resource-configs/snapshot_name",
    "reference/snapshot-configs",
    "reference/resource-configs/check_cols",
+   "reference/resource-configs/dbt_valid_to_current",
+   "reference/resource-configs/hard-deletes",
+   "reference/resource-configs/invalidate_hard_deletes",
+   "reference/resource-configs/snapshot_meta_column_names",
+   "reference/resource-configs/snapshot_name",
    "reference/resource-configs/strategy",
    "reference/resource-configs/target_database",
    "reference/resource-configs/target_schema",
    "reference/resource-configs/unique_key",
    "reference/resource-configs/updated_at",
-   "reference/resource-configs/invalidate_hard_deletes",
-   "reference/resource-configs/snapshot_meta_column_names",
-   "reference/resource-configs/dbt_valid_to_current",
  ],
 },
 {
diff --git a/website/snippets/_hard-deletes.md b/website/snippets/_hard-deletes.md
new file mode 100644
index 00000000000..b03a9ed47b8
--- /dev/null
+++ b/website/snippets/_hard-deletes.md
@@ -0,0 +1,13 @@
+
+
+**Use `invalidate_hard_deletes` (v1.8 and earlier) if:**
+- You want to invalidate deleted rows by setting their `dbt_valid_to` timestamp to the snapshot time (implicit delete).
+- You are working with smaller datasets where tracking deletions as a separate state is unnecessary.
+- Gaps in the snapshot history (missing records for deleted rows) are acceptable.
+
+**Use `hard_deletes: new_record` (v1.9 and higher) if:**
+- You want to explicitly track deletions by adding new rows with a `dbt_is_deleted` column (explicit delete).
+- You want to maintain continuous snapshot history without gaps.
+- You are working with larger datasets where explicitly tracking deleted records improves data lineage clarity.
+
+

From 0f86a11887c7bc3e92aebbc926eb3ae4fb52fb7f Mon Sep 17 00:00:00 2001
From: mirnawong1
Date: Wed, 27 Nov 2024 16:38:25 +0000
Subject: [PATCH 03/54] remove old

---
 website/docs/docs/build/snapshots.md | 2 --
 website/docs/reference/snapshot-configs.md | 2 --
 2 files changed, 4 deletions(-)

diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md
index 286f7450b4c..056ef28a17f 100644
--- a/website/docs/docs/build/snapshots.md
+++ b/website/docs/docs/build/snapshots.md
@@ -62,7 +62,6 @@ snapshots:
  [unique_key](/reference/resource-configs/unique_key): column_name_or_expression
  [check_cols](/reference/resource-configs/check_cols): [column_name] | all
  [updated_at](/reference/resource-configs/updated_at): column_name
- [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes): true | false
  [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): dictionary
  [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current): string
  [hard_deletes](/reference/resource-configs/hard-deletes): string
@@ -81,7 +80,6 @@ The following table outlines the configurations available for snapshots:
 | [unique_key](/reference/resource-configs/unique_key) | A column(s) (string or array) or expression for the record | Yes | `id` or `[order_id, product_id]` |
 | [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] |
 | [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at |
-| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True |
 | [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current) | Set a custom indicator for
the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.| No | string |
 | [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary |
 | [hard_deletes](/reference/resource-configs/hard-deletes) | Track hard deletes by adding a new record when row become "deleted" in source | No | string |
diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md
index 361ca3871ed..b40bbe48554 100644
--- a/website/docs/reference/snapshot-configs.md
+++ b/website/docs/reference/snapshot-configs.md
@@ -79,7 +79,6 @@ snapshots:
  [+](/reference/resource-configs/plus-prefix)[updated_at](/reference/resource-configs/updated_at):
  [+](/reference/resource-configs/plus-prefix)[check_cols](/reference/resource-configs/check_cols): [] | all
  [+](/reference/resource-configs/plus-prefix)[snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): {}
- [+](/reference/resource-configs/plus-prefix)[invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false
  [+](/reference/resource-configs/plus-prefix)[hard_deletes](/reference/resource-configs/hard-deletes): string
 ```
@@ -114,7 +113,6 @@ snapshots:
  [updated_at](/reference/resource-configs/updated_at):
  [check_cols](/reference/resource-configs/check_cols): [] | all
  [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): {}
- [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false
  [hard_deletes](/reference/resource-configs/hard-deletes): string
 ```

From a81ec90a7c818045ed36f789e07961a8bd4e01ce Mon Sep 17 00:00:00 2001
From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
Date: Thu, 28 Nov 2024 09:45:50
+0000
Subject: [PATCH 04/54] Update website/docs/docs/build/snapshots.md

Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com>
---
 website/docs/docs/build/snapshots.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md
index 056ef28a17f..0fe7c6ec3fd 100644
--- a/website/docs/docs/build/snapshots.md
+++ b/website/docs/docs/build/snapshots.md
@@ -349,7 +349,7 @@ snapshots:
-In dbt v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source. The `hard_deletes` config is not a separate strategy but an additional opt-in feature that can be used with any snapshot strategy.
+In dbt v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source. The `hard_deletes` config is an additional opt-in feature that can be used with any snapshot strategy.
 The `hard_deletes` config has three options/fields:
 | Field | Description |

From 660996cfba5604f4e75770894cdb48dcf18e6a1e Mon Sep 17 00:00:00 2001
From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
Date: Thu, 28 Nov 2024 09:46:05 +0000
Subject: [PATCH 05/54] Update website/docs/docs/build/snapshots.md

Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com>
---
 website/docs/docs/build/snapshots.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md
index 0fe7c6ec3fd..04875a960f3 100644
--- a/website/docs/docs/build/snapshots.md
+++ b/website/docs/docs/build/snapshots.md
@@ -355,7 +355,7 @@ The `hard_deletes` config has three options/fields:
 | Field | Description |
 | --------- | ----------- |
 | `ignore` (default) | No action for deleted records. |
-| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to`. |
+| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to` to the current time.
| | `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` [meta field](#snapshot-meta-fields) when records are deleted.| import HardDeletes from '/snippets/_hard-deletes.md'; From eecf31b03bf9388137cd2af7c89e9f106ebd6091 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:46:20 +0000 Subject: [PATCH 06/54] Update website/docs/docs/build/snapshots.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 04875a960f3..e051bfe7a6e 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -356,7 +356,7 @@ The `hard_deletes` config has three options/fields: | --------- | ----------- | | `ignore` (default) | No action for deleted records. | | `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to` to the current time. 
| -| `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` [meta field](#snapshot-meta-fields) when records are deleted.| +| `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` [meta field](#snapshot-meta-fields) to indicate when records are in a deleted state.| import HardDeletes from '/snippets/_hard-deletes.md'; From 6dfcf9409ae3881eec855fe2bd3ba75f743e2cad Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:46:28 +0000 Subject: [PATCH 07/54] Update website/docs/docs/build/snapshots.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index e051bfe7a6e..eb2250e1ad2 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -380,7 +380,7 @@ snapshots: -In this example, the `hard_deletes: new_record` config will add a new row for deleted records woth the `dbt_is_deleted` column set to `True`. +In this example, the `hard_deletes: new_record` config will add a new row for deleted records with the `dbt_is_deleted` column set to `True`. Any restored records are added as new rows with the `dbt_is_deleted` field set to `False`. 
The resulting table will look like this: From 0fc49ba5eef1ab9dc8a8b7f28ca7c7900affb9ee Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:47:09 +0000 Subject: [PATCH 08/54] Update website/docs/docs/build/snapshots.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index eb2250e1ad2..ce18c60303f 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -402,7 +402,7 @@ This configuration is not a different strategy as described above, but is an add For this configuration to work with the `timestamp` strategy, the configured `updated_at` column must be of timestamp type. Otherwise, queries will fail due to mixing data types. -Note, in v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config for better control over how to handle deleted rows from the source. +Note, in v1.9 and higher, setting the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to `invalidate` replaces the `invalidate_hard_deletes` config for better control over how to handle deleted rows from the source. 
#### Example usage From 513befdf5e24c43472502fd9836f94d8754b81a6 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:47:16 +0000 Subject: [PATCH 09/54] Update website/docs/reference/resource-configs/snapshot_meta_column_names.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- .../reference/resource-configs/snapshot_meta_column_names.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/snapshot_meta_column_names.md b/website/docs/reference/resource-configs/snapshot_meta_column_names.md index 47e3f711624..abed760a965 100644 --- a/website/docs/reference/resource-configs/snapshot_meta_column_names.md +++ b/website/docs/reference/resource-configs/snapshot_meta_column_names.md @@ -73,7 +73,7 @@ By default, dbt snapshots use the following column names to track change history | `dbt_valid_to` | The timestamp when this row is no longer valid. | | | `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. | | `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. | -| `dbt_is_deleted` | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. This is used internally by dbt | +| `dbt_is_deleted` | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. | However, these column names can be customized using the `snapshot_meta_column_names` config. 
From 3a1f0fc52f3800bd703e19ee2277d2aa1fae55c7 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:47:23 +0000 Subject: [PATCH 10/54] Update website/snippets/_hard-deletes.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/snippets/_hard-deletes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/snippets/_hard-deletes.md b/website/snippets/_hard-deletes.md index b03a9ed47b8..c6127dd3c7a 100644 --- a/website/snippets/_hard-deletes.md +++ b/website/snippets/_hard-deletes.md @@ -1,7 +1,7 @@ **Use `invalidate_hard_deletes` (v1.8 and earlier) if:** -- You want to invalidate deleted rows by setting their `dbt_valid_to` timestamp to the snapshot time (implicit delete). +- You want to invalidate deleted rows by setting their `dbt_valid_to` timestamp to the current time (implicit delete). - You are working with smaller datasets where tracking deletions as a separate state is unnecessary. - Gaps in the snapshot history (missing records for deleted rows) are acceptable. 
From 7fdf0693be591ddf9abe5d3efd1b9ae51e7218a9 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:47:47 +0000 Subject: [PATCH 11/54] Update website/docs/docs/build/snapshots.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index ce18c60303f..95e35bfa606 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -437,7 +437,7 @@ Snapshot tables will be created as a clone of your sourc Starting in 1.9 or with [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless): - These column names can be customized to your team or organizational conventions using the [`snapshot_meta_column_names`](/reference/resource-configs/snapshot_meta_column_names) config. - Use the [`dbt_valid_to_current` config](/reference/resource-configs/dbt_valid_to_current) to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date such as `9999-12-31`). By default, this value is `NULL`. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table. -- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to track deleted records as new rows with the `dbt_is_deleted` meta field when using the `hard_deletes='new_record'` field. +- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config and `new_record` method to track deleted records as new rows with the `dbt_is_deleted` meta field. 
| Field | Meaning | Usage | | -------------- | ------- | ----- | From cff9dca4ac2fec92429df54a5fdc701c9bf4ae61 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:48:03 +0000 Subject: [PATCH 12/54] Update website/docs/reference/resource-configs/hard-deletes.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/reference/resource-configs/hard-deletes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index 812be4598c1..984a5cf4e9b 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -70,7 +70,7 @@ The `hard_deletes` config has three options: | Field | Description | | --------- | ----------- | | `ignore` (default) | No action for deleted records. | -| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to`. | +| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to` to current time. 
| | `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` meta field when records are deleted.| ## Impact on snapshot records From 4d5edefe85313c7cff096d4dea7379ac8b3c40e6 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:48:20 +0000 Subject: [PATCH 13/54] Update website/docs/reference/resource-configs/hard-deletes.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/reference/resource-configs/hard-deletes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index 984a5cf4e9b..0aac4977529 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -39,7 +39,7 @@ snapshots: unique_key='id', strategy='timestamp', updated_at='updated_at', - hard_deletes='ignore', 'invalidate', 'new_record' + hard_deletes='ignore' | 'invalidate' | 'new_record' ) }} ``` From a7f236bdc471aa3dedffc76aaecc7bf184257511 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:48:36 +0000 Subject: [PATCH 14/54] Update website/docs/docs/build/snapshots.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 95e35bfa606..7297a8571e9 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -445,7 +445,7 @@ Starting in 1.9 or with [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-v | dbt_valid_to | The timestamp when this row became invalidated.
For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`. | The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value. | | dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt | | dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt | -| dbt_is_deleted | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. This is used internally by dbt | +| dbt_is_deleted | A boolean value indicating if the record is in a deleted state. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. | *The timestamps used for each column are subtly different depending on the strategy you use: From 9ec76443187bfc96a024a9f45573441a517f2ed7 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:49:00 +0000 Subject: [PATCH 15/54] Update website/docs/docs/dbt-versions/release-notes.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/docs/dbt-versions/release-notes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index 91e6ec0e8cb..13cd3ed9d5a 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -20,7 +20,7 @@ Release notes are grouped by month for both multi-tenant and virtual private clo ## December 2024 -- **New**: The [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source. Supported options are `ignore`, `invalidate`, and `new_record`. 
+- **New**: The [`hard_deletes`](/reference/resource-configs/hard-deletes) config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. ## November 2024 - **Fix**: Job environment variable overrides in credentials are now respected for Exports. Previously, they were ignored. From 454e33d2bd709dde8df52789c1cc5fabae242067 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:49:12 +0000 Subject: [PATCH 16/54] Update website/docs/reference/resource-configs/hard-deletes.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/reference/resource-configs/hard-deletes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index 0aac4977529..52925013f36 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -17,7 +17,7 @@ Available from dbt v1.9 or with [Versionless](/docs/dbt-versions/upgrade-dbt-ver snapshots: - name: config: - hard_deletes: 'ignore', 'invalidate', or 'new_record' + hard_deletes: 'ignore' | 'invalidate' | 'new_record' ``` From a05f318468292dedbf3f4b1c33e1d667343628f5 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:49:24 +0000 Subject: [PATCH 17/54] Update website/docs/reference/resource-configs/hard-deletes.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/reference/resource-configs/hard-deletes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index 
52925013f36..527217a11d1 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -26,7 +26,7 @@ snapshots: ```yml snapshots: [](/reference/resource-configs/resource-path): - +hard_deletes: "ignore", "invalidate", or "new_record" + +hard_deletes: "ignore" | "invalidate" | "new_record" ``` From 6785baacc27062142193fa6f34fdf553f56ab4e5 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 09:50:54 +0000 Subject: [PATCH 18/54] Update website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- .../docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md index 3fc44bc2be2..5ca4728c972 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md +++ b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md @@ -65,7 +65,7 @@ Beginning in dbt Core 1.9, we've streamlined snapshot configuration and added a - Standard `schema` and `database` configs supported: Snapshots will now be consistent with other dbt resource types. You can specify where environment-aware snapshots should be stored. - Warning for incorrect `updated_at` data type: To ensure data integrity, you'll see a warning if the `updated_at` field specified in the snapshot configuration is not the proper data type or timestamp. - Set a custom current indicator for the value of `dbt_valid_to`: Use the [`dbt_valid_to_current` config](/reference/resource-configs/dbt_valid_to_current) to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. 
When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table. -- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) configuration to track hard deletes by adding a new record when row become "deleted" in source. This config replaces the `invalidate_hard_deletes` to give you more control on how to handle deleted rows from the source. Supported fields are `ignore`, `invalidate`, and `new_record`. +- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) configuration to get more control on how to handle deleted rows from the source. Supported methods are `ignore` (default), `invalidate` (replaces legacy `invalidate_hard_deletes=true`), and `new_record`. Setting `hard_deletes='new_record'` allows you to track hard deletes by adding a new record when row becomes "deleted" in source. Read more about [Snapshots meta fields](/docs/build/snapshots#snapshot-meta-fields). From bb082352943237b21cf67b1fd454561260964358 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 28 Nov 2024 11:19:06 +0000 Subject: [PATCH 19/54] Update website/docs/docs/build/snapshots.md Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 7297a8571e9..f723ddcbbb6 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -389,7 +389,7 @@ The resulting table will look like this: | -- | ------ | ---------- | -------------- | ------------ | -------------- | | 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | False | | 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | False | -| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | True | +| 1 | shipped | 2024-01-01 11:20 | 
2024-01-01 11:20 | | True | | 1 | restored | 2024-01-01 12:00 | 2024-01-01 12:00 | | False |
From 8ce62715f5589d9bc504d33458a9cf209e9854e3 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 28 Nov 2024 11:19:36 +0000 Subject: [PATCH 20/54] update --- website/docs/docs/build/snapshots.md | 4 ++-- website/docs/reference/resource-configs/hard-deletes.md | 7 +++---- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 7297a8571e9..e849ca33ac5 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -351,8 +351,8 @@ snapshots: In dbt v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source. The `hard_deletes` config is an additional opt-in feature that can be used with any snapshot strategy. -The `hard_deletes` config has three options/fields: -| Field | Description | +The `hard_deletes` config has three methods: +| Methods | Description | | --------- | ----------- | | `ignore` (default) | No action for deleted records. | | `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to` to the current time. | diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index 527217a11d1..954bad35ed4 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -65,16 +65,15 @@ If you're updating an existing snapshot to use the `hard_deletes` config, dbt _w By default, if you don’t specify `hard_deletes`, it'll automatically default to `ignore`. Deleted rows will not be tracked and their `dbt_valid_to` column remains `NULL`. 
-The `hard_deletes` config has three options: +The `hard_deletes` config has three methods: -| Field | Description | +| Methods | Description | | --------- | ----------- | | `ignore` (default) | No action for deleted records. | | `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to` to current time. | | `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` meta field when records are deleted.| -## Impact on snapshot records - +## Considerations - **Backward compatibility**: The `invalidate_hard_deletes` config is still supported for existing snapshots but can't be used alongside `hard_deletes`. - **New snapshots**: For new snapshots, we recommend using `hard_deletes` instead of `invalidate_hard_deletes`. - **Migration**: If you switch an existing snapshot to use `hard_deletes` without migrating your data, you may encounter inconsistent or incorrect results, such as a mix of old and new data formats. 
From a70a5f1592ff4377d18c651255de4c231a30e543 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 28 Nov 2024 12:00:09 +0000 Subject: [PATCH 21/54] grace's feedback --- website/docs/docs/build/snapshots.md | 8 ++++---- website/docs/reference/resource-configs/hard-deletes.md | 7 +++---- .../reference/resource-configs/invalidate_hard_deletes.md | 8 ++++++-- .../resource-configs/snapshot_meta_column_names.md | 5 +++-- 4 files changed, 16 insertions(+), 12 deletions(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 86215100138..0709224c52c 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -82,7 +82,7 @@ The following table outlines the configurations available for snapshots: | [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | | [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current) | Set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.| No | string | | [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary | -| [hard_deletes](/reference/resource-configs/hard-deletes) | Track hard deletes by adding a new record when row become "deleted" in source | No | string | +| [hard_deletes](/reference/resource-configs/hard-deletes) | Specify how to handle deleted rows from the source. 
Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`.| No | string | - In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. @@ -390,7 +390,7 @@ The resulting table will look like this: | 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | False | | 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | False | | 1 | shipped | 2024-01-01 11:20 | 2024-01-01 11:20 | | True | -| 1 | restored | 2024-01-01 12:00 | 2024-01-01 12:00 | | False | +| 1 | shipped | 2024-01-01 12:00 | 2024-01-01 12:00 | | False |
@@ -485,7 +485,7 @@ Snapshot results with `hard_deletes='new_record'`: |----|---------|------------------|------------------|------------------|------------------|----------------| | 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | False | | 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | 2024-01-01 11:05 | False | -| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True | +| 1 | shipped | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True | @@ -575,7 +575,7 @@ The following table outlines the configurations available for snapshots in versi | [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | | [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | | [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | -| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) (legacy) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists. This is a legacy config replaced by [`hard_deletes`](/reference/resource-configs/hard-deletes) in dbt v1.9. | No | True | - A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). - Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. 
diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index 954bad35ed4..d9a5f5757a2 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -49,8 +49,7 @@ snapshots: ## Description -Use the `hard_deletes` configuration to track hard deletes by adding a new record when row become "deleted" in source. -Replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source. +The `hard_deletes` config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. import HardDeletes from '/snippets/_hard-deletes.md'; @@ -70,7 +69,7 @@ The `hard_deletes` config has three methods: | Methods | Description | | --------- | ----------- | | `ignore` (default) | No action for deleted records. | -| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to` to current time. | +| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to` to current time. This method replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source. | | `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` meta field when records are deleted.| ## Considerations @@ -105,7 +104,7 @@ The resulting snapshot table contains the `hard_deletes: new_record` configurati | id | dbt_scd_id | Status | dbt_updated_at | dbt_valid_from | dbt_valid_to | dbt_is_deleted | | -- | -------------------- | ----- | -------------------- | --------------------| -------------------- | ----------- | | 1 | 60a1f1dbdf899a4dd... | pending | 2024-10-02 ... 
| 2024-05-19... | 2024-05-20 ... | False | -| 1 | b1885d098f8bcff51... | cancelled| 2024-10-02 ... | 2024-05-20 ... | 2024-06-03 ... | True | +| 1 | b1885d098f8bcff51... | pending | 2024-10-02 ... | 2024-05-20 ... | 2024-06-03 ... | True | | 1 | b1885d098f8bcff53... | shipped | 2024-10-02 ... | 2024-06-03 ... | | False | | 2 | b1885d098f8bcff55... | active | 2024-10-02 ... | 2024-05-19 ... | | False | diff --git a/website/docs/reference/resource-configs/invalidate_hard_deletes.md b/website/docs/reference/resource-configs/invalidate_hard_deletes.md index c3ca5bbe0d2..035f76307a2 100644 --- a/website/docs/reference/resource-configs/invalidate_hard_deletes.md +++ b/website/docs/reference/resource-configs/invalidate_hard_deletes.md @@ -1,12 +1,16 @@ --- +title: invalidate_hard_deletes (legacy) resource_types: [snapshots] description: "Invalidate_hard_deletes - Read this in-depth guide to learn about configurations in dbt." datatype: column_name +sidebar_label: invalidate_hard_deletes (legacy) --- -:::tip Use the hard_deletes config instead +:::warning This is a legacy config — Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config instead. -Note, in Versionless and dbt Core 1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config for better control over how to handle deleted rows from the source. +In Versionless and dbt Core 1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config for better control over how to handle deleted rows from the source. + +For new snapshots, set the config to `hard_deletes='invalidate'` instead of `invalidate_hard_deletes=true`. For existing snapshots, arrange an update of pre-existing tables before enabling this setting. 
Refer to ::: diff --git a/website/docs/reference/resource-configs/snapshot_meta_column_names.md b/website/docs/reference/resource-configs/snapshot_meta_column_names.md index abed760a965..2383d599fcd 100644 --- a/website/docs/reference/resource-configs/snapshot_meta_column_names.md +++ b/website/docs/reference/resource-configs/snapshot_meta_column_names.md @@ -110,5 +110,6 @@ The resulting snapshot table contains the configured meta column names: | id | scd_id | modified_date | start_date | end_date | is_deleted | | -- | -------------------- | -------------------- | -------------------- | -------------------- | ---------- | -| 1 | 60a1f1dbdf899a4dd... | 2024-10-02 ... | 2024-10-02 ... | 2024-10-02 ... | False | -| 2 | b1885d098f8bcff51... | 2024-10-02 ... | 2024-10-02 ... | | False | +| 1 | 60a1f1dbdf899a4dd... | 2024-10-02 ... | 2024-10-02 ... | 2024-10-03 ... | False | +| 1 | 60a1f1dbdf899a4dd... | 2024-10-03 ... | 2024-10-03 ... | | True | +| 2 | b1885d098f8bcff51... | 2024-10-02 ... | 2024-10-02 ... 
| | False | From ef2b2666352f7e6cbb4f1ad2feda2f191c1128b0 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 3 Dec 2024 13:01:26 +0000 Subject: [PATCH 22/54] update table --- website/docs/docs/build/snapshots.md | 30 ++++++++++++++-------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index c0770d0e177..8f45363fd39 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -64,7 +64,7 @@ snapshots: [updated_at](/reference/resource-configs/updated_at): column_name [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): dictionary [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current): string - [hard_deletes](/reference/resource-configs/hard-deletes): string + [hard_deletes](/reference/resource-configs/hard-deletes): ignore | invalidate | new_record ``` @@ -349,14 +349,14 @@ snapshots: -In dbt v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source. The `hard_deletes` config is an additional opt-in feature that can be used with any snapshot strategy. +In dbt v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source. The `hard_deletes` config is not a separate strategy but an additional opt-in feature that can be used with any snapshot strategy. -The `hard_deletes` config has three methods: -| Methods | Description | +The `hard_deletes` config has three options/fields: +| Field | Description | | --------- | ----------- | | `ignore` (default) | No action for deleted records. 
| -| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to` to the current time. | -| `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` [meta field](#snapshot-meta-fields) to indicate when records are in a deleted state.| +| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to`. | +| `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` [meta field](#snapshot-meta-fields) when records are deleted.| import HardDeletes from '/snippets/_hard-deletes.md'; @@ -380,7 +380,7 @@ snapshots: -In this example, the `hard_deletes: new_record` config will add a new row for deleted records with the `dbt_is_deleted` column set to `True`. +In this example, the `hard_deletes: new_record` config will add a new row for deleted records woth the `dbt_is_deleted` column set to `True`. Any restored records are added as new rows with the `dbt_is_deleted` field set to `False`. The resulting table will look like this: @@ -389,8 +389,8 @@ The resulting table will look like this: | -- | ------ | ---------- | -------------- | ------------ | -------------- | | 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | False | | 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | False | -| 1 | shipped | 2024-01-01 11:20 | 2024-01-01 11:20 | | True | -| 1 | shipped | 2024-01-01 12:00 | 2024-01-01 12:00 | | False | +| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | 2024-01-01 12:00 | True | +| 1 | restored | 2024-01-01 12:00 | 2024-01-01 12:00 | | False | @@ -402,7 +402,7 @@ This configuration is not a different strategy as described above, but is an add For this configuration to work with the `timestamp` strategy, the configured `updated_at` column must be of timestamp type. Otherwise, queries will fail due to mixing data types. 
-Note, in v1.9 and higher, setting the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to `invalidate` replaces the `invalidate_hard_deletes` config for better control over how to handle deleted rows from the source. +Note, in v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config for better control over how to handle deleted rows from the source. #### Example usage @@ -437,7 +437,7 @@ Snapshot tables will be created as a clone of your sourc Starting in 1.9 or with [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless): - These column names can be customized to your team or organizational conventions using the [`snapshot_meta_column_names`](/reference/resource-configs/snapshot_meta_column_names) config. - Use the [`dbt_valid_to_current` config](/reference/resource-configs/dbt_valid_to_current) to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date such as `9999-12-31`). By default, this value is `NULL`. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table. -- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config and `new_record` method to track deleted records as new rows with the `dbt_is_deleted` meta field. +- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to track deleted records as new rows with the `dbt_is_deleted` meta field when using the `hard_deletes='new_record'` field. | Field | Meaning | Usage | | -------------- | ------- | ----- | @@ -445,7 +445,7 @@ Starting in 1.9 or with [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-v | dbt_valid_to | The timestamp when this row became invalidated.
For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`. | The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value. | | dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt | | dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt | -| dbt_is_deleted | A boolean value indicating if the record is in a deleted state. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. | +| dbt_is_deleted | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. This is used internally by dbt | *The timestamps used for each column are subtly different depending on the strategy you use: @@ -485,7 +485,7 @@ Snapshot results with `hard_deletes='new_record'`: |----|---------|------------------|------------------|------------------|------------------|----------------| | 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | False | | 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | 2024-01-01 11:05 | False | -| 1 | shipped | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True | +| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True | @@ -530,7 +530,7 @@ Snapshot results: For information about configuring snapshots in dbt versions 1.8 and earlier, select **1.8** from the documentation version picker, and it will appear in this section. -To configure snapshots in versions 1.9 and later, refer to [Configuring snapshots](#configuring-snapshots). The latest versions use a more ergonomic snapshot configuration syntax that also speeds up parsing and compilation. +To configure snapshots in versions 1.9 and later, refer to [Configuring snapshots](#configuring-snapshots). 
The latest versions use an updated snapshot configuration syntax that optimizes performance.
@@ -575,7 +575,7 @@ The following table outlines the configurations available for snapshots in versi | [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | | [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | | [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | -| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) (legacy) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists. This is a legacy config replaced by [`hard_deletes`](/reference/resource-configs/hard-deletes) in dbt v1.9. | No | True | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | - A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). - Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. 
From b48ed25c0da351410005bb1d89dc63b5c8b56659 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 3 Dec 2024 15:16:30 +0000 Subject: [PATCH 23/54] add note about adapter requirement --- .../docs/docs/build/incremental-microbatch.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 9055aa7650b..9138d14e516 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -179,12 +179,18 @@ It does not matter whether the table already contains data for that day. Given t Several configurations are relevant to microbatch models, and some are required: -| Config | Type | Description | Default | -|----------|------|---------------|---------| -| [`event_time`](/reference/resource-configs/event-time) | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A | -| `begin` | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | -| `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | -| `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | +| Config | Description | Default | Type | Required | +|----------|---------------|---------|------|---------| +| [`event_time`](/reference/resource-configs/event-time) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. 
| N/A | Column | Required | +| `begin` | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | Date | Required | +| `batch_size` | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | String | Required | +| `lookback` | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | Integer | Optional | +| `unique_key` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String
| Optional* | +| `partition_by` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String | Optional* | + +***Note:** +- `unique_key` is _required_ for the check strategy when using the `dbt-postgres` adapter. +- `partition_by` is _required_ for the check strategy when using the `dbt-spark` and `dbt-bigquery` adapters. From 68d14318742cc387a9249ed11716e60b63cbf7d9 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 3 Dec 2024 15:37:43 +0000 Subject: [PATCH 24/54] clarify microbatch --- website/docs/docs/build/incremental-microbatch.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 9138d14e516..0c411c02d10 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -189,8 +189,8 @@ Several configurations are relevant to microbatch models, and some are required: | `partition_by` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String | Optional* | ***Note:** -- `unique_key` is _required_ for the check strategy when using the `dbt-postgres` adapter. -- `partition_by` is _required_ for the check strategy when using the `dbt-spark` and `dbt-bigquery` adapters. +- `unique_key` is _required_ for the microbatch strategy when using the `dbt-postgres` adapter. +- `partition_by` is _required_ for the microbatch strategy when using the `dbt-spark` and `dbt-bigquery` adapters. 
From c3a57be4c8c123f780b86888c1f8a1fc831db410 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 3 Dec 2024 15:44:06 +0000 Subject: [PATCH 25/54] add updates --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 9055aa7650b..72d0b2fc6fa 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -191,7 +191,7 @@ Several configurations are relevant to microbatch models, and some are required: As a best practice, we recommend configuring `full_refresh: False` on microbatch models so that they ignore invocations with the `--full-refresh` flag. If you need to reprocess historical data, do so with a targeted backfill that specifies explicit start and end dates. ### Usage - + **You must write your model query to process (read and return) exactly one "batch" of data**. This is a simplifying assumption and a powerful one: - You don’t need to think about `is_incremental` filtering - You don't need to pick among DML strategies (upserting/merging/replacing) From 536a346cacafa84ff977c3d4bf5a63a3adf17084 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 3 Dec 2024 15:45:21 +0000 Subject: [PATCH 26/54] Update website/docs/docs/build/snapshots.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 8f45363fd39..94a2f9eb9ff 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -380,7 +380,7 @@ snapshots: -In this example, the `hard_deletes: new_record` config will add a new row for deleted records woth the `dbt_is_deleted` column set to `True`. 
+In this example, the `hard_deletes: new_record` config will add a new row for deleted records with the `dbt_is_deleted` column set to `True`. Any restored records are added as new rows with the `dbt_is_deleted` field set to `False`. The resulting table will look like this: From d20b1b0491949f84d8431f36b7dfa933baa48fd5 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 3 Dec 2024 15:45:58 +0000 Subject: [PATCH 27/54] Update website/docs/docs/build/snapshots.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 94a2f9eb9ff..5216484f7de 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -375,7 +375,7 @@ snapshots: unique_key: id strategy: timestamp updated_at: updated_at - hard_deletes: new_record # options are: 'ignore', 'invalidate', or 'new_record' + hard_deletes: new_record # options are: 'ignore', 'invalidate', or 'new_record' ``` From 74d04652bde6de2136708ca964a5f7704c1922d3 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 3 Dec 2024 15:47:06 +0000 Subject: [PATCH 28/54] Update website/docs/docs/build/snapshots.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 5216484f7de..dcefa031e15 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -440,7 +440,7 @@ Starting in 1.9 or with [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-v - Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to track deleted records as new rows with the 
`dbt_is_deleted` meta field when using the `hard_deletes='new_record'` field. | Field | Meaning | Usage | -| -------------- | ------- | ----- | +| -------------- | ------- | ----- | | dbt_valid_from | The timestamp when this snapshot row was first inserted | This column can be used to order the different "versions" of a record. | | dbt_valid_to | The timestamp when this row became invalidated.
For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`. | The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value. | | dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt | From 8e22c5f7a1b0b67965f684dd73adc365c2b3e896 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 3 Dec 2024 15:47:33 +0000 Subject: [PATCH 29/54] Update website/docs/reference/resource-configs/hard-deletes.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/resource-configs/hard-deletes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index d9a5f5757a2..47961660979 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -2,7 +2,7 @@ title: hard_deletes resource_types: [snapshots] description: "Use the `hard_deletes` config to control how deleted rows are tracked in your snapshot table." 
-datatype: "{}" +datatype: "boolean" default_value: {ignore} id: "hard-deletes" sidebar_label: "hard_deletes" From 747049b3cc73183604fda3ec5d5d23295e464aaa Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 3 Dec 2024 16:59:45 +0000 Subject: [PATCH 30/54] dougs'feedback --- website/docs/docs/dbt-versions/release-notes.md | 2 +- website/docs/reference/resource-configs/hard-deletes.md | 2 +- website/snippets/_hard-deletes.md | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index 13cd3ed9d5a..94e4eb45d8c 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -20,7 +20,7 @@ Release notes are grouped by month for both multi-tenant and virtual private clo ## December 2024 -- **New**: The [`hard_deletes`](/reference/resource-configs/hard-deletes) config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. +- **New**: The [`hard_deletes`](/reference/resource-configs/hard-deletes) config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. Note that `new_record` will create a new metadata column in the snapshot table. ## November 2024 - **Fix**: Job environment variable overrides in credentials are now respected for Exports. Previously, they were ignored. 
diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index 47961660979..49f0f756e7c 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -49,7 +49,7 @@ snapshots: ## Description -The `hard_deletes` config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. +The `hard_deletes` config gives you more control on how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. Note that `new_record` will create a new metadata column in the snapshot table. import HardDeletes from '/snippets/_hard-deletes.md'; diff --git a/website/snippets/_hard-deletes.md b/website/snippets/_hard-deletes.md index c6127dd3c7a..59c2e3af99e 100644 --- a/website/snippets/_hard-deletes.md +++ b/website/snippets/_hard-deletes.md @@ -1,13 +1,13 @@ **Use `invalidate_hard_deletes` (v1.8 and earlier) if:** +- Gaps in the snapshot history (missing records for deleted rows) are acceptable. - You want to invalidate deleted rows by setting their `dbt_valid_to` timestamp to the current time (implicit delete). - You are working with smaller datasets where tracking deletions as a separate state is unnecessary. -- Gaps in the snapshot history (missing records for deleted rows) are acceptable. **Use `hard_deletes: new_record` (v1.9 and higher) if:** -- You want to explicitly track deletions by adding new rows with a `dbt_is_deleted` column (explicit delete). - You want to maintain continuous snapshot history without gaps. +- You want to explicitly track deletions by adding new rows with a `dbt_is_deleted` column (explicit delete). 
- You are working with larger datasets where explicitly tracking deleted records improves data lineage clarity. From 8d5bf831972c037a8983f10bd663d9d6f5cf8cd9 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 3 Dec 2024 18:41:33 +0000 Subject: [PATCH 31/54] add check exmaple --- website/docs/docs/build/snapshots.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index dcefa031e15..5549bfe04c6 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -485,7 +485,7 @@ Snapshot results with `hard_deletes='new_record'`: |----|---------|------------------|------------------|------------------|------------------|----------------| | 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | False | | 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | 2024-01-01 11:05 | False | -| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True | +| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True | @@ -522,6 +522,14 @@ Snapshot results: | 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | | 1 | shipped | 2024-01-01 11:30 | | 2024-01-01 11:30 | +Snapshot results with `hard_deletes='new_record'`: + +| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted | +|----|---------|------------------|------------------|------------------|----------------| +| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | False | +| 1 | shipped | 2024-01-01 11:30 | 2024-01-01 11:40 | 2024-01-01 11:30 | False | +| 1 | deleted | 2024-01-01 11:40 | | 2024-01-01 11:40 | True | + ## Configure snapshots in versions 1.8 and earlier From 2f82a60108cf2590204bf63213c114233e01a1d0 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 4 Dec 2024 10:54:34 +0000 
Subject: [PATCH 32/54] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 0c411c02d10..d15d5b05825 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -186,7 +186,7 @@ Several configurations are relevant to microbatch models, and some are required: | `batch_size` | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | String | Required | | `lookback` | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | Integer | Optional | | `unique_key` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String
| Optional* | -| `partition_by` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String | Optional* | +| `partition_by` | A column(s) (string or array) or expression for the record. | N/A | String | Optional* | ***Note:** - `unique_key` is _required_ for the microbatch strategy when using the `dbt-postgres` adapter. From 6704c0d2fe4bc127474474dbd6109d25e61071cc Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 4 Dec 2024 10:54:41 +0000 Subject: [PATCH 33/54] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index d15d5b05825..ead67b930a2 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -185,7 +185,7 @@ Several configurations are relevant to microbatch models, and some are required: | `begin` | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | Date | Required | | `batch_size` | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | String | Required | | `lookback` | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | Integer | Optional | -| `unique_key` | A column(s) (string or array) or expression for the record. Required for the `check` strategy. | N/A | String
| Optional* | +| `unique_key` | A column(s) (string or array) or expression for the record. | N/A | String
| Optional* | | `partition_by` | A column(s) (string or array) or expression for the record. | N/A | String | Optional* | ***Note:** From 0b93919eeda7028a7447678e3956bc700b4a1bbe Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Wed, 4 Dec 2024 11:29:19 +0000 Subject: [PATCH 34/54] rejig page --- .../docs/docs/build/incremental-microbatch.md | 59 +++++++++++++++---- 1 file changed, 47 insertions(+), 12 deletions(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index ead67b930a2..377710f48f8 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -36,7 +36,7 @@ Each "batch" corresponds to a single bounded time period (by default, a single d This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills), concurrently, and [retry](#retry) them independently. -### Example +## Example A `sessions` model aggregates and enriches data that comes from two other models: - `page_views` is a large, time-series table. It contains many rows, new records almost always arrive after existing ones, and existing records rarely update. It uses the `page_view_start` column as its `event_time`. @@ -175,7 +175,7 @@ It does not matter whether the table already contains data for that day. Given t -### Relevant configs +## Relevant configs Several configurations are relevant to microbatch models, and some are required: @@ -185,18 +185,53 @@ Several configurations are relevant to microbatch models, and some are required: | `begin` | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | Date | Required | | `batch_size` | The granularity of your batches. 
Supported values are `hour`, `day`, `month`, and `year` | N/A | String | Required | | `lookback` | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | Integer | Optional | -| `unique_key` | A column(s) (string or array) or expression for the record. | N/A | String
| Optional* | -| `partition_by` | A column(s) (string or array) or expression for the record. | N/A | String | Optional* | - -***Note:** -- `unique_key` is _required_ for the microbatch strategy when using the `dbt-postgres` adapter. -- `partition_by` is _required_ for the microbatch strategy when using the `dbt-spark` and `dbt-bigquery` adapters. +### Required configs for specific adapters +Some adapters require additional configurations for the microbatch strategy. This is because each adapter implements the microbatch strategy differently. + +The following table lists the required configurations for the specific adapters, in addition to the standard microbatch configs: + +| Adapter | `unique_key` config | `partition_by` config | +|----------|------------------|--------------------| +| [`dbt-postgres`](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | ✅ Required | N/A | +| [`dbt-spark`](/reference/resource-configs/spark-configs#incremental-models) | N/A | ✅ Required | +| [`dbt-bigquery`](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | N/A | ✅ Required | + +For example, if you're using `dbt-postgres`, configure `unique_key` as follows: + + + +```sql +{{ config( + materialized='incremental', + incremental_strategy='microbatch', + unique_key='sales_id', ## required for dbt-postgres + event_time='transaction_date', + begin='2023-01-01', + batch_size='day' +) }} + +select + sales_id, + transaction_date, + customer_id, + product_id, + total_amount +from {{ source('sales', 'transactions') }} + +``` + + In this example, `unique_key` is required because `dbt-postgres`' microbatch uses the `merge` strategy, which needs a `unique_key` to identify which rows in the data warehouse need to get merged. Without a `unique_key`, dbt won't be able to match rows between the incoming batch and the existing table. 
+ + + +### Full refresh + As a best practice, we recommend configuring `full_refresh: False` on microbatch models so that they ignore invocations with the `--full-refresh` flag. If you need to reprocess historical data, do so with a targeted backfill that specifies explicit start and end dates. -### Usage +## Usage **You must write your model query to process (read and return) exactly one "batch" of data**. This is a simplifying assumption and a powerful one: - You don’t need to think about `is_incremental` filtering @@ -213,7 +248,7 @@ During standard incremental runs, dbt will process batches according to the curr **Note:** If there’s an upstream model that configures `event_time`, but you *don’t* want the reference to it to be filtered, you can specify `ref('upstream_model').render()` to opt-out of auto-filtering. This isn't generally recommended — most models that configure `event_time` are fairly large, and if the reference is not filtered, each batch will perform a full scan of this input table. -### Backfills +## Backfills Whether to fix erroneous source data or retroactively apply a change in business logic, you may need to reprocess a large amount of historical data. @@ -228,13 +263,13 @@ dbt run --event-time-start "2024-09-01" --event-time-end "2024-09-04" -### Retry +## Retry If one or more of your batches fail, you can use `dbt retry` to reprocess _only_ the failed batches. 
![Partial retry](https://github.com/user-attachments/assets/f94c4797-dcc7-4875-9623-639f70c97b8f) -### Timezones +## Timezones For now, dbt assumes that all values supplied are in UTC: From 6050b24acf26522ed2bc441b4ca43a40c0c78bd3 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 4 Dec 2024 12:29:02 +0000 Subject: [PATCH 35/54] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: nataliefiann <120089939+nataliefiann@users.noreply.github.com> --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 377710f48f8..023e0f25d67 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -223,7 +223,7 @@ from {{ source('sales', 'transactions') }} ``` - In this example, `unique_key` is required because `dbt-postgres`' microbatch uses the `merge` strategy, which needs a `unique_key` to identify which rows in the data warehouse need to get merged. Without a `unique_key`, dbt won't be able to match rows between the incoming batch and the existing table. + In this example, `unique_key` is required because `dbt-postgres` microbatch uses the `merge` strategy, which needs a `unique_key` to identify which rows in the data warehouse need to get merged. Without a `unique_key`, dbt won't be able to match rows between the incoming batch and the existing table. 
From 47398ebd26d69c53458873e7c3e6447f7573087c Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Wed, 4 Dec 2024 13:37:20 +0000 Subject: [PATCH 36/54] add microbatch to data platform configs --- .../reference/resource-configs/bigquery-configs.md | 5 +++-- .../reference/resource-configs/postgres-configs.md | 1 + .../reference/resource-configs/redshift-configs.md | 1 + .../reference/resource-configs/snowflake-configs.md | 12 +++++++----- .../docs/reference/resource-configs/spark-configs.md | 3 ++- 5 files changed, 14 insertions(+), 8 deletions(-) diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index 9dd39c936b6..26b7c7e951f 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -426,8 +426,9 @@ Please note that in order for policy tags to take effect, [column-level `persist The [`incremental_strategy` config](/docs/build/incremental-strategy) controls how dbt builds incremental models. dbt uses a [merge statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax) on BigQuery to refresh incremental tables. 
The `incremental_strategy` config can be set to one of two values: - - `merge` (default) - - `insert_overwrite` +- `merge` (default) +- `insert_overwrite` +- [`microbatch`](/docs/build/incremental-microbatch) ### Performance and cost diff --git a/website/docs/reference/resource-configs/postgres-configs.md b/website/docs/reference/resource-configs/postgres-configs.md index f2bf90a93c0..e71c6f1484d 100644 --- a/website/docs/reference/resource-configs/postgres-configs.md +++ b/website/docs/reference/resource-configs/postgres-configs.md @@ -11,6 +11,7 @@ In dbt-postgres, the following incremental materialization strategies are suppor - `append` (default when `unique_key` is not defined) - `merge` - `delete+insert` (default when `unique_key` is defined) +- [`microbatch`](/docs/build/incremental-microbatch) ## Performance optimizations diff --git a/website/docs/reference/resource-configs/redshift-configs.md b/website/docs/reference/resource-configs/redshift-configs.md index b033cd6267e..01c9bffd055 100644 --- a/website/docs/reference/resource-configs/redshift-configs.md +++ b/website/docs/reference/resource-configs/redshift-configs.md @@ -17,6 +17,7 @@ In dbt-redshift, the following incremental materialization strategies are suppor - `append` (default when `unique_key` is not defined) - `merge` - `delete+insert` (default when `unique_key` is defined) +- [`microbatch`](/docs/build/incremental-microbatch) All of these strategies are inherited from dbt-postgres. diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md index 7bef180e3d3..9a81b9485fc 100644 --- a/website/docs/reference/resource-configs/snowflake-configs.md +++ b/website/docs/reference/resource-configs/snowflake-configs.md @@ -38,11 +38,11 @@ flags: The following configurations are supported. 
For more information, check out the Snowflake reference for [`CREATE ICEBERG TABLE` (Snowflake as the catalog)](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake). -| Field | Type | Required | Description | Sample input | Note | -| --------------------- | ------ | -------- | -------------------------------------------------------------------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Table Format | String | Yes | Configures the objects table format. | `iceberg` | `iceberg` is the only accepted value. | +| Field | Type | Required | Description | Sample input | Note | +| ------ | ----- | -------- | ------------- | ------------ | ------ | +| Table Format | String | Yes | Configures the objects table format. | `iceberg` | `iceberg` is the only accepted value. | | External volume | String | Yes(*) | Specifies the identifier (name) of the external volume where Snowflake writes the Iceberg table's metadata and data files. | `my_s3_bucket` | *You don't need to specify this if the account, database, or schema already has an associated external volume. [More info](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake#:~:text=Snowflake%20Table%20Structures.-,external_volume) | -| Base location Subpath | String | No | An optional suffix to add to the `base_location` path that dbt automatically specifies. | `jaffle_marketing_folder` | We recommend that you do not specify this. Modifying this parameter results in a new Iceberg table. See [Base Location](#base-location) for more info. | +| Base location Subpath | String | No | An optional suffix to add to the `base_location` path that dbt automatically specifies. 
| `jaffle_marketing_folder` | We recommend that you do not specify this. Modifying this parameter results in a new Iceberg table. See [Base Location](#base-location) for more info. | ### Example configuration @@ -472,6 +472,8 @@ The [`incremental_strategy` config](/docs/build/incremental-strategy) controls h Snowflake's `merge` statement fails with a "nondeterministic merge" error if the `unique_key` specified in your model config is not actually unique. If you encounter this error, you can instruct dbt to use a two-step incremental approach by setting the `incremental_strategy` config for your model to `delete+insert`. +Snowflake also supports the [`microbatch`](/docs/build/incremental-microbatch) incremental strategy. + ## Configuring table clustering dbt supports [table clustering](https://docs.snowflake.net/manuals/user-guide/tables-clustering-keys.html) on Snowflake. To control clustering for a table or incremental model, use the `cluster_by` config. When this configuration is applied, dbt will do two things: @@ -701,4 +703,4 @@ flags: ``` -
\ No newline at end of file +
diff --git a/website/docs/reference/resource-configs/spark-configs.md b/website/docs/reference/resource-configs/spark-configs.md index 3b2174b8ff5..a52fd93eace 100644 --- a/website/docs/reference/resource-configs/spark-configs.md +++ b/website/docs/reference/resource-configs/spark-configs.md @@ -37,7 +37,8 @@ For that reason, the dbt-spark plugin leans heavily on the [`incremental_strateg - **`append`** (default): Insert new records without updating or overwriting any existing data. - **`insert_overwrite`**: If `partition_by` is specified, overwrite partitions in the table with new data. If no `partition_by` is specified, overwrite the entire table with new data. - **`merge`** (Delta, Iceberg and Hudi file format only): Match records based on a `unique_key`; update old records, insert new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.) - +- `microbatch` Implements the [microbatch strategy](/docs/build/incremental-microbatch) using `event_time` to define time-based ranges for filtering data. + Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block.
### The `append` strategy From dcaa0cf8c949056b76d1001e23e9b7102bca1058 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Wed, 4 Dec 2024 14:01:27 +0000 Subject: [PATCH 37/54] add link to migration --- website/docs/reference/resource-configs/hard-deletes.md | 2 +- .../docs/reference/resource-configs/invalidate_hard_deletes.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index 49f0f756e7c..ef6d70f3e6f 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -57,7 +57,7 @@ import HardDeletes from '/snippets/_hard-deletes.md'; :::warning -If you're updating an existing snapshot to use the `hard_deletes` config, dbt _will not_ handle migrations automatically. We recommend either only using these settings for net-new snapshots, or arranging an update of pre-existing tables before enabling this setting. +If you're updating an existing snapshot to use the `hard_deletes` config, dbt _will not_ handle migrations automatically. We recommend either only using these settings for net-new snapshots, or [arranging an update](/reference/snapshot-configs#snapshot-configuration-migration) of pre-existing tables before enabling this setting. ::: ## Default diff --git a/website/docs/reference/resource-configs/invalidate_hard_deletes.md b/website/docs/reference/resource-configs/invalidate_hard_deletes.md index 035f76307a2..67123487fa1 100644 --- a/website/docs/reference/resource-configs/invalidate_hard_deletes.md +++ b/website/docs/reference/resource-configs/invalidate_hard_deletes.md @@ -10,7 +10,7 @@ sidebar_label: invalidate_hard_deletes (legacy) In Versionless and dbt Core 1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config for better control over how to handle deleted rows from the source. 
-For new snapshots, set the config to `hard_deletes='invalidate'` instead of `invalidate_hard_deletes=true`. For existing snapshots, arrange an update of pre-existing tables before enabling this setting. Refer to +For new snapshots, set the config to `hard_deletes='invalidate'` instead of `invalidate_hard_deletes=true`. For existing snapshots, [arrange an update](/reference/snapshot-configs#snapshot-configuration-migration) of pre-existing tables before enabling this setting. Refer to ::: From 8ab28c18cbaa17c417e7df066d27b69372ab23ef Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Wed, 4 Dec 2024 16:59:56 +0000 Subject: [PATCH 38/54] add batch properties --- .../reference/dbt-jinja-functions/model.md | 47 ++++++++++++++++++- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/website/docs/reference/dbt-jinja-functions/model.md b/website/docs/reference/dbt-jinja-functions/model.md index 516981e11e3..317ebed3437 100644 --- a/website/docs/reference/dbt-jinja-functions/model.md +++ b/website/docs/reference/dbt-jinja-functions/model.md @@ -20,9 +20,9 @@ To view the contents of `model` for a given model: - + -If you're using the CLI, use [log()](/reference/dbt-jinja-functions/log) to print the full contents: +If you're using the Command line interface (CLI), use [log()](/reference/dbt-jinja-functions/log) to print the full contents: ```jinja {{ log(model, info=True) }} @@ -42,6 +42,49 @@ If you're using the CLI, use [log()](/reference/dbt-jinja-functions/log) to prin +## Batch properties for microbatch models + +From dbt Core v1.9, the model object includes a `batch` property (`model.batch`), which provides details about the current batch when executing an [incremental microbatch](/docs/build/incremental-microbatch) model. This property is only populated during the batch execution of a microbatch model. + +The following table describes the properties of the `batch` object. Note that dbt appends the property to the `model` and `batch` objects. 
+ +| Property | Description | Example | +| -------- | ----------- | ------- | +| `id` | The unique identifier for the batch within the context of the microbatch model. | `model.batch.id` | +| `event_time_start` | The start time of the batch's [`event_time`](/reference/resource-configs/event-time) filter (inclusive). | `model.batch.event_time_start` | +| `event_time_end` | The end time of the batch's `event_time` filter (exclusive). | `model.batch.event_time_end` | + +### Usage notes + +`model.batch` is only available during the execution of a microbatch model batch. Outside of the microbatch execution, `model.batch` is `None`, and its sub-properties aren't accessible. + +#### Example: Safeguard access to batch properties + +We recommend to always check if `model.batch` is populated before accessing its properties. Use an `if` statement to ensure safe access to batch properties: + +```jinja +{% if model.batch %} + {{ log(model.batch.id) }} # Log the batch ID # + {{ log(model.batch.event_time_start) }} # Log the start time of the batch # + {{ log(model.batch.event_time_end) }} # Log the end time of the batch # +{% endif %} +``` + +In this example, the `if model.batch` statement makes sure that the code only runs during a batch execution. `log()` is used to print the `batch` properties for debugging. + +#### Example: Log batch details + +This is a practical example of how you might use `model.batch` in a microbatch model to log batch details for the `batch.id`: + +```jinja +{% if model.batch %} + {{ log("Processing batch with ID: " ~ model.batch.id, info=True) }} + {{ log("Batch event time range: " ~ model.batch.event_time_start ~ " to " ~ model.batch.event_time_end, info=True) }} +{% endif %} +``` +In this example, the `if model.batch` statement makes sure that the code only runs during a batch execution. `log()` is used to print the `batch` properties for debugging. 
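Editor's sketch (not part of the patch): the table above describes `event_time_start` as inclusive and `event_time_end` as exclusive, and the usage notes say `model.batch` is `None` outside a batch execution. The Python below mirrors those two points only as an illustration. The `Batch` dataclass, `in_batch`, `batch_label`, and the sample ID value are hypothetical stand-ins, not dbt objects.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Batch:
    """Hypothetical stand-in for the `model.batch` properties."""
    id: str
    event_time_start: datetime  # inclusive
    event_time_end: datetime    # exclusive


def in_batch(batch: Batch, event_time: datetime) -> bool:
    """Half-open check: start <= event_time < end."""
    return batch.event_time_start <= event_time < batch.event_time_end


def batch_label(batch: Optional[Batch]) -> Optional[str]:
    """Guard against a missing batch, like `{% if model.batch %}`."""
    if batch is None:
        return None
    return f"{batch.id}: {batch.event_time_start} to {batch.event_time_end}"


day = Batch(
    id="20240901",  # made-up ID format, for illustration only
    event_time_start=datetime(2024, 9, 1, tzinfo=timezone.utc),
    event_time_end=datetime(2024, 9, 2, tzinfo=timezone.utc),
)
print(in_batch(day, day.event_time_start))  # start bound is included
print(in_batch(day, day.event_time_end))    # end bound is excluded
print(batch_label(None))                    # no batch, nothing to describe
```

A half-open range like this means a row stamped exactly at midnight belongs to one batch only, which is why the start and end bounds are described differently in the table.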
+ + ## Model structure and JSON schema To view the structure of `models` and their definitions: From 92f7d4879852754de2e80f43b62fca07949aef69 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Wed, 4 Dec 2024 17:01:08 +0000 Subject: [PATCH 39/54] add --- website/docs/reference/dbt-jinja-functions/model.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/reference/dbt-jinja-functions/model.md b/website/docs/reference/dbt-jinja-functions/model.md index 317ebed3437..94ed34a1070 100644 --- a/website/docs/reference/dbt-jinja-functions/model.md +++ b/website/docs/reference/dbt-jinja-functions/model.md @@ -84,7 +84,6 @@ This is a practical example of how you might use `model.batch` in a microbatch m ``` In this example, the `if model.batch` statement makes sure that the code only runs during a batch execution. `log()` is used to print the `batch` properties for debugging. - ## Model structure and JSON schema To view the structure of `models` and their definitions: From 0fc6b6dfa5d2c6f028953f278044ec6330f65d89 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 4 Dec 2024 17:08:32 +0000 Subject: [PATCH 40/54] Update website/docs/reference/dbt-jinja-functions/model.md --- website/docs/reference/dbt-jinja-functions/model.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/dbt-jinja-functions/model.md b/website/docs/reference/dbt-jinja-functions/model.md index 94ed34a1070..2b8633af226 100644 --- a/website/docs/reference/dbt-jinja-functions/model.md +++ b/website/docs/reference/dbt-jinja-functions/model.md @@ -60,7 +60,7 @@ The following table describes the properties of the `batch` object. Note that db #### Example: Safeguard access to batch properties -We recommend to always check if `model.batch` is populated before accessing its properties. 
Use an `if` statement to ensure safe access to batch properties: +We recommend to always check if `model.batch` is populated before accessing its properties. To do this, use an `if` statement for safe access to `batch` properties: ```jinja {% if model.batch %} From 019a8c371f66c7fae03736ed398f783f6d85df63 Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Wed, 4 Dec 2024 16:34:03 -0500 Subject: [PATCH 41/54] Updating default timeout --- website/docs/docs/core/connect-data-platform/redshift-setup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/core/connect-data-platform/redshift-setup.md b/website/docs/docs/core/connect-data-platform/redshift-setup.md index ce3e8658045..4c00558d782 100644 --- a/website/docs/docs/core/connect-data-platform/redshift-setup.md +++ b/website/docs/docs/core/connect-data-platform/redshift-setup.md @@ -31,7 +31,7 @@ import SetUpPages from '/snippets/_setup-pages-intro.md'; | `port` | 5439 | | | `dbname` | my_db | Database name| | `schema` | my_schema | Schema name| -| `connect_timeout` | `None` or 30 | Number of seconds before connection times out| +| `connect_timeout` | 30 | Number of seconds before connection times out. Default is `None`| | `sslmode` | prefer | optional, set the sslmode to connect to the database. Default prefer, which will use 'verify-ca' to connect. For more information on `sslmode`, see Redshift note below| | `role` | None | Optional, user identifier of the current session| | `autocreate` | false | Optional, default false. 
Creates user if they do not exist | From 30f607cb7dda9da33f5982246e5407ac079d6ccc Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Wed, 4 Dec 2024 16:48:12 -0500 Subject: [PATCH 42/54] Removing old snippet --- website/docs/docs/build/hooks-operations.md | 2 -- website/docs/reference/dbt-jinja-functions/this.md | 2 -- .../docs/reference/project-configs/on-run-start-on-run-end.md | 2 -- website/docs/reference/resource-configs/pre-hook-post-hook.md | 2 -- website/snippets/hooks-to-grants.md | 3 --- 5 files changed, 11 deletions(-) delete mode 100644 website/snippets/hooks-to-grants.md diff --git a/website/docs/docs/build/hooks-operations.md b/website/docs/docs/build/hooks-operations.md index 6cec2a673c0..842d3fb99a3 100644 --- a/website/docs/docs/build/hooks-operations.md +++ b/website/docs/docs/build/hooks-operations.md @@ -40,8 +40,6 @@ Hooks are snippets of SQL that are executed at different times: Hooks are a more-advanced capability that enable you to run custom SQL, and leverage database-specific actions, beyond what dbt makes available out-of-the-box with standard materializations and configurations. - - If (and only if) you can't leverage the [`grants` resource-config](/reference/resource-configs/grants), you can use `post-hook` to perform more advanced workflows: * Need to apply `grants` in a more complex way, which the dbt Core `grants` config doesn't (yet) support. 
diff --git a/website/docs/reference/dbt-jinja-functions/this.md b/website/docs/reference/dbt-jinja-functions/this.md index f9f2961b08f..7d358cb6299 100644 --- a/website/docs/reference/dbt-jinja-functions/this.md +++ b/website/docs/reference/dbt-jinja-functions/this.md @@ -20,8 +20,6 @@ meta: ## Examples - - ### Configuring incremental models diff --git a/website/docs/reference/project-configs/on-run-start-on-run-end.md b/website/docs/reference/project-configs/on-run-start-on-run-end.md index 74557839f11..347ce54ab63 100644 --- a/website/docs/reference/project-configs/on-run-start-on-run-end.md +++ b/website/docs/reference/project-configs/on-run-start-on-run-end.md @@ -27,8 +27,6 @@ A SQL statement (or list of SQL statements) to be run at the start or end of the ## Examples - - ### Grant privileges on all schemas that dbt uses at the end of a run This leverages the [schemas](/reference/dbt-jinja-functions/schemas) variable that is only available in an `on-run-end` hook. diff --git a/website/docs/reference/resource-configs/pre-hook-post-hook.md b/website/docs/reference/resource-configs/pre-hook-post-hook.md index bd01a7be840..ee3c81b0fd6 100644 --- a/website/docs/reference/resource-configs/pre-hook-post-hook.md +++ b/website/docs/reference/resource-configs/pre-hook-post-hook.md @@ -160,8 +160,6 @@ import SQLCompilationError from '/snippets/_render-method.md'; ## Examples - - ### [Redshift] Unload one model to S3 diff --git a/website/snippets/hooks-to-grants.md b/website/snippets/hooks-to-grants.md deleted file mode 100644 index d7586ec53ca..00000000000 --- a/website/snippets/hooks-to-grants.md +++ /dev/null @@ -1,3 +0,0 @@ - -In older versions of dbt, the most common use of `post-hook` was to execute `grant` statements, to apply database permissions to models right after creating them. We recommend using the [`grants` resource config](/reference/resource-configs/grants) instead, in order to automatically apply grants when your dbt model runs. 
- From c74d9be1a499a0a63a17af30b0689af4bf5cb59e Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 09:38:48 +0000 Subject: [PATCH 43/54] Update website/docs/reference/resource-configs/bigquery-configs.md Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> --- website/docs/reference/resource-configs/bigquery-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index 26b7c7e951f..69e144b6162 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -425,7 +425,7 @@ Please note that in order for policy tags to take effect, [column-level `persist The [`incremental_strategy` config](/docs/build/incremental-strategy) controls how dbt builds incremental models. dbt uses a [merge statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax) on BigQuery to refresh incremental tables. 
-The `incremental_strategy` config can be set to one of two values: +The `incremental_strategy` config can be set to one of the following values: - `merge` (default) - `insert_overwrite` - [`microbatch`](/docs/build/incremental-microbatch) From 2f6acf964c74c3275ce075973104857826222614 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 09:39:08 +0000 Subject: [PATCH 44/54] Update website/docs/reference/resource-configs/snowflake-configs.md Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> --- .../docs/reference/resource-configs/snowflake-configs.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md index 9a81b9485fc..73b34d1fc3b 100644 --- a/website/docs/reference/resource-configs/snowflake-configs.md +++ b/website/docs/reference/resource-configs/snowflake-configs.md @@ -470,6 +470,12 @@ In this example, you can set up a query tag to be applied to every query with th The [`incremental_strategy` config](/docs/build/incremental-strategy) controls how dbt builds incremental models. By default, dbt will use a [merge statement](https://docs.snowflake.net/manuals/sql-reference/sql/merge.html) on Snowflake to refresh incremental tables. +Snowflake supports the following incremental strategies: +- Merge (default) +- Append +- Delete+insert +- [`microbatch`](/docs/build/incremental-microbatch) + Snowflake's `merge` statement fails with a "nondeterministic merge" error if the `unique_key` specified in your model config is not actually unique. If you encounter this error, you can instruct dbt to use a two-step incremental approach by setting the `incremental_strategy` config for your model to `delete+insert`. Snowflake also supports the [`microbatch`](/docs/build/incremental-microbatch) incremental strategy. 
From 1236efc0959cebf953e705123d72ea9c7fed4688 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 09:39:40 +0000 Subject: [PATCH 45/54] Update website/docs/reference/resource-configs/snowflake-configs.md Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> --- website/docs/reference/resource-configs/snowflake-configs.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md index 73b34d1fc3b..d576b195b65 100644 --- a/website/docs/reference/resource-configs/snowflake-configs.md +++ b/website/docs/reference/resource-configs/snowflake-configs.md @@ -478,7 +478,6 @@ Snowflake supports the following incremental strategies: Snowflake's `merge` statement fails with a "nondeterministic merge" error if the `unique_key` specified in your model config is not actually unique. If you encounter this error, you can instruct dbt to use a two-step incremental approach by setting the `incremental_strategy` config for your model to `delete+insert`. -Snowflake also supports the [`microbatch`](/docs/build/incremental-microbatch) incremental strategy. 
## Configuring table clustering From 6a1e3267a9be56ad9152451da2f4b239d77dbbc7 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 09:42:55 +0000 Subject: [PATCH 46/54] Update website/docs/reference/dbt-jinja-functions/model.md Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> --- website/docs/reference/dbt-jinja-functions/model.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/dbt-jinja-functions/model.md b/website/docs/reference/dbt-jinja-functions/model.md index 2b8633af226..3716ab7dd57 100644 --- a/website/docs/reference/dbt-jinja-functions/model.md +++ b/website/docs/reference/dbt-jinja-functions/model.md @@ -22,7 +22,7 @@ To view the contents of `model` for a given model: -If you're using the Command line interface (CLI), use [log()](/reference/dbt-jinja-functions/log) to print the full contents: +If you're using the command line interface (CLI), use [log()](/reference/dbt-jinja-functions/log) to print the full contents: ```jinja {{ log(model, info=True) }} From 831f26a37fb23c63f17c309abbe81afaeac7199c Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 09:43:01 +0000 Subject: [PATCH 47/54] Update website/docs/reference/dbt-jinja-functions/model.md Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> --- website/docs/reference/dbt-jinja-functions/model.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/dbt-jinja-functions/model.md b/website/docs/reference/dbt-jinja-functions/model.md index 3716ab7dd57..ea56026e935 100644 --- a/website/docs/reference/dbt-jinja-functions/model.md +++ b/website/docs/reference/dbt-jinja-functions/model.md @@ -44,7 +44,7 @@ If you're using the command line interface (CLI), use [log()](/reference/dbt-jin ## Batch properties for microbatch models -From dbt Core v1.9, the model object 
includes a `batch` property (`model.batch`), which provides details about the current batch when executing an [incremental microbatch](/docs/build/incremental-microbatch) model. This property is only populated during the batch execution of a microbatch model. +Starting in dbt Core v1.9, the model object includes a `batch` property (`model.batch`), which provides details about the current batch when executing an [incremental microbatch](/docs/build/incremental-microbatch) model. This property is only populated during the batch execution of a microbatch model. The following table describes the properties of the `batch` object. Note that dbt appends the property to the `model` and `batch` objects. From c2ccfbe052c2573f6d8673812185c7ea037b2baa Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 09:43:13 +0000 Subject: [PATCH 48/54] Update website/docs/reference/dbt-jinja-functions/model.md Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> --- website/docs/reference/dbt-jinja-functions/model.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/dbt-jinja-functions/model.md b/website/docs/reference/dbt-jinja-functions/model.md index ea56026e935..0ad475a5cf5 100644 --- a/website/docs/reference/dbt-jinja-functions/model.md +++ b/website/docs/reference/dbt-jinja-functions/model.md @@ -58,7 +58,7 @@ The following table describes the properties of the `batch` object. Note that db `model.batch` is only available during the execution of a microbatch model batch. Outside of the microbatch execution, `model.batch` is `None`, and its sub-properties aren't accessible. -#### Example: Safeguard access to batch properties +#### Example of safeguarding access to batch properties We recommend to always check if `model.batch` is populated before accessing its properties. 
To do this, use an `if` statement for safe access to `batch` properties: From 09035ec43c93a82dcb53b37c84a9cbbb5ad875b1 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 09:43:21 +0000 Subject: [PATCH 49/54] Update website/docs/reference/dbt-jinja-functions/model.md Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> --- website/docs/reference/dbt-jinja-functions/model.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/dbt-jinja-functions/model.md b/website/docs/reference/dbt-jinja-functions/model.md index 0ad475a5cf5..b0995ff958c 100644 --- a/website/docs/reference/dbt-jinja-functions/model.md +++ b/website/docs/reference/dbt-jinja-functions/model.md @@ -72,7 +72,7 @@ We recommend to always check if `model.batch` is populated before accessing its In this example, the `if model.batch` statement makes sure that the code only runs during a batch execution. `log()` is used to print the `batch` properties for debugging. 
-#### Example: Log batch details +#### Example of log batch details This is a practical example of how you might use `model.batch` in a microbatch model to log batch details for the `batch.id`: From f042cda4eefadbd358fe93e81cb1d17e4050b4e9 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 16:14:36 +0000 Subject: [PATCH 50/54] Update exposures.md add link --- website/docs/docs/build/exposures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/exposures.md b/website/docs/docs/build/exposures.md index 1a85d5fb415..16dfd0e5f73 100644 --- a/website/docs/docs/build/exposures.md +++ b/website/docs/docs/build/exposures.md @@ -69,7 +69,7 @@ dbt test -s +exposure:weekly_jaffle_report ``` -When we generate the dbt Explorer site, you'll see the exposure appear: +When we generate the [dbt Explorer site](/docs/collaborate/explore-projects), you'll see the exposure appear: From e8bf470c9ce3b1a062dbea9146608e00d5abf638 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 5 Dec 2024 16:27:23 +0000 Subject: [PATCH 51/54] Update hard-deletes.md change link and to latest release --- website/docs/reference/resource-configs/hard-deletes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/hard-deletes.md b/website/docs/reference/resource-configs/hard-deletes.md index ef6d70f3e6f..50c8046f4e1 100644 --- a/website/docs/reference/resource-configs/hard-deletes.md +++ b/website/docs/reference/resource-configs/hard-deletes.md @@ -8,7 +8,7 @@ id: "hard-deletes" sidebar_label: "hard_deletes" --- -Available from dbt v1.9 or with [Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) dbt Cloud. +Available from dbt v1.9 or with [dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks). 
From 9d4107ae9ef5ce89037b92cf10e6d9a6d29e2b97 Mon Sep 17 00:00:00 2001
From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com>
Date: Thu, 5 Dec 2024 11:50:50 -0500
Subject: [PATCH 52/54] Fixing redirect syntax (#6596)

## What are you changing in this pull request and why?

Fixing redirect syntax

## Checklist
- [ ] I have reviewed the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [ ] The topic I'm writing about is for specific dbt version(s) and I have versioned it according to the [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and/or [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content) guidelines.
- [ ] I have added checklist item(s) to this list for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."
- [ ] The content in this PR requires a dbt release note, so I added one to the [release notes page](https://docs.getdbt.com/docs/dbt-versions/dbt-cloud-release-notes).
---
 website/vercel.json | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/website/vercel.json b/website/vercel.json
index 57d32cde718..fa90697a517 100644
--- a/website/vercel.json
+++ b/website/vercel.json
@@ -103,8 +103,8 @@
       "permanent": true
     },
     {
-      "source": "docs/dbt-versions/versionless-cloud",
-      "destination": "docs/dbt-versions/cloud-release-tracks",
+      "source": "/docs/dbt-versions/versionless-cloud",
+      "destination": "/docs/dbt-versions/cloud-release-tracks",
       "permanent": true
     },
     {

From c6babc0cad6b9b08f073c66978f5f5aaccb35c8a Mon Sep 17 00:00:00 2001
From: mirnawong1
Date: Thu, 5 Dec 2024 17:56:11 +0000
Subject: [PATCH 53/54] update link"

---
 website/snippets/access_url.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/snippets/access_url.md b/website/snippets/access_url.md
index 4fb7aa776ae..90a9238618a 100644
--- a/website/snippets/access_url.md
+++ b/website/snippets/access_url.md
@@ -1 +1 @@
-The following steps use `YOUR_AUTH0_URI` and `YOUR_AUTH0_ENTITYID`, which need to be replaced with the [appropriate Auth0 SSO URI and Auth0 Entity ID](/docs/cloud/manage-access/set-up-sso-saml-2.0#auth0-multi-tenant-uris) for your region.
+The following steps use `YOUR_AUTH0_URI` and `YOUR_AUTH0_ENTITYID`, which need to be replaced with the [appropriate Auth0 SSO URI and Auth0 Entity ID](#auth0-uris) for your region.
From 82e807b26463f1caffbc07ef80ef9ae6de753e9d Mon Sep 17 00:00:00 2001
From: mirnawong1
Date: Thu, 5 Dec 2024 18:00:59 +0000
Subject: [PATCH 54/54] remove space

---
 website/snippets/access_url.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/website/snippets/access_url.md b/website/snippets/access_url.md
index 89924d00513..90a9238618a 100644
--- a/website/snippets/access_url.md
+++ b/website/snippets/access_url.md
@@ -1,3 +1 @@
 The following steps use `YOUR_AUTH0_URI` and `YOUR_AUTH0_ENTITYID`, which need to be replaced with the [appropriate Auth0 SSO URI and Auth0 Entity ID](#auth0-uris) for your region.
-
-