Add delete+insert materialization #62
Conversation
Add a reference towards https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models and state the options for the incremental strategy.
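For the README, a hedged sketch of what such an example could look like (the model and column names are illustrative, not taken from this PR):

```sql
-- models/orders_incremental.sql (hypothetical model, for illustration only)
{{
    config(
        materialized='incremental',
        incremental_strategy='delete+insert',  -- other option: 'append' (the default)
        unique_key='order_id'
    )
}}

select *
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- only pick up rows newer than what is already in the target table
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```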
{% do return(strategy) %}
{% endmacro %}

{% macro dbt_trino_get_incremental_sql(strategy, tmp_relation, target_relation, unique_key, dest_columns) %}
Is there any specific reason why the new methods are prefixed with `dbt_trino_`?
I based myself on the Snowflake implementation. I think their reasoning is that macros are global, and prefixing them avoids name clashes, similar to Java namespaces.
{% do return(get_delete_insert_merge_sql(target_relation, tmp_relation, unique_key, dest_columns)) %}
{% elif strategy == 'merge' %}
{#-- Not yet supported by trino --#}
{% do return(get_merge_sql(target_relation, tmp_relation, unique_key, dest_columns)) %}
Is this a valid strategy for Trino at the moment?
Note that this strategy is currently disabled:
{% set invalid_strategy_msg -%}
Invalid incremental strategy provided: {{ strategy }}
Expected one of: 'append', 'delete+insert'
{%- endset %}
{% if strategy not in ['append', 'delete+insert'] %}
{% do exceptions.raise_compiler_error(invalid_strategy_msg) %}
{% endif %}
What is needed to push this MR forward?
I actually had a stab at the integration test for this feature with PostgreSQL, but the query is not supported. Next I will try with Delta Lake + MinIO. I will probably create a separate PR to add Delta + MinIO first.
Today, DELETE on RDBMS connectors is only supported when the predicate can be entirely pushed down to the connector.
Indeed, that was also my conclusion. I did a basic test with the Delta connector, and DELETE with subqueries is supported there.
Probably we would need to switch from PostgreSQL to MinIO for testing.
Yes indeed, I will post a PR with Delta + MinIO + HMS support later today.
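To make the limitation discussed above concrete, the delete step generated by this strategy has roughly the following shape (a sketch; catalog, schema, table, and column names are illustrative):

```sql
-- RDBMS connectors such as PostgreSQL reject this statement because the
-- "in (select ...)" predicate cannot be pushed down entirely to the connector,
-- while connectors like Delta Lake can execute it.
delete from "my_catalog"."my_schema"."orders"
where order_id in (
    select order_id
    from "my_catalog"."my_schema"."orders__dbt_tmp"
);
```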
@@ -0,0 +1,71 @@
{% macro dbt_trino_validate_get_incremental_strategy(config) %}
The `incremental` folder is not needed.
I think it is a little cleaner to split it out, to keep the logic in `incremental` light.
"seed": self.column_type_override(), | ||
} | ||
} | ||
} |
nit: add a newline at EOF.
{% if unique_key %}
{% if unique_key is sequence and unique_key is not string %}
delete from {{target }}
{{ target }}
{% macro trino__get_delete_insert_merge_sql(target, source, unique_key, dest_columns) -%}
{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

{% if unique_key %}
What if `unique_key` is false? Probably some error handling would be required.
`unique_key` is not required. In that case it actually behaves like `append`. This behavior is tested in the following tests:
- test__empty_str_unique_key
- test__empty_unique_key_list

It just behaves exactly as in OOTB dbt (Snowflake).
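A minimal illustration of the two behaviours (hypothetical model names, not part of this PR):

```sql
-- models/with_key.sql (hypothetical): rows matching the unique_key are deleted
-- from the target before the new batch is inserted.
{{ config(materialized='incremental', incremental_strategy='delete+insert', unique_key='id') }}
select * from {{ ref('upstream_model') }}

-- models/without_key.sql (hypothetical): no unique_key, so the delete step is
-- skipped and the run effectively behaves like 'append'.
{{ config(materialized='incremental', incremental_strategy='delete+insert') }}
select * from {{ ref('upstream_model') }}
```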
delete from {{ target }}
where (
{{ unique_key }}) in (
select ({{ unique_key }})
Parentheses not needed
{#-- Not yet supported by trino (blocked in dbt_trino_validate_get_incremental_strategy macro) --#}
{% do return(get_merge_sql(target_relation, tmp_relation, unique_key, dest_columns)) %}
{% else %}
{% do exceptions.raise_compiler_error('invalid strategy: ' ~ strategy) %}
Looks redundant if error handling for strategies is implemented in `dbt_trino_validate_get_incremental_strategy`.
LGTM % some README comments. Please update the CHANGELOG and change the commit message to reflect that the `delete+insert` strategy is added.
Let's wait for @findinpath's review before merging.
README.md (outdated):

##### `delete+insert`

Through the `delete+insert` incremental strategy, you can instruct dbt to use a two-step incremental approach. Note this strategy doesn't delete any records, it simply upserts the new and updated records.
Can this be somehow rephrased? This strategy deletes records and inserts new ones under the hood, so from the end user's perspective it's an upsert, but from the engine's perspective it's not.
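For the README, it may help to show what actually runs under the hood; roughly this (a sketch with illustrative table and column names):

```sql
-- Step 1: delete target rows whose unique_key values appear in the new batch
delete from "orders"
where id in (select id from "orders__dbt_tmp");

-- Step 2: insert the complete new batch from the temporary table
insert into "orders" ("id", "amount", "updated_at")
(
    select "id", "amount", "updated_at"
    from "orders__dbt_tmp"
);
```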
{% do return(get_append_sql(target_relation, tmp_relation, dest_columns)) %}
{% elif strategy == 'delete+insert' %}
{% do return(get_delete_insert_merge_sql(target_relation, tmp_relation, unique_key, dest_columns)) %}
{% elif strategy == 'merge' %}
I don't quite follow why we have this branch now (as written, `MERGE` is at the moment not yet supported). Why not create a follow-up PR when a Trino version with `MERGE` support lands?
insert into {{ target }} ({{ dest_cols_csv }})
(
select {{ dest_cols_csv }}
from {{ source }}
I'm wondering whether you missed `.include(database=true, schema=true)` like it is done for `append`.
This is not required; it is actually already generated using database and schema. In the line after this one we can see:

drop table if exists {{ tmp_relation }};

If it were necessary, we would have to include it there too, otherwise the table would not be found. Note that Snowflake also doesn't do that.
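To illustrate the point, the compiled SQL already carries the fully qualified names when the relation objects are rendered (catalog, schema, and table names below are hypothetical):

```sql
-- Compiled output of the insert + cleanup steps; database and schema are part of
-- the rendered relation by default, so no explicit .include(...) call is needed.
insert into "my_catalog"."my_schema"."orders" ("id", "amount")
(
    select "id", "amount"
    from "my_catalog"."my_schema"."orders__dbt_tmp"
);
drop table if exists "my_catalog"."my_schema"."orders__dbt_tmp";
```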
Please rebase your changes on
{% endif %}
{% endmacro %}

{% macro get_append_sql(target_relation, tmp_relation, dest_columns) %}
We've discussed this already. I see `get_append_sql` and `trino__get_delete_insert_merge_sql`, which are name-wise not consistent. I'd be in favor of prefixing with `trino__` until we get consistent macro names across the whole codebase.
I suggest that the `trino__` prefix should only be used for adapter macros that are dispatched in core dbt. The `get_append_sql` macro is not part of core dbt, while `get_delete_insert_merge_sql` is. See https://github.com/dbt-labs/dbt-core/blob/1071a4681df91633301fdf23e34de819b66fbeb7/core/dbt/include/global_project/macros/materializations/models/incremental/merge.sql#L51-L53
How do we know whether a certain macro is not going to be part of core dbt?
In any case keep this in mind for the eventual follow-up task of having consistent naming for the macros.
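For reference, this is roughly how the dispatch mechanism works (a simplified sketch based on the linked core dbt source, not the literal code from this PR):

```sql
{# In core dbt (simplified): the default macro dispatches to an adapter-specific one. #}
{% macro get_delete_insert_merge_sql(target, source, unique_key, dest_columns) -%}
  {{ adapter.dispatch('get_delete_insert_merge_sql', 'dbt')(target, source, unique_key, dest_columns) }}
{%- endmacro %}

{# In dbt-trino: this override is picked up because of the trino__ prefix. #}
{% macro trino__get_delete_insert_merge_sql(target, source, unique_key, dest_columns) -%}
  {# ... Trino-specific delete + insert statements ... #}
{%- endmacro %}

{# get_append_sql has no core dbt counterpart, so there is nothing to dispatch to. #}
```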
delete from {{ target }}
where
{% for key in unique_key %}
{{ target }}.{{ key }} IN (SELECT {{ key }} FROM {{ source }})
Do we need to eventually take into consideration quoting the column names here?
If yes, could you please add a corresponding test case?
I think the `unique_key` is supplied by the user here. It's up to the user to supply a quoted or unquoted identifier.
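For example (hypothetical column names), a user who needs a case-sensitive column can pass it pre-quoted, while a regular column is left unquoted:

```sql
-- Unquoted identifier; case folding is left to the engine
{{ config(materialized='incremental', incremental_strategy='delete+insert', unique_key='order_id') }}

-- Pre-quoted identifier supplied by the user for a case-sensitive column
{{ config(materialized='incremental', incremental_strategy='delete+insert', unique_key='"OrderId"') }}
```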
delete from {{ target }}
where
{% for key in unique_key %}
{{ target }}.{{ key }} IN (SELECT {{ key }} FROM {{ source }})
Most of the statements here use lowercase for the reserved SQL keywords. Please change `IN`, `SELECT`, and `FROM` accordingly.
Overview
Checklist
- README.md updated and added information about my change
- CHANGELOG.md updated and added information about my change