Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--full-refresh brakes incremental inserts_only=True replicated model #117

Closed
tema-popov opened this issue Nov 29, 2022 · 3 comments · Fixed by #120
Closed

--full-refresh brakes incremental inserts_only=True replicated model #117

tema-popov opened this issue Nov 29, 2022 · 3 comments · Fixed by #120

Comments

@tema-popov
Copy link

  1. Create an incremental materialized table that uses ReplicatedMergeThree engine. I see these issue on scheme with two replicas.
{{
  config(
    materialized = "incremental",
    inserts_only = true,
    engine = "ReplicatedReplacingMergeTree('/clickhouse/tables/{layer}-{shard}/{database}/{table}/{uuid}', '{replica}')",
  )
}}
  1. dbt run with empty tables works correctly
  2. incremental dbt run also works as intended
  3. dbt run --full-refresh creates inconsistent tables (different replicas respond with different results)

If I check individual table schemas, I see that engines differ:

replica 1: ReplicatedReplacingMergeTree('/clickhouse/tables/{layer}-{shard}/db_name/table_name__dbt_tmp/{uuid}', '{replica}')
replica 2: `ReplicatedReplacingMergeTree('/clickhouse/tables/{layer}-{shard}/db_name/table_name/{uuid}', '{replica}')

I would expect that --full-refresh would drop tables on the cluster and run the query as there are no tables created yet.

dbt-clickhouse           1.3.1
dbt-core                 1.3.1

Looks like it is something similar to #95 as I can't use standard materialization mode with Replication due to tmp tables manipulations affect replication settings.

@genzgd
Copy link
Contributor

genzgd commented Nov 29, 2022

Again replicated tables are not officially supported and in particular, aren't included in the test suite. Have you tried using the experimental Replicated database engine in this situation?

Regardless, the issue may be with the fact that the EXCHANGE TABLES macro does not include an ON CLUSTER clause. Is it possible for you to modify that macro and test? https://github.com/ClickHouse/dbt-clickhouse/blob/main/dbt/include/clickhouse/macros/adapters.sql#L117-L122

Your proposed solution of dropping the tables first on cluster seems like a somewhat dangerous change in case of an failure while executing the query. In that case the original tables will be lost and not recoverable if there is some issue with the full refresh.

@tema-popov
Copy link
Author

Regardless, the issue may be with the fact that the EXCHANGE TABLES macro does not include an ON CLUSTER clause. Is it possible for you to modify that macro and test? https://github.com/ClickHouse/dbt-clickhouse/blob/main/dbt/include/clickhouse/macros/adapters.sql#L117-L122

Yeah, looks like a good hint, I can test it with fix a little bit later, thanks!

Again replicated tables are not officially supported and in particular, aren't included in the test suite. Have you tried using the experimental Replicated database engine in this situation?

It is okay, I'm ready to be a early adopter :)

@tema-popov
Copy link
Author

great, thanks @saurabhbikram for taking care of it, unfortunately, I didn't find time to fix and test in myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants