[Umbrella] CDC DDL Sync Design(Zeta) #7930

Open
5 of 7 tasks
Tracked by #2272
hailin0 opened this issue Oct 28, 2024 · 1 comment

Comments

@hailin0
Member

hailin0 commented Oct 28, 2024

Code of Conduct

Search before asking

  • I had searched in the issues and found no similar issues.

Describe the proposal

Background

Currently, we support change data capture (CDC, #3175), but there is no further design for schema evolution.

  • MySQL
  • Oracle
  • Postgres
  • SQL Server
  • ...

Since this is CDC data synchronization, I think we need to support schema evolution (DDL) as a feature, and I would like to hear how you all think it can be implemented in SeaTunnel.

Motivation

  • Support reading DDL SQL from the database (MySQL/Oracle/...)
  • Support syncing DDL to any sink (parse DDL into standardized events)
  • Support pausing and resuming DDL at any time
  • Support automatic handling of the switch between old data and new data

Overall Design

Basic flow

image

  • Relies on checkpoints to push DDL through the pipeline: reader -> writer -> committer

Phase1 - Before Change

image

  • Data flow is mixed with structure flow
  • DDL is tied to database-specific syntax

Phase2 - Starting Change

image

  • Parse DDL into schema events
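
To make "parse DDL into schema events" concrete, here is a minimal sketch. It assumes a hypothetical `ColumnChangeEvent` type and a toy regex parser named `ToyDdlParser`; neither is SeaTunnel's actual API, and a real implementation would rely on a proper SQL parser for each dialect. The point is only that the sink receives a database-neutral event instead of raw, dialect-specific SQL.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical database-neutral event; not SeaTunnel's actual event class.
final class ColumnChangeEvent {
    enum Kind { ADD_COLUMN, DROP_COLUMN }

    final Kind kind;
    final String table;
    final String column;
    final String dataType; // null for DROP_COLUMN

    ColumnChangeEvent(Kind kind, String table, String column, String dataType) {
        this.kind = kind;
        this.table = table;
        this.column = column;
        this.dataType = dataType;
    }
}

// Toy parser: handles two simple ALTER TABLE forms and rejects everything else.
final class ToyDdlParser {
    private static final Pattern ADD = Pattern.compile(
            "^\\s*ALTER\\s+TABLE\\s+(\\w+)\\s+ADD\\s+(?:COLUMN\\s+)?(\\w+)\\s+"
                    + "(\\w+(?:\\(\\d+(?:\\s*,\\s*\\d+)?\\))?)\\s*;?\\s*$",
            Pattern.CASE_INSENSITIVE);
    private static final Pattern DROP = Pattern.compile(
            "^\\s*ALTER\\s+TABLE\\s+(\\w+)\\s+DROP\\s+(?:COLUMN\\s+)?(\\w+)\\s*;?\\s*$",
            Pattern.CASE_INSENSITIVE);

    static ColumnChangeEvent parse(String ddl) {
        Matcher add = ADD.matcher(ddl);
        if (add.matches()) {
            return new ColumnChangeEvent(
                    ColumnChangeEvent.Kind.ADD_COLUMN,
                    add.group(1),
                    add.group(2),
                    add.group(3).toUpperCase(Locale.ROOT));
        }
        Matcher drop = DROP.matcher(ddl);
        if (drop.matches()) {
            return new ColumnChangeEvent(
                    ColumnChangeEvent.Kind.DROP_COLUMN, drop.group(1), drop.group(2), null);
        }
        // Anything outside the two supported forms is rejected rather than guessed at.
        throw new IllegalArgumentException("Unsupported DDL in this sketch: " + ddl);
    }
}
```

For example, `ToyDdlParser.parse("ALTER TABLE users ADD COLUMN age INT")` yields an `ADD_COLUMN` event for table `users`, which each sink could then translate into its own dialect.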

Phase3 - Splitting data flow and structure flow

image

  • Insert checkpoint signals before and after the DDL
    • schema-change-before signals
    • schema-change-after signals
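
The signal insertion described in this phase can be sketched as follows. This is an illustration under assumptions: `StreamElement` and `SignalInserter` are hypothetical names, and the stream is modelled as an in-memory list rather than a real reader pipeline. Every schema event is bracketed by a before signal and an after signal, while data records pass through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stream element; the type names mirror the signals in the proposal.
final class StreamElement {
    enum Type { DATA, SCHEMA_CHANGE_BEFORE, SCHEMA_EVENT, SCHEMA_CHANGE_AFTER }

    final Type type;
    final String payload;

    StreamElement(Type type, String payload) {
        this.type = type;
        this.payload = payload;
    }
}

final class SignalInserter {
    // Returns a new list in which each SCHEMA_EVENT is surrounded by
    // SCHEMA_CHANGE_BEFORE and SCHEMA_CHANGE_AFTER; other elements keep their order.
    static List<StreamElement> wrapSchemaEvents(List<StreamElement> input) {
        List<StreamElement> out = new ArrayList<>();
        for (StreamElement e : input) {
            if (e.type == StreamElement.Type.SCHEMA_EVENT) {
                out.add(new StreamElement(StreamElement.Type.SCHEMA_CHANGE_BEFORE, e.payload));
                out.add(e);
                out.add(new StreamElement(StreamElement.Type.SCHEMA_CHANGE_AFTER, e.payload));
            } else {
                out.add(e);
            }
        }
        return out;
    }
}
```

With the signals in place, downstream operators can tell exactly where old-schema data ends and new-schema data begins without inspecting the DDL itself.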

Phase4 - Handling schema-change-before signal

image

  • Stop the source from reading data
  • Wait for all old data to be flushed to sink storage
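
A minimal single-threaded sketch of the two steps in this phase, assuming hypothetical `PausableSource` and `BufferingSinkWriter` classes (the real reader and writer are asynchronous and driven by checkpoints, which this sketch does not model). The handler pauses the source first, so no new rows arrive, and only then flushes what is still buffered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer that buffers rows until flush() moves them to storage.
final class BufferingSinkWriter {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> storage = new ArrayList<>(); // stands in for sink storage

    void write(String row) { buffer.add(row); }

    void flush() {
        storage.addAll(buffer);
        buffer.clear();
    }

    int pending() { return buffer.size(); }

    List<String> stored() { return new ArrayList<>(storage); }
}

// Hypothetical source that refuses to emit while paused.
final class PausableSource {
    private boolean paused;

    void pause() { paused = true; }

    void resume() { paused = false; }

    boolean isPaused() { return paused; }

    // Returns false (and writes nothing) when the source is paused.
    boolean emit(String row, BufferingSinkWriter writer) {
        if (paused) {
            return false;
        }
        writer.write(row);
        return true;
    }
}

final class SchemaChangeBeforeHandler {
    // Order matters: pause first so the flush covers all old-schema rows.
    static void handle(PausableSource source, BufferingSinkWriter writer) {
        source.pause();
        writer.flush();
    }
}
```

After `handle` returns, `writer.pending()` is zero and further `emit` calls are rejected until the source is resumed, so the sink holds only old-schema data when the DDL is applied.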

Phase5 - Apply DDL to source & sink

image

  • Stop the checkpoint manager

Phase6 - Handling schema-change-after signal

image

Phase7 - Completed

image

  • Restart the checkpoint manager
  • Resume reading data from the source
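
Phases 4 through 7 can be read as a small state machine. The sketch below is one possible reading, not the actual Zeta implementation: `SchemaChangeCoordinator`, its state names, and its boolean flags are all hypothetical, and the flush of old data is assumed to have completed inside the before-signal step. It shows the ordering constraint: DDL may only be applied after the before signal, and normal processing only resumes on the after signal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical coordinator; names and states are illustrative only.
final class SchemaChangeCoordinator {
    enum State { RUNNING, DRAINED, DDL_APPLIED }

    private State state = State.RUNNING;
    private boolean sourceReading = true;
    private boolean checkpointManagerRunning = true;
    private final List<String> appliedDdl = new ArrayList<>();

    // Phase 4: stop the source; old data is assumed flushed when this returns.
    void onSchemaChangeBefore() {
        require(State.RUNNING);
        sourceReading = false;
        state = State.DRAINED;
    }

    // Phase 5: stop the checkpoint manager, then apply the DDL.
    void applyDdl(String ddl) {
        require(State.DRAINED);
        checkpointManagerRunning = false;
        appliedDdl.add(ddl);
        state = State.DDL_APPLIED;
    }

    // Phases 6 and 7: on the after signal, restart checkpoints and resume reading.
    void onSchemaChangeAfter() {
        require(State.DDL_APPLIED);
        checkpointManagerRunning = true;
        sourceReading = true;
        state = State.RUNNING;
    }

    State state() { return state; }

    boolean isSourceReading() { return sourceReading; }

    boolean isCheckpointManagerRunning() { return checkpointManagerRunning; }

    List<String> appliedDdl() { return new ArrayList<>(appliedDdl); }

    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException("Expected " + expected + " but was " + state);
        }
    }
}
```

Calling `applyDdl` while the coordinator is still `RUNNING` throws `IllegalStateException`, which is the property the phased design is meant to guarantee: no DDL is executed while old-schema data may still be in flight.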

Task list

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@hailin0
Member Author

hailin0 commented Oct 28, 2024

link #3175

@hailin0 hailin0 changed the title [Umbrella] CDC DDL Design [Umbrella] CDC DDL Sync Design Oct 28, 2024
@hailin0 hailin0 changed the title [Umbrella] CDC DDL Sync Design [Umbrella] CDC DDL Sync Design(Zeta) Oct 28, 2024