Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support MERGE #10174

Closed
BohuTANG opened this issue Feb 23, 2023 · 5 comments
Closed

feat: support MERGE #10174

BohuTANG opened this issue Feb 23, 2023 · 5 comments
Assignees
Labels
C-feature Category: feature

Comments

@BohuTANG
Copy link
Member

BohuTANG commented Feb 23, 2023

Summary

With MERGE, we can merge a source table into a target table by conditionally insert, update, or delete rows in batch, this is the best way to change data capture (CDC) big data.

MERGE INTO <target_table> USING <source> ON <join_expr> { matchedClause | notMatchedClause } [ ... ]

matchedClause ::=
  WHEN MATCHED [ AND <case_predicate> ] THEN { UPDATE SET <col_name> = <expr> [ , <col_name2> = <expr2> ... ] | DELETE } [ ... ]

notMatchedClause ::=
   WHEN NOT MATCHED [ AND <case_predicate> ] THEN INSERT [ ( <col_name> [ , ... ] ) ] VALUES ( <expr> [ , ... ] )

Example

Target Table

CREATE TABLE account
 (
     id           INT,
     name         VARCHAR,
     last_updated DATETIME
 );

INSERT INTO account VALUES     (1, 'alice', '2023-01-01');
INSERT INTO account VALUES     (2, 'bob',  '2023-01-01');
INSERT INTO account VALUES     (3, 'charlie',  '2023-01-02'); 

image

Stage Table

CREATE TABLE account_stage
  (
     id           INT,
     NAME         VARCHAR,
     last_updated DATETIME,
     operation    VARCHAR
  );

INSERT INTO account_stage VALUES     (1, 'alice', '2023-02-22', 'U');
INSERT INTO account_stage VALUES     (1, 'alice', '2023-02-23', 'D');
INSERT INTO account_stage VALUES     (2, 'bob', '2023-02-21', 'U');
INSERT INTO account_stage VALUES     (2, 'bob', '2023-02-23', 'U');
INSERT INTO account_stage VALUES     (4, 'dell',  '2022-02-23',  'I'); 

image

Merge Stage to Target

MERGE INTO account
using (SELECT id, NAME, last_updated,operation FROM   account_stage) AS cdc
ON account.id = cdc.id

WHEN matched AND cdc.operation='U' THEN
  UPDATE SET NAME = cdc.NAME, last_updated = cdc.last_updated
WHEN matched AND cdc.operation='D' THEN
  DELETE
WHEN NOT matched AND cdc.operation='I' THEN
  INSERT VALUES(cdc.id, cdc.NAME, cdc.last_updated) 

image

References:
[1] PostgreSQL: https://www.postgresql.org/docs/current/sql-merge.html
[2] Snowflake: https://docs.snowflake.com/en/sql-reference/sql/merge

@BohuTANG BohuTANG added the C-feature Category: feature label Feb 23, 2023
@BohuTANG
Copy link
Member Author

@leiysky @andylokandy : SQL parser and Planner need your help and comments
@dantengsky : storage needs your help

Thanks.

@BohuTANG BohuTANG mentioned this issue Feb 23, 2023
9 tasks
@flaneur2020
Copy link
Member

there's a JOIN in the MERGE statement, does it need an integration with CBO?

(in most of the time the right hand side is not very big, however MERGE also support the merging of two full tables totally, it can be viewed as a materializing the joined result back to the original table in a batch manner.

@BohuTANG
Copy link
Member Author

BohuTANG commented Feb 23, 2023

there's a JOIN in the MERGE statement, does it need an integration with CBO?

(in most of the time the right hand side is not very big, however MERGE also support the merging of two full tables totally, it can be viewed as a materializing the joined result back to the original table in a batch manner.

Merge is a special logic plan(DML), it's not a join query as normal, the USING .. ON is a constraint not a really JOIN, no CBO here.

@JackTan25
Copy link
Contributor

/assign me

@BohuTANG
Copy link
Member Author

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature Category: feature
Projects
None yet
Development

No branches or pull requests

3 participants