Delta Lake connector writes incorrect CDC entries when deletion vector is enabled #23620

Closed
ebyhr opened this issue Oct 1, 2024 · 1 comment · Fixed by #23827
Labels: bug, delta-lake

ebyhr commented Oct 1, 2024

Spark returns the expected results:

CREATE TABLE default.test_delta
(col1 STRING, updated_column INT)
USING DELTA
LOCATION 's3://test-bucket/test_delta'
TBLPROPERTIES ('delta.enableChangeDataFeed'=true, 'delta.enableDeletionVectors'=true);

INSERT INTO default.test_delta VALUES ('testValue1', 1), ('testValue2', 2), ('testValue3', 3);
UPDATE default.test_delta SET updated_column = 30 WHERE col1 = 'testValue3';

SELECT col1, updated_column, _change_type, _commit_version FROM table_changes('default.test_delta', 0);
testValue3	3	update_preimage	2
testValue3	30	update_postimage	2
testValue1	1	insert	1
testValue2	2	insert	1
testValue3	3	insert	1

The update_preimage entry is missing when the UPDATE runs on Trino:

CREATE TABLE default.test_trino
(col1 STRING, updated_column INT)
USING DELTA
LOCATION 's3://test-bucket/test_trino'
TBLPROPERTIES ('delta.enableChangeDataFeed'=true, 'delta.enableDeletionVectors'=true);

INSERT INTO default.test_trino VALUES ('testValue1', 1), ('testValue2', 2), ('testValue3', 3);
UPDATE default.test_trino SET updated_column = 30 WHERE col1 = 'testValue3';

SELECT col1, updated_column, _change_type, _commit_version FROM table_changes('default.test_trino', 0);
testValue3	30	update_postimage	2
testValue1	1	insert	1
testValue2	2	insert	1
testValue3	3	insert	1
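
The difference between the two outputs can be checked mechanically: every commit version should carry as many update_preimage rows as update_postimage rows. The following is only a minimal sketch of that check. It assumes the rows have been copied by hand from the query results above, and `unpaired_update_versions` is a hypothetical helper name, not part of Trino, Spark, or Delta Lake.

```python
from collections import Counter


def unpaired_update_versions(rows):
    """Return commit versions whose update_preimage and update_postimage counts differ.

    rows: iterable of (col1, updated_column, change_type, commit_version) tuples,
    in the same column order as the table_changes queries above.
    """
    pre = Counter()
    post = Counter()
    for _col1, _value, change_type, version in rows:
        if change_type == "update_preimage":
            pre[version] += 1
        elif change_type == "update_postimage":
            post[version] += 1
    # Counter returns 0 for a missing key, so a postimage without a preimage is reported.
    return sorted(v for v in pre.keys() | post.keys() if pre[v] != post[v])


# Rows copied from the Spark output above (assumed to be the complete result).
spark_rows = [
    ("testValue3", 3, "update_preimage", 2),
    ("testValue3", 30, "update_postimage", 2),
    ("testValue1", 1, "insert", 1),
    ("testValue2", 2, "insert", 1),
    ("testValue3", 3, "insert", 1),
]

# Rows copied from the output after updating on Trino.
trino_rows = [
    ("testValue3", 30, "update_postimage", 2),
    ("testValue1", 1, "insert", 1),
    ("testValue2", 2, "insert", 1),
    ("testValue3", 3, "insert", 1),
]

print(unpaired_update_versions(spark_rows))  # []
print(unpaired_update_versions(trino_rows))  # [2]
```

For the Trino-updated table the check flags commit version 2, which is the version where the update_preimage row is absent.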

test_delta.zip
test_trino.zip

Until the issue is fixed, we should disallow write operations when both CDC (change data feed) and DV (deletion vectors) are enabled.

ebyhr added the bug and delta-lake labels on Oct 1, 2024

ebyhr commented Oct 3, 2024

@homar Can you take a look at this issue when you have time?
