DELETE Statement Deleting Another Record #11212

Closed
Amar1404 opened this issue May 14, 2024 · 6 comments
Labels
  • feature-enquiry: issue contains feature enquiries/requests or great improvement ideas
  • on-call-triaged
  • writer-core: Issues relating to core transactions/write actions

Comments

@Amar1404
Contributor

Describe the problem you faced
I have duplicate keys in a Hudi table because of an INSERT statement. When I tried to delete one of the rows by filtering on the key plus an additional column, both rows with that key were deleted.

To Reproduce

Steps to reproduce the behavior:

  1. Create a non-partitioned table and insert two records with the same key.
  2. Try to delete only one of the two rows by filtering on the key and _hoodie_commit_seqno.
  3. Check the table: both records have been deleted.

Expected behavior

The DELETE command should delete only the one row that matches the filter.

Environment Description

  • Hudi version : 0.12.3

  • Spark version : 3.3

  • Hive version : 3

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : s3

  • Running on Docker? (yes/no) : no

@ad1happy2go
Collaborator

@Amar1404 Can you please try 0.14.1? This was fixed there. I also tried the code below to demonstrate:

DROP TABLE IF EXISTS issue_11212;
set hoodie.spark.sql.insert.into.operation=bulk_insert;
CREATE TABLE issue_11212 (
    ts BIGINT,
    uuid STRING,
    rider STRING,
    driver STRING,
    fare DOUBLE,
    city STRING
) USING HUDI
OPTIONS(
  'hoodie.datasource.write.recordkey.field'='uuid',
  'hoodie.datasource.write.precombine.field'='ts',
  'hoodie.datasource.write.operation'='bulk_insert'
);

INSERT INTO issue_11212
VALUES
(1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco');

INSERT INTO issue_11212
VALUES
(1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-C','driver-L',19.10,'san_francisco');

select * from issue_11212 where uuid = '334e26e9-8355-45cc-97c6-c31daf0df330';

SELECT * FROM issue_11212 WHERE uuid = '334e26e9-8355-45cc-97c6-c31daf0df330' and _hoodie_commit_seqno = '<seq no>';

DELETE FROM issue_11212 WHERE uuid = '334e26e9-8355-45cc-97c6-c31daf0df330' and _hoodie_commit_seqno = '<seq no>';

select * from issue_11212 where uuid = '334e26e9-8355-45cc-97c6-c31daf0df330';

Can you please check the above and let us know?

@Amar1404
Contributor Author

@ad1happy2go - Is there any other way to do this on Hudi 0.12.3? For example, I am trying to set the config hoodie.combine.before.delete to false. Or is there any other config that would help?

@Amar1404
Contributor Author

@ad1happy2go - Do you know of any other way to delete a duplicated record from the Hudi table without rewriting the whole table?

@ad1happy2go
Collaborator

@Amar1404 With 0.12, deletes were always applied by record key. That is why both of those records are removed, even though the filter matches only one of them.
One workaround is to identify the duplicated records in the table, delete them, and then insert back the rows you want to keep.
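
A minimal sketch of that delete-and-insert workaround in Spark SQL, reusing the issue_11212 table from the example earlier in this thread. The scratch table name issue_11212_keep is hypothetical, and keeping the row with the highest _hoodie_commit_seqno is only an assumption about which copy should survive; adjust the ORDER BY to your own rule. This has not been verified on 0.12.3, so please try it on a copy of the table first.

```sql
-- 1. Find record keys that appear more than once.
SELECT uuid, COUNT(*) AS copies
FROM issue_11212
GROUP BY uuid
HAVING COUNT(*) > 1;

-- 2. Materialize the single row to keep for a duplicated key.
--    A real table is used (not a view) so the data survives the DELETE.
--    issue_11212_keep is a hypothetical scratch table name.
CREATE TABLE issue_11212_keep USING parquet AS
SELECT ts, uuid, rider, driver, fare, city
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY uuid
           ORDER BY _hoodie_commit_seqno DESC  -- assumption: keep the latest write
         ) AS rn
  FROM issue_11212
  WHERE uuid = '334e26e9-8355-45cc-97c6-c31daf0df330'
) ranked
WHERE rn = 1;

-- 3. Delete by record key. On 0.12 this removes every row with that key.
DELETE FROM issue_11212
WHERE uuid = '334e26e9-8355-45cc-97c6-c31daf0df330';

-- 4. Insert the kept row back, then drop the scratch table.
INSERT INTO issue_11212
SELECT ts, uuid, rider, driver, fare, city FROM issue_11212_keep;

DROP TABLE issue_11212_keep;
```

Only the affected keys are touched, so the rest of the table is not rewritten by hand. Repeat steps 2 to 4 for each key reported by step 1.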

codope added the feature-enquiry, writer-core, and on-call-triaged labels on May 15, 2024

@ad1happy2go
Collaborator

@Amar1404 Did the approach work? Do you need any other help here?

@Amar1404
Contributor Author

@ad1happy2go - That approach worked, thanks.


3 participants