[HUDI-6758] Fixing deducing spurious log blocks due to spark retries #9611
In Flink we already have getAttemptIdSupplier, which serves the same purpose. What is the Spark-specific usage for, then? Should we use getAttemptIdSupplier instead?
As of now, I have disabled it for Flink; can you test it out and fix it?

Here is the situation with Spark. Let's say we want to spin up 10 tasks for a stage. In the first attempt, each task will be assigned a number from 1 to 10 as its attemptId, but its attemptNumber will be 0. When a subset of tasks is retried, the new attemptIds will be 11, 12, etc., while the attemptNumber will be 1. This is how it works in Spark; I am not very sure about Flink. Anyway, with the latest commit, we avoid writing the block identifier for Flink.
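A minimal sketch (not Hudi code, and the 1-to-10 numbering above is only illustrative) of how Spark exposes the two identifiers through its TaskContext API:

```java
import org.apache.spark.TaskContext;

// Illustrates the two identifiers distinguished above, as exposed by Spark.
public class AttemptIdVsAttemptNumber {
  static String describeCurrentTask() {
    TaskContext ctx = TaskContext.get(); // non-null only inside a running task
    // taskAttemptId(): unique across all task attempts in the application,
    // so a retried task gets a brand-new value (e.g. 11, 12, ...).
    long attemptId = ctx.taskAttemptId();
    // attemptNumber(): how many times this task has been attempted;
    // 0 for the first attempt, 1 for the first retry, and so on.
    int attemptNumber = ctx.attemptNumber();
    return "partition=" + ctx.partitionId()
        + " attemptId=" + attemptId
        + " attemptNumber=" + attemptNumber;
  }
}
```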
So, in summary, the attemptId supplier is not the right one to use, at least in Spark. It has to be the attempt number.
Does a Spark retry always take the same data set? Is it possible that a retried task goes to another executor/container and takes a different input dataset there?
With regard to log files, it should not, because only updates go to log files. In the case of the bucket index, the record key to fileId mapping is hash based anyway, so we should be good.
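As a hypothetical illustration (not Hudi's actual BucketIdentifier implementation), this is why a hash-based mapping keeps retries safe: the same record key deterministically maps to the same bucket/fileId no matter which executor or attempt processes it:

```java
// Hypothetical sketch: a deterministic key-to-bucket mapping means a retried
// task processing the same record keys writes to the same fileId.
public class BucketIndexSketch {
  static int bucketFor(String recordKey, int numBuckets) {
    // Math.floorMod keeps the result non-negative even for negative hashCodes.
    return Math.floorMod(recordKey.hashCode(), numBuckets);
  }

  public static void main(String[] args) {
    // Same output on every attempt, on any executor.
    System.out.println(bucketFor("user-42", 8));
    System.out.println(bucketFor("user-42", 8));
  }
}
```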
That is only true for Spark, so you are fixing a bug that depends on Spark's write characteristics.
Yes. We have disabled it for Flink as of now. In Java, MOR is not fully functional anyway, from what I know, but I am open to disabling it for Java as well. Mainly it is an issue for ExpressionPayload and any other custom payloads; most of the other payloads are idempotent even if there are duplicate log blocks.
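To illustrate the idempotency point with hypothetical merge functions (this is not Hudi's payload API): a latest-wins style merge absorbs a duplicated log block, while an expression-style merge double-counts it.

```java
public class PayloadIdempotency {
  // Latest-wins merge: applying the same update twice yields the same value,
  // so a duplicated log block is harmless.
  static int latestWins(int current, int update) {
    return update; // merge(merge(v, u), u) == merge(v, u) -> idempotent
  }

  // Expression-style merge such as "col = col + delta": applying the same
  // update twice double-counts it, so a duplicated log block corrupts data.
  static int increment(int current, int delta) {
    return current + delta; // merge(merge(v, d), d) != merge(v, d)
  }

  public static void main(String[] args) {
    System.out.println(latestWins(latestWins(1, 5), 5)); // 5 either way
    System.out.println(increment(increment(1, 5), 5));   // 11, not 6
  }
}
```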
Let's make the Flink impl right first by using this.flinkRuntimeContext.getAttemptNumber().
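For reference, a minimal sketch of where that attempt number comes from in Flink, assuming a RichFunction and a Flink version where RuntimeContext#getAttemptNumber() is available:

```java
import org.apache.flink.api.common.functions.RichMapFunction;

// getAttemptNumber() starts at 0 and increments on each restart of the
// subtask -- the analogue of Spark's TaskContext#attemptNumber() above.
public class AttemptNumberFn extends RichMapFunction<String, String> {
  @Override
  public String map(String value) {
    int attemptNumber = getRuntimeContext().getAttemptNumber();
    return value + " (attempt " + attemptNumber + ")";
  }
}
```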
Sure, will sync up with you directly to better understand this.
https://issues.apache.org/jira/browse/HUDI-6844