Don't aggregate collected time intervals #1180
Closed
I think this join is probably fine from a performance perspective.
From an implementation perspective, I think we can drop this particular clause (checking for START & unleased), if we want; I don't think we care that much about this relatively small window of time.
However, I don't think this solves a race condition. Specifically, I think the following is still possible with this PR:
I think creating such an aggregation job is what we are trying to avoid.
I think a working solution might be something like: update the aggregation job driver to not drive (i.e. drop from its requests) any reports which have a conflicting collection job. Alternatively, just don't include such reports in the accumulator, though this might be harder to coordinate between the two aggregators.
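To make this proposal concrete, here is a minimal sketch of what "don't drive reports that have a conflicting collection job" could look like as a filter in the driver's stepping query. The table and column names (`collection_jobs`, `report_aggregations`, `batch_interval_start`, `client_timestamp`, and so on) are simplified placeholders for illustration, not the actual Janus schema, and a time-interval task with seconds-since-epoch timestamps is assumed.

```sql
-- Hypothetical, simplified schema for illustration only (not the real Janus
-- tables); timestamps are seconds since the epoch.
CREATE TABLE collection_jobs (
    id                      BIGINT PRIMARY KEY,
    task_id                 BIGINT NOT NULL,
    batch_interval_start    BIGINT NOT NULL,
    batch_interval_duration BIGINT NOT NULL,
    state                   TEXT   NOT NULL   -- e.g. 'START', 'COLLECTABLE', 'FINISHED'
);

CREATE TABLE report_aggregations (
    id                 BIGINT PRIMARY KEY,
    task_id            BIGINT NOT NULL,
    aggregation_job_id BIGINT NOT NULL,
    client_timestamp   BIGINT NOT NULL,
    state              TEXT   NOT NULL        -- e.g. 'START', 'WAITING', 'FINISHED'
);

-- When stepping an aggregation job, only drive reports whose timestamp does
-- not fall inside the batch interval of any collection job for the same task.
SELECT ra.id
FROM report_aggregations ra
WHERE ra.aggregation_job_id = $1
  AND ra.state = 'START'
  AND NOT EXISTS (
      SELECT 1
      FROM collection_jobs cj
      WHERE cj.task_id = ra.task_id
        AND ra.client_timestamp >= cj.batch_interval_start
        AND ra.client_timestamp <  cj.batch_interval_start + cj.batch_interval_duration
  );
```

Filtering at report granularity, rather than abandoning the whole aggregation job, lines up with the point made later in the thread that an aggregation job is no longer necessarily confined to a single time interval.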
Good point about the race condition. If we move this check to the top of stepping an aggregation job, then the downside is that the aggregation job creator would continue scheduling potentially un-runnable aggregation jobs, but they can be cheaply abandoned by the aggregation job driver.
So in the interests of reducing complexity, I think we should put this check in just one place, and have that be `aggregation_job_driver`.
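As a rough illustration of how cheap that up-front check could be, the driver could run something like the following at the top of stepping a job, again against the hypothetical, simplified tables from the sketch above, and skip the conflicting reports (or abandon the job, if it turns out to be entirely un-runnable) when it returns true:

```sql
-- Hypothetical: does any report in this aggregation job fall inside the batch
-- interval of an existing collection job? Uses the simplified tables from the
-- sketch above; a TRUE result means some reports must be skipped.
SELECT EXISTS (
    SELECT 1
    FROM report_aggregations ra
    JOIN collection_jobs cj
      ON cj.task_id = ra.task_id
     AND ra.client_timestamp >= cj.batch_interval_start
     AND ra.client_timestamp <  cj.batch_interval_start + cj.batch_interval_duration
    WHERE ra.aggregation_job_id = $1
) AS has_conflicting_collection_job;
```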
I think this approach has a race condition too. Specifically:
I'm going to think about this further, but I wanted to point out the flaw in this approach before any implementation work happens.
The collection job acquisition query would not pick up a collection job if it sees a related aggregation job that is in progress. But suppose the collection job acquisition query's transaction takes a long time to run, and the aggregation job from the first bullet point was created and advanced after the start of the collection job driver's transaction; then we could hit this same issue due to a phantom read anomaly.
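For concreteness, here is a hedged sketch of that failure mode, again using hypothetical, simplified tables (`collection_jobs` as above, plus an assumed `aggregation_jobs` table); whether the anomaly is actually reachable depends on the real acquisition queries and on how PostgreSQL's REPEATABLE READ snapshotting interacts with them.

```sql
-- Hypothetical, simplified table (collection_jobs / report_aggregations as in
-- the earlier sketch).
CREATE TABLE aggregation_jobs (
    id      BIGINT PRIMARY KEY,
    task_id BIGINT NOT NULL,
    state   TEXT   NOT NULL   -- e.g. 'IN_PROGRESS', 'FINISHED', 'ABANDONED'
);

-- Transaction A: the collection job driver acquiring work.
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- The snapshot is taken at the first query below; everything A reads in this
-- transaction is evaluated against that snapshot.
SELECT cj.id
FROM collection_jobs cj
WHERE cj.state = 'COLLECTABLE'
  AND NOT EXISTS (
      SELECT 1
      FROM aggregation_jobs aj
      WHERE aj.task_id = cj.task_id
        AND aj.state = 'IN_PROGRESS'
        -- ... plus whatever predicate ties aj's reports to cj's batch interval
  );
-- ... long-running work inside the same transaction ...
COMMIT;

-- Meanwhile, transaction B creates and advances an aggregation job overlapping
-- cj's batch interval and commits before A finishes. Because B committed after
-- A's snapshot was taken, A's NOT EXISTS never sees those rows, so both jobs
-- proceed; this is the anomaly described above.
```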
What about the following checks:
I think this works because, given how Repeatable Read works, one of these jobs must be committed first; the check on the other job will then keep it from creating a race condition.
Note that the aggregation job in particular "will not drive reports" rather than "will abandon the whole aggregation job": aggregation jobs are no longer necessarily contained within a single time interval/batch, so we can't abandon the whole aggregation job or we'll potentially lose reports in other time intervals.
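The list of checks itself isn't reproduced above, so the following is purely a hypothetical sketch of what the collection-job side of such a symmetric guard might look like, using the same simplified tables as the earlier sketches; the aggregation-job side would be the report-level filter sketched earlier.

```sql
-- Hypothetical sketch only: the checks actually proposed in this thread are
-- not shown here. Collection-job side of a symmetric guard: refuse to start
-- collecting a batch interval while any report in that interval is still
-- being driven by an in-progress aggregation job.
SELECT NOT EXISTS (
    SELECT 1
    FROM report_aggregations ra
    JOIN aggregation_jobs aj ON aj.id = ra.aggregation_job_id
    WHERE aj.task_id = $1                     -- task being collected
      AND aj.state = 'IN_PROGRESS'
      AND ra.client_timestamp >= $2           -- batch interval start
      AND ra.client_timestamp <  $2 + $3      -- start + duration
) AS safe_to_collect;
```

As noted above, the aggregation-side guard operates per report rather than per job, so non-conflicting reports in the same aggregation job still get driven.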
Yeah, I suspect that to make useful progress here, we need to clarify our state machine(s) for reports, aggregation jobs, leader collection jobs, and helper aggregate share jobs. Then we can understand the synchronization points and figure out how to implement them properly. So I think this change should go on the back burner, as I want to focus on more immediate production issues.