-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explicitly store last prepare steps for replay. #1253
Conversation
I did this because it turned out to be (basically) necessary for both the ping-pong implementation, as well as the race-condition fixes. This also fixes some buggy behavior in the corners of replay. For example, if a round which ended up dropping a report was replayed, the Helper would respond differently on replay: no prepare step at all the first time, a failed prepare step with error ReportDropped on replay. [There may be more; I didn't do a close reading of every code path.]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only blocker is creating a database migration for the last_prep_step
column, and whatever the test failure on stable is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this change be a schema migration? Or perhaps the relevant question is: do we want to deploy this change into staging-dap-04
? If so, then I think we should do a migration script because otherwise we'd have to re-deploy that environment's database from scratch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure -- I think we could technically get away with just flattening/redeploying the DB since we have not yet actually used it. But it's good to test our migration implementation, too, and I suppose since we've released we should follow the DB migration strategy.
But I couldn't find any documentation on how to write a migration. Presumably the goal is to create a pair of up
/down
files in the proper lexicographic order with the other migration files; but do we use sqlx
to create these files somehow, or create them manually? is "filename lexicographic order" the correct way to determine order of migrations? how do we determine which DB versions work with which Janus versions? IMO, we should write a few sentences documenting this when time permits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a migration for this. One more thing it would be good to define: what determines the expected ordering of the schema update with the Janus software update (or any other relevant deployment operations)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion of creating new migrations is here: https://github.com/divviup/janus/blob/main/docs/DEPLOYING.md#database
And discussion of how to do this operationally (i.e. sequencing of code deploy vs. schema migration) is being drafted in https://github.com/divviup/janus-ops/pull/663
I'm going to implement Janus checking whether it supports the current schema version in #1241
@@ -4453,12 +4470,26 @@ pub mod models { | |||
self.ord | |||
} | |||
|
|||
/// Returns the last preparation step returned by the Helper, if any. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "by the Helper" always accurate? Wouldn't a helper implementation use this to store the leader's step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Helper always uses this field to store the last preparation step it returned to the Leader, to allow replay to occur. (during processing of an aggregation init/continue message, the Helper updates its in-memory "last" prep step to be what it will eventually return, then builds its response message based on the "last" prep steps -- the intent is to ensure as strongly as possible that what is returned on the initial response will match exactly what is returned on a replay. this implementation strategy also makes it somewhat easier to write code that repeatedly processes each report aggregation while allowing each processing step to update the returned preparation step) The Helper doesn't need to store the Leader's last step.
(and, slightly off-topic from the question, the Leader always stores None in this field.)
I did this because it turned out to be (basically) necessary for both the ping-pong implementation, as well as the race-condition fixes.
This also fixes some buggy behavior in the corners of replay. For example, if a round which ended up dropping a report was replayed, the Helper would respond differently on replay: no prepare step at all the first time, a failed prepare step with error ReportDropped on replay. [There may be more; I didn't do a close reading of every code path.]