Explicitly store last prepare steps for replay. #1253

branlwyd · 2023-04-17T04:27:44Z

I did this because it turned out to be (basically) necessary for both the ping-pong implementation and the race-condition fixes.

This also fixes some buggy behavior in the corners of replay. For example, if a round which ended up dropping a report was replayed, the Helper would respond differently on replay: no prepare step at all the first time, a failed prepare step with error ReportDropped on replay. [There may be more; I didn't do a close reading of every code path.]

tgeoghegan

I think the only blocker is creating a database migration for the last_prep_step column, and whatever the test failure on stable is.

tgeoghegan · 2023-04-17T15:29:03Z

db/20230405185602_initial-schema.up.sql

Should this change be a schema migration? Or perhaps the relevant question is: do we want to deploy this change into staging-dap-04? If so, then I think we should do a migration script because otherwise we'd have to re-deploy that environment's database from scratch.

Sure -- I think we could technically get away with just flattening/redeploying the DB since we have not yet actually used it. But it's good to test our migration implementation, too, and I suppose since we've released we should follow the DB migration strategy.

But I couldn't find any documentation on how to write a migration. Presumably the goal is to create a pair of up/down files in the proper lexicographic order with the other migration files, but do we use sqlx to create these files somehow, or create them manually? Is "filename lexicographic order" the correct way to determine the order of migrations? How do we determine which DB versions work with which Janus versions? IMO, we should write a few sentences documenting this when time permits.

I created a migration for this. One more thing it would be good to define: what determines the expected ordering of the schema update with the Janus software update (or any other relevant deployment operations)?

Discussion of creating new migrations is here: https://github.com/divviup/janus/blob/main/docs/DEPLOYING.md#database

And discussion of how to do this operationally (i.e. sequencing of code deploy vs. schema migration) is being drafted in https://github.com/divviup/janus-ops/pull/663

I'm going to implement Janus checking whether it supports the current schema version in #1241

tgeoghegan · 2023-04-17T15:33:35Z

aggregator_core/src/datastore.rs

@@ -4453,12 +4470,26 @@ pub mod models {
            self.ord
        }

+        /// Returns the last preparation step returned by the Helper, if any.


Is "by the Helper" always accurate? Wouldn't a helper implementation use this to store the leader's step?

The Helper always uses this field to store the last preparation step it returned to the Leader, so that replay can occur. During processing of an aggregation init/continue message, the Helper updates its in-memory "last" prep step to be what it will eventually return, then builds its response message from those "last" prep steps. The intent is to ensure as strongly as possible that what is returned on the initial response matches exactly what is returned on a replay. This implementation strategy also makes it somewhat easier to write code that repeatedly processes each report aggregation while allowing each processing step to update the returned preparation step. The Helper doesn't need to store the Leader's last step.

(and, slightly off-topic from the question, the Leader always stores None in this field.)
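For readers following along, here is a minimal sketch of the strategy described in this thread: record the step that will be returned first, then build the response only from what was recorded. It is not the code in this PR. `PrepStep`, `Helper`, `handle`, the `u64` report IDs, and the round counter are all hypothetical stand-ins chosen for illustration, and the real models in `aggregator_core/src/datastore.rs` are more involved.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a prepare step; not the real Janus type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum PrepStep {
    Continued(Vec<u8>),
    Finished,
    Failed(&'static str),
}

/// Hypothetical Helper state for one aggregation job.
pub struct Helper {
    /// Last prepare step returned to the Leader for each report, if any.
    /// `None` means no step was returned (e.g. the report was dropped).
    /// A BTreeMap keeps the response order deterministic.
    reports: BTreeMap<u64, Option<PrepStep>>,
    /// Round most recently processed, used here to detect a replay.
    last_round: Option<u64>,
}

impl Helper {
    pub fn new(report_ids: impl IntoIterator<Item = u64>) -> Self {
        Self {
            reports: report_ids.into_iter().map(|id| (id, None)).collect(),
            last_round: None,
        }
    }

    /// Handles a request for `round`. `process` computes the outcome for a
    /// report and is only called when the round has not been seen before.
    pub fn handle(
        &mut self,
        round: u64,
        process: impl Fn(u64) -> Option<PrepStep>,
    ) -> Vec<(u64, PrepStep)> {
        if self.last_round != Some(round) {
            // First time through: record what will be returned.
            for (id, last) in self.reports.iter_mut() {
                *last = process(*id);
            }
            self.last_round = Some(round);
        }
        // Initial response and replay share this single code path, so they
        // cannot diverge: the response is derived only from stored steps.
        self.reports
            .iter()
            .filter_map(|(id, last)| last.clone().map(|step| (*id, step)))
            .collect()
    }
}
```

In this sketch a dropped report stores `None` and is omitted from both the first response and any replay, which is the consistency property the PR description is after.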

aggregator/src/aggregator/aggregation_job_driver.rs

aggregator/src/aggregator/aggregation_job_continue.rs

branlwyd requested a review from a team as a code owner April 17, 2023 04:27

Merge branch 'main' into bran/store-prep-steps

908ad0a

tgeoghegan requested changes Apr 17, 2023

branlwyd mentioned this pull request Apr 17, 2023

Fix race between aggregation & collection. #1254

Merged

branlwyd requested a review from tgeoghegan April 17, 2023 18:59

branlwyd added 2 commits April 17, 2023 12:00

Code review.

67b197b

Create DB migration for schema change.

6810c85

tgeoghegan approved these changes Apr 17, 2023

branlwyd merged commit 26a25be into main Apr 17, 2023

branlwyd deleted the bran/store-prep-steps branch April 17, 2023 21:45

tgeoghegan mentioned this pull request Apr 18, 2023

[experimental] Implement one-helper, "ping-pong" aggregation. #1234

Closed

divergentdave mentioned this pull request Apr 19, 2023

Undo comment change in old schema, document in new #1276

Merged
