vsr: don't repair potentially unavailable ops #1579

matklad · 2024-02-20T18:27:54Z

op_repair_min tries to be smart and optimistic and repair more, which
sometimes lead to unavailability, due to two issues:

primary tries to repair more than backups, but because backups repair
less, those prepares might be unavailable! That is, we might suffer
more than f failures for particular prepare over time, because it
is not continuously repaired on the backups
after state sync, replicas simply do not have any prepares beyond
checkpoint.

Simplify op_repair_min logic to always repair from the latest
"confirmed" (committed on top of) checkpoint.

That is, replicas right at the trigger repair the entire log wrap, in
addition to having the checkpoint. This ensures smooth hand off of
durability: prepares stop being repaired only when there's a quorum of
replicas at a checkpoint including the prepare.

Closes: #1378

matklad · 2024-02-23T14:37:00Z

src/vsr/replica.zig

-                        // And we don't need op_checkpoint's prepare/header in our log if we crash.
-                        break :op self.op_checkpoint() + 1;
-                    }
+                    break :op (self.op_checkpoint() + 1) -| constants.vsr_checkpoint_interval;
                }
            };



Tried to add an extra assert here:

assert(self.prepare_max() - op < constants.journal_slot_count);

but that doesn't actually work for a funny reason:

it can be violated if your op_repair_min still uses the previous checkpoint, and prepare_max already picks the next one (that is, commit_max is right on the trigger).

but it actually isn't really violated --- when you receives that op_prepare_max, you advance your commit_max, which causes you to move op_repair_min by a checkpoint forward as well.

Ah, but wait, I can assert that on_prepare instead!

op_repair_min tries to be smart and optimistic and repair _more_, which sometimes lead to unavailability, due to two issues: - primary tries to repair more than backups, but because backups repair less, those prepares might be unavailable! That is, we might suffer _more_ than `f` failures for particular prepare over time, because it is not continuously repaired on the backups - after state sync, replicas simply do not have any prepares beyond checkpoint. Simplify op_repair_min logic to always repair from the latest "confirmed" (committed on top of) checkpoint. That is, replicas _right_ at the trigger repair the entire log wrap, in addition to having the checkpoint. This ensures smooth hand off of durability: prepares stop being repaired only when there's a quorum of replicas at a checkpoint including the prepare.

sentientwaffle

The code all LGTM!

sentientwaffle · 2024-02-23T15:01:29Z

src/vsr/replica.zig

-            // we are stuck waiting for a prepare or block which will never arrive.
+            // It is possible that we could in fact repair and progress anyway, but this is not
+            // guaranteed: the rest of the cluster no longer repairs the relevant prepares.
+            // We  might be stuck waiting for a prepare or block which will never arrive.


Suggested change

// We might be stuck waiting for a prepare or block which will never arrive.

// We might be stuck waiting for a prepare or block which will never arrive.

(Extra space.)

sentientwaffle · 2024-02-23T15:02:12Z

src/vsr/replica.zig

-        ///   - primaries do repair checkpointed ops so that they can help lagging backups catch up,
-        ///     as long as the op is new enough to be present in the WAL of some other replica and
-        ///     is from the previous checkpoint.
+        /// Safety condition: repairing an old op must not overwrite a newer op from the next  wrap.


Suggested change

/// Safety condition: repairing an old op must not overwrite a newer op from the next wrap.

/// Safety condition: repairing an old op must not overwrite a newer op from the next wrap.

Extra space

sentientwaffle · 2024-02-23T15:02:49Z

src/vsr/replica.zig

+        /// it in a quorum of checkpoints.
+        ///
+        /// If op=trigger+1 is committed, the corresponding checkpoint is confirmed to be present on
+        /// a quorum of replicas. Repairig all ops since the latest confirmed checkpoint satisfies


Suggested change

/// a quorum of replicas. Repairig all ops since the latest confirmed checkpoint satisfies

/// a quorum of replicas. Repairing all ops since the latest confirmed checkpoint satisfies

If op=op_checkpoint, but we also know that the checkpoint is durable, we don't need to repair the op, we can proceed directly to op+1. But repairs can't advance the op, so we special case this and "pretend" that we are rearing "op". As soon as we hear from view's primary, we'll advance our `op` past `op_checkpoint` and stop pretending that we want to repair `op`. This pre-existing logic that was erronously removed in #1579 Seed: 15816179864814849519 Closes: #1588

matklad · 2024-05-20T12:38:15Z

src/vsr/replica.zig

+        /// Availability condition: each committed op must be present either in a quorum of WALs or
+        /// it in a quorum of checkpoints.
+        ///
+        /// If op=trigger+1 is committed, the corresponding checkpoint is durably present on


Ah, but I think this is wrong after #1508!

The condition is now op=prepare_max + 1, not op=trigger + 1 I think!

matklad force-pushed the matklad/op-repair-min branch 4 times, most recently from 15c9d92 to b45c484 Compare February 23, 2024 13:15

matklad marked this pull request as ready for review February 23, 2024 13:18

matklad force-pushed the matklad/op-repair-min branch from b45c484 to f0ab527 Compare February 23, 2024 13:22

matklad assigned sentientwaffle Feb 23, 2024

matklad added vopr and removed vopr labels Feb 23, 2024

matklad force-pushed the matklad/op-repair-min branch 2 times, most recently from 24177ca to b400adf Compare February 23, 2024 14:27

matklad commented Feb 23, 2024

View reviewed changes

matklad force-pushed the matklad/op-repair-min branch from b400adf to 47e9be8 Compare February 23, 2024 14:39

matklad added the vopr label Feb 23, 2024

sentientwaffle reviewed Feb 23, 2024

View reviewed changes

matklad removed the vopr label Feb 23, 2024

vsr: use durable checkpoint terminology

1f39a39

matklad force-pushed the matklad/op-repair-min branch from 129cd26 to 1f39a39 Compare February 23, 2024 15:26

matklad enabled auto-merge February 23, 2024 15:26

sentientwaffle approved these changes Feb 23, 2024

View reviewed changes

matklad added this pull request to the merge queue Feb 23, 2024

Merged via the queue into main with commit 4a893ec Feb 23, 2024
27 checks passed

matklad deleted the matklad/op-repair-min branch February 23, 2024 15:52

matklad mentioned this pull request Feb 24, 2024

vsr: restore op_repair_min=op=op_checkpoint special case #1589

Merged

matklad commented May 20, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vsr: don't repair potentially unavailable ops #1579

vsr: don't repair potentially unavailable ops #1579

matklad commented Feb 20, 2024 •

edited

Loading

matklad Feb 23, 2024

matklad Feb 23, 2024

sentientwaffle left a comment

sentientwaffle Feb 23, 2024

sentientwaffle Feb 23, 2024

sentientwaffle Feb 23, 2024

matklad May 20, 2024

	// We might be stuck waiting for a prepare or block which will never arrive.
	// We might be stuck waiting for a prepare or block which will never arrive.

	/// Safety condition: repairing an old op must not overwrite a newer op from the next wrap.
	/// Safety condition: repairing an old op must not overwrite a newer op from the next wrap.

	/// a quorum of replicas. Repairig all ops since the latest confirmed checkpoint satisfies
	/// a quorum of replicas. Repairing all ops since the latest confirmed checkpoint satisfies

vsr: don't repair potentially unavailable ops #1579

vsr: don't repair potentially unavailable ops #1579

Conversation

matklad commented Feb 20, 2024 • edited Loading

matklad Feb 23, 2024

Choose a reason for hiding this comment

matklad Feb 23, 2024

Choose a reason for hiding this comment

sentientwaffle left a comment

Choose a reason for hiding this comment

sentientwaffle Feb 23, 2024

Choose a reason for hiding this comment

sentientwaffle Feb 23, 2024

Choose a reason for hiding this comment

sentientwaffle Feb 23, 2024

Choose a reason for hiding this comment

matklad May 20, 2024

Choose a reason for hiding this comment

matklad commented Feb 20, 2024 •

edited

Loading