-
Notifications
You must be signed in to change notification settings - Fork 530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vsr: don't repair potentially unavailable ops #1579
Conversation
15c9d92
to
b45c484
Compare
b45c484
to
f0ab527
Compare
24177ca
to
b400adf
Compare
// And we don't need op_checkpoint's prepare/header in our log if we crash. | ||
break :op self.op_checkpoint() + 1; | ||
} | ||
break :op (self.op_checkpoint() + 1) -| constants.vsr_checkpoint_interval; | ||
} | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to add an extra assert here:
assert(self.prepare_max() - op < constants.journal_slot_count);
but that doesn't actually work for a funny reason:
- it can be violated if your
op_repair_min
still uses the previous checkpoint, andprepare_max
already picks the next one (that is, commit_max is right on the trigger). - but it actually isn't really violated --- when you receives that
op_prepare_max
, you advance yourcommit_max
, which causes you to move op_repair_min by a checkpoint forward as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but wait, I can assert that on_prepare
instead!
op_repair_min tries to be smart and optimistic and repair _more_, which sometimes lead to unavailability, due to two issues: - primary tries to repair more than backups, but because backups repair less, those prepares might be unavailable! That is, we might suffer _more_ than `f` failures for particular prepare over time, because it is not continuously repaired on the backups - after state sync, replicas simply do not have any prepares beyond checkpoint. Simplify op_repair_min logic to always repair from the latest "confirmed" (committed on top of) checkpoint. That is, replicas _right_ at the trigger repair the entire log wrap, in addition to having the checkpoint. This ensures smooth hand off of durability: prepares stop being repaired only when there's a quorum of replicas at a checkpoint including the prepare.
b400adf
to
47e9be8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code all LGTM!
src/vsr/replica.zig
Outdated
// we are stuck waiting for a prepare or block which will never arrive. | ||
// It is possible that we could in fact repair and progress anyway, but this is not | ||
// guaranteed: the rest of the cluster no longer repairs the relevant prepares. | ||
// We might be stuck waiting for a prepare or block which will never arrive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// We might be stuck waiting for a prepare or block which will never arrive. | |
// We might be stuck waiting for a prepare or block which will never arrive. |
(Extra space.)
src/vsr/replica.zig
Outdated
/// - primaries do repair checkpointed ops so that they can help lagging backups catch up, | ||
/// as long as the op is new enough to be present in the WAL of some other replica and | ||
/// is from the previous checkpoint. | ||
/// Safety condition: repairing an old op must not overwrite a newer op from the next wrap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Safety condition: repairing an old op must not overwrite a newer op from the next wrap. | |
/// Safety condition: repairing an old op must not overwrite a newer op from the next wrap. |
Extra space
src/vsr/replica.zig
Outdated
/// it in a quorum of checkpoints. | ||
/// | ||
/// If op=trigger+1 is committed, the corresponding checkpoint is confirmed to be present on | ||
/// a quorum of replicas. Repairig all ops since the latest confirmed checkpoint satisfies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// a quorum of replicas. Repairig all ops since the latest confirmed checkpoint satisfies | |
/// a quorum of replicas. Repairing all ops since the latest confirmed checkpoint satisfies |
129cd26
to
1f39a39
Compare
If op=op_checkpoint, but we also know that the checkpoint is durable, we don't need to repair the op, we can proceed directly to op+1. But repairs can't advance the op, so we special case this and "pretend" that we are rearing "op". As soon as we hear from view's primary, we'll advance our `op` past `op_checkpoint` and stop pretending that we want to repair `op`. This pre-existing logic that was erronously removed in #1579 Seed: 15816179864814849519 Closes: #1588
/// Availability condition: each committed op must be present either in a quorum of WALs or | ||
/// it in a quorum of checkpoints. | ||
/// | ||
/// If op=trigger+1 is committed, the corresponding checkpoint is durably present on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but I think this is wrong after #1508!
The condition is now op=prepare_max + 1
, not op=trigger + 1
I think!
op_repair_min tries to be smart and optimistic and repair more, which
sometimes lead to unavailability, due to two issues:
less, those prepares might be unavailable! That is, we might suffer
more than
f
failures for particular prepare over time, because itis not continuously repaired on the backups
checkpoint.
Simplify op_repair_min logic to always repair from the latest
"confirmed" (committed on top of) checkpoint.
That is, replicas right at the trigger repair the entire log wrap, in
addition to having the checkpoint. This ensures smooth hand off of
durability: prepares stop being repaired only when there's a quorum of
replicas at a checkpoint including the prepare.
Closes: #1378