[Bifrost] Fix seal correctness on sequencers #2094
@@ -172,6 +172,9 @@ impl<T: TransportConnect> Loglet for ReplicatedLoglet<T> {
     }
 
     async fn enqueue_batch(&self, payloads: Arc<[Record]>) -> Result<LogletCommit, OperationError> {
+        if self.known_global_tail().is_sealed() {
+            return Ok(LogletCommit::sealed());
+        }
         metrics::counter!(BIFROST_RECORDS_ENQUEUED_TOTAL).increment(payloads.len() as u64);
         metrics::counter!(BIFROST_RECORDS_ENQUEUED_BYTES).increment(
             payloads
@@ -205,10 +208,8 @@ impl<T: TransportConnect> Loglet for ReplicatedLoglet<T> {
                 .await?;
             if result == CheckSealOutcome::Sealing {
                 // We are likely to be sealing...
-                // let's fire a seal to ensure this seal is complete
-                if self.seal().await.is_ok() {
-                    self.known_global_tail.notify_seal();
-                }
+                // let's fire a seal to ensure this seal is complete.
+                self.seal().await?;
             }
             return Ok(*self.known_global_tail.get());
         }
@@ -251,7 +252,6 @@ impl<T: TransportConnect> Loglet for ReplicatedLoglet<T> {
     }
 
     async fn seal(&self) -> Result<(), OperationError> {
-        // todo(asoli): If we are the sequencer node, let the sequencer know.
         let _ = SealTask::new(
             task_center(),
             self.my_params.clone(),
@@ -260,6 +260,15 @@ impl<T: TransportConnect> Loglet for ReplicatedLoglet<T> {
         )
         .run(self.networking.clone())
         .await?;
+        // If we are the sequencer, we need to wait until the sequencer is drained.
+        if let SequencerAccess::Local { handle } = &self.sequencer {
+            handle.drain().await?;
+            self.known_global_tail.notify_seal();
+        };
+        // On remote sequencer, we only set our global tail to sealed when we call find_tail and it
+        // returns Sealed. We should NOT:
+        // - Use AppendError::Sealed to mark our sealed global_tail
+        // - Mark our global tail as sealed on successful seal() call.
Comment on lines +270 to +271

Why would these two cases be problematic? Is it because the sequencer returns an

Yes. Rejecting appends with Sealed doesn't automatically mean that the current value of known_global_tail is safe to be considered sealed.

The sequencer will return its last known global commit offset. This doesn't mean that the sequencer node itself has set its own seal bit, so it can still drift forward.

Note that the sequencer might send AppendError::Sealed during drain. That doesn't mean that its knowledge of the global tail should be used as the reliable "seal" tail.
 
         info!(loglet_id=%self.my_params.loglet_id, "Loglet has been sealed successfully");
         Ok(())
     }
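To make the rule from the comment on lines +270 to +271 concrete, here is a minimal sketch of how a node with a remote sequencer might update its local tail view. Everything in it is hypothetical (`TailView`, `Observation`, and `apply` are illustrative names, not Bifrost types), and letting a rejected append advance the offset is an assumption of the sketch. The only point it shows is that the seal bit flips on a `find_tail` result and on nothing else.

```rust
/// Toy stand-in for a loglet's local view of the global tail.
/// Hypothetical type; this is not the real Bifrost tail watch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TailView {
    pub offset: u64,
    pub sealed: bool,
}

/// Events a node with a remote sequencer might observe (illustrative names).
#[derive(Debug, Clone, Copy)]
pub enum Observation {
    /// An append was rejected with a "sealed" error; the sequencer reports
    /// its last known global commit offset alongside the rejection.
    AppendRejectedSealed { last_known_commit: u64 },
    /// Our own seal() call returned Ok.
    SealCallSucceeded,
    /// find_tail returned a result.
    FindTail { offset: u64, sealed: bool },
}

impl TailView {
    pub fn apply(&mut self, obs: Observation) {
        match obs {
            Observation::AppendRejectedSealed { last_known_commit } => {
                // Assumption of this sketch: the offset may move forward,
                // but a rejection never sets the seal bit, because the
                // sequencer's tail can still drift forward while it drains.
                self.offset = self.offset.max(last_known_commit);
            }
            Observation::SealCallSucceeded => {
                // Deliberately a no-op for a remote sequencer.
            }
            Observation::FindTail { offset, sealed } => {
                // The only path that is allowed to mark the view as sealed.
                self.offset = self.offset.max(offset);
                self.sealed |= sealed;
            }
        }
    }
}

/// Replays the three observations in order and returns the final view.
pub fn demo_remote_view() -> TailView {
    let mut view = TailView { offset: 10, sealed: false };
    view.apply(Observation::AppendRejectedSealed { last_known_commit: 12 });
    view.apply(Observation::SealCallSucceeded);
    view.apply(Observation::FindTail { offset: 15, sealed: true });
    view
}
```

In this model the view stays unsealed after the rejected append and after the successful seal call, and only becomes sealed once `find_tail` reports it.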
Why is it important to drain before sending the seal notification?
What we don't want is:
This leads to a find_tail result that conveys the picture as if no appends will be acknowledged after this offset. That is the critical guarantee that we must maintain.
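A rough way to see why the ordering matters is a single-threaded toy model, sketched below. `ToySequencer` and both `seal_*` functions are hypothetical and not part of Bifrost. The model assumes that appends already in flight when the seal starts may still be acknowledged during drain. Under that assumption, publishing the sealed tail before draining reports an offset that later acknowledgements move past.

```rust
/// Hypothetical single-threaded model of a local sequencer.
pub struct ToySequencer {
    committed_tail: u64, // offset just past the last acknowledged append
    in_flight: u64,      // appends accepted but not yet acknowledged
    accepting: bool,
}

impl ToySequencer {
    pub fn new(committed_tail: u64, in_flight: u64) -> Self {
        Self { committed_tail, in_flight, accepting: true }
    }

    pub fn committed_tail(&self) -> u64 {
        self.committed_tail
    }

    /// Accepts a new append unless the sequencer has started draining.
    pub fn try_append(&mut self) -> bool {
        if self.accepting {
            self.in_flight += 1;
        }
        self.accepting
    }

    /// Stops accepting appends and resolves everything in flight.
    /// Assumption of this model: every in-flight append ends up acknowledged.
    pub fn drain(&mut self) {
        self.accepting = false;
        self.committed_tail += self.in_flight;
        self.in_flight = 0;
    }
}

/// Order used in the diff: drain first, then publish the sealed tail.
pub fn seal_after_drain(seq: &mut ToySequencer) -> u64 {
    seq.drain();
    seq.committed_tail()
}

/// Problematic order: publish the sealed tail, then drain.
/// Returns (published sealed tail, tail after drain).
pub fn seal_before_drain(seq: &mut ToySequencer) -> (u64, u64) {
    let published = seq.committed_tail();
    seq.drain();
    (published, seq.committed_tail())
}
```

With a committed tail of 100 and three appends in flight, `seal_before_drain` publishes 100 while the log ends up at 103, so a reader trusting the sealed tail would miss acknowledged records. `seal_after_drain` publishes 103, and no further append is accepted after that.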