[Bifrost] RepairTail task for replicated loglet #2046
Conversation
Great work @AhmedSoliman 🚀 The changes look really good and the logic seems sound to me. +1 for merging :-)
// We run stores as tasks because we'll wait only for the necessary write-quorum but the
// rest of the stores can continue in the background as best-effort replication (if the
// spread selector strategy picked extra nodes)
Unrelated: Can it become a problem if we accumulate network send tasks that are awaiting a response which will never come because the other node has died? I don't expect this to happen often, but over time it could result in a memory leak.
The task will give up once it exhausts our RPC retry policy (which is finite by default), but the risk exists if someone changes this to infinite retries.
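To make the pattern under discussion concrete, here is a minimal sketch using a tokio `JoinSet` and hypothetical types (not Bifrost's actual API): spawn one store task per node in the spread, await acknowledgements only until write-quorum is reached, and detach the remaining tasks so they continue as best-effort replication. The comment about finite retries is reflected in the `store_on_node` stub.

```rust
use tokio::task::JoinSet;

#[derive(Clone, Copy, Debug)]
struct NodeId(u32);

// Hypothetical store RPC. In the real code this carries a retry policy; with
// infinite retries a task talking to a dead node would never terminate, which
// is the leak scenario discussed above.
async fn store_on_node(node: NodeId) -> Result<NodeId, NodeId> {
    Ok(node)
}

async fn replicate(spread: Vec<NodeId>, write_quorum: usize) -> bool {
    let mut tasks = JoinSet::new();
    for node in spread {
        tasks.spawn(store_on_node(node));
    }
    let mut acks = 0;
    while let Some(res) = tasks.join_next().await {
        if matches!(res, Ok(Ok(_))) {
            acks += 1;
            if acks >= write_quorum {
                // Write-quorum reached: detach the remaining store tasks so
                // they finish in the background as best-effort replication.
                tasks.detach_all();
                return true;
            }
        }
    }
    false // all tasks finished without reaching write-quorum
}

#[tokio::main]
async fn main() {
    let ok = replicate(vec![NodeId(1), NodeId(2), NodeId(3)], 2).await;
    assert!(ok);
}
```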
/// 1. Log-servers persisting the last known_global_tail periodically/async and using this value as known_global_tail on startup.
/// 2. Sequencer-driven seal. If the sequencer is alive, it can send a special value with the seal message to
///    indicate what is the ultimate known-global-tail that nodes should repair to instead of relying on the observed max-tail.
/// 3. Limit `from_offset` to repair from to max(min(local_tails), max(known_global_tails), known_archived, trim_point)
Why could we take the min of local tails? Wouldn't this run the risk of losing previously committed records?
This assumes that those tails come from an f-majority of nodes, i.e. that we have responses from an f-majority of the log-servers.
Maybe you can explain this optimization to me once we have a bit more time after the demo. I can't wrap my head around it yet. The only thing I can think of is that we can skip those entries for which we can reliably say that, even with the missing nodes, there can't be a write-quorum.
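For illustration, a sketch of how the bound in option 3 above could be computed (hypothetical helper and types, not the actual implementation). The safety rationale under discussion is that `local_tails` come from an f-majority of log-servers; the repair start is the maximum of several watermarks that are each already safe on their own.

```rust
type LogletOffset = u64;

/// Hypothetical helper illustrating option 3 above: bound the repair range
/// from below by the max of several already-safe watermarks.
fn repair_from_offset(
    local_tails: &[LogletOffset],        // tails reported by an f-majority of log-servers
    known_global_tails: &[LogletOffset], // known_global_tail values seen in responses
    known_archived: LogletOffset,
    trim_point: LogletOffset,
) -> LogletOffset {
    let min_local = local_tails.iter().copied().min().unwrap_or(0);
    let max_known_global = known_global_tails.iter().copied().max().unwrap_or(0);
    min_local
        .max(max_known_global)
        .max(known_archived)
        .max(trim_point)
}
```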
This puts together the design and implementation of the tail repair procedure that's required when FindTail cannot establish a consistent durable tail from log-servers. The details are described as comments in code.
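For readers outside the PR, a high-level sketch of the shape of such a procedure (hypothetical names and result type; the actual design is documented in the comments in tasks/repair_tail.rs and tasks/digests.rs):

```rust
type LogletOffset = u64;

/// Hypothetical result type, for illustration only.
enum RepairTailResult {
    Completed { safe_tail: LogletOffset },
    DigestFailed,      // couldn't collect digests from an f-majority
    ReplicationFailed, // couldn't restore write-quorum for some record
}

async fn repair_tail(from: LogletOffset, to: LogletOffset) -> RepairTailResult {
    // 1. Ask log-servers which offsets in [from, to) they hold (a digest);
    //    without responses from an f-majority we cannot reason about the range.
    // 2. For every offset that exists on some node but lacks write-quorum,
    //    read a copy and re-store it until write-quorum is reached.
    // 3. Once the whole range is write-quorum replicated, `to` is a durable
    //    tail that FindTail can report consistently.
    let _ = from;
    RepairTailResult::Completed { safe_tail: to }
}
```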
Stack created with Sapling. Best reviewed with ReviewStack.