raft: fix correctness bug in CommittedEntries pagination #10063
In etcd-io#9982, a mechanism to limit the size of `CommittedEntries` was introduced. The way this mechanism worked was that it would load applicable entries (passing the max size hint) and would emit a `HardState` whose commit index was truncated to match the limitation applied to the entries. Unfortunately, this was subtly incorrect when the user-provided `Entries` implementation didn't exactly match what Raft uses internally. Depending on whether a `Node` or a `RawNode` was used, this would either lead to regressing the `HardState`'s commit index or outright forgetting to apply entries, respectively.

Asking implementers to precisely match the Raft size limitation semantics was considered but looks like a bad idea as it puts correctness squarely in the hands of downstream users. Instead, this PR removes the truncation of `HardState` when limiting is active and tracks the applied index separately. This removes the old paradigm (that the previous code tried to work around) that the client will always apply all the way to the commit index, which isn't true when commit entries are paginated.

See [1] for more on the discovery of this bug (CockroachDB's implementation of `Entries` returns one more entry than Raft's when the size limit hits).

[1]: cockroachdb/cockroach#28918 (comment)
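To make the mismatch concrete, here is a minimal sketch of two size-limiting semantics that differ by exactly one entry when the limit hits, as described above. The `Entry` type and the names `limitExclusive` and `limitInclusive` are hypothetical and exist only for this illustration; this is not the code from either project, and it assumes the stricter variant stops before the entry that would exceed the limit while always returning at least one entry.

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for a Raft log entry. Only the index
// and the encoded size matter for this illustration.
type Entry struct {
	Index uint64
	Size  uint64
}

// limitExclusive stops before the entry that would push the running total
// over maxSize, but always returns at least one entry if any exist.
func limitExclusive(ents []Entry, maxSize uint64) []Entry {
	if len(ents) == 0 {
		return ents
	}
	size := ents[0].Size
	limit := 1
	for ; limit < len(ents); limit++ {
		size += ents[limit].Size
		if size > maxSize {
			break
		}
	}
	return ents[:limit]
}

// limitInclusive also returns the entry that crosses maxSize, i.e. one
// more entry than limitExclusive whenever the limit hits.
func limitInclusive(ents []Entry, maxSize uint64) []Entry {
	var size uint64
	for i, e := range ents {
		size += e.Size
		if size > maxSize {
			return ents[:i+1]
		}
	}
	return ents
}

func main() {
	ents := []Entry{
		{Index: 1, Size: 100},
		{Index: 2, Size: 100},
		{Index: 3, Size: 100},
		{Index: 4, Size: 100},
	}
	a := limitExclusive(ents, 250)
	b := limitInclusive(ents, 250)
	fmt.Println("exclusive ends at index", a[len(a)-1].Index)
	fmt.Println("inclusive ends at index", b[len(b)-1].Index)
}
```

Any logic that derives a truncated commit index from one of these and the entries to apply from the other will disagree about where the page ends, which is why treating the size limit as an approximate hint is the safer contract.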
b844bdd to 7a8ab37
Can you explain this more?
@@ -381,13 +395,17 @@ func (n *node) run(r *raft) {
    if !IsEmptySnap(rd.Snapshot) {
        prevSnapi = rd.Snapshot.Metadata.Index
    }
    if index := rd.appliedCursor(); index != 0 {
rd.CanApplyTo? The index returned here is the index the application can apply to rather than what the application has applied.
This might have a side effect for Raft applications. Previously, the commit index would never be greater than the max index the application gets from the entries. Now that invariant changes. We should probably document this alongside the entries pagination limit.
There are ways to fix this and keep the invariant, but I think it's the wrong approach because it makes it much more important that the user implements exactly the size behavior that Raft wants (for example, in CockroachDB, there is a cache for the Raft entries and so there are various code paths for which it is burdensome to prove that they're all exactly the same - it's much nicer to have the size limitation as a "hint" that you can fulfill approximately).

The applied index should be allowed to lag behind the commit index - I think that's how we would've done it had commit pagination been introduced earlier. It's possible that someone is relying on this behavior, but I'm not aware that this behavior was ever documented or "intentional". CockroachDB is completely oblivious to this change, for example. It's also better for fault tolerance to bump the commit index aggressively (even if there is a lot of log that needs to be applied still).

That said, it's a change worth vetting closely. I'm definitely fixing two bugs here, but I don't want to introduce a new one.
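As a rough illustration of an applied index that lags the commit index, the sketch below hands out committed entries one page at a time while leaving the commit index untouched. The `pager` type, its fields, and `next` are hypothetical names invented for this example, and the page limit counts entries rather than bytes to keep it short; it is not the code in this PR.

```go
package main

import "fmt"

// pager is a hypothetical model of paginated application of committed
// entries. It is not part of the raft package.
type pager struct {
	commit  uint64 // highest index known to be committed
	applied uint64 // highest index already handed to the application
	pageLen uint64 // max entries per page (stand-in for a byte limit)
}

// next returns the inclusive index range [lo, hi] of the next page to
// apply and advances the applied index. The commit index is never
// truncated to match the page. ok is false once applied == commit.
func (p *pager) next() (lo, hi uint64, ok bool) {
	if p.applied >= p.commit {
		return 0, 0, false
	}
	n := p.pageLen
	if n == 0 {
		n = 1 // always make progress, even with a zero limit
	}
	lo = p.applied + 1
	hi = lo + n - 1
	if hi > p.commit {
		hi = p.commit
	}
	p.applied = hi
	return lo, hi, true
}

func main() {
	p := &pager{commit: 10, pageLen: 4}
	for {
		lo, hi, ok := p.next()
		if !ok {
			break
		}
		fmt.Printf("apply [%d, %d], commit stays at %d\n", lo, hi, p.commit)
	}
}
```

The point of the design is that the page boundary only moves the applied cursor, so an `Entries` implementation that returns slightly more or fewer entries than expected changes the page size but cannot regress the commit index or skip entries.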
This works around the bug outlined in: etcd-io/etcd#10063 by matching Raft's internal implementation of commit pagination. Once the above PR lands, we can revert this commit (but I assume that it will take a little bit), and I think we should do that because the code hasn't gotten any nicer to look at. Fixes cockroachdb#28918. Release note: None
29579: storage: return one entry less in Entries r=petermattis a=tschottdorf

This works around the bug outlined in: etcd-io/etcd#10063 by matching Raft's internal implementation of commit pagination. Once the above PR lands, we can revert this commit (but I assume that it will take a little bit), and I think we should do that because the code hasn't gotten any nicer to look at.

Fixes #28918.

Release note: None

29631: cli: handle merged range descriptors in debug keys r=petermattis a=tschottdorf

Noticed during #29252.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@xiang90, do you have more comments?
Picks up etcd-io/etcd#10167. Future commits will use the new setting to replace broken logic that prevented unbounded Raft log growth. This also picks up etcd-io/etcd#10063. Release note: None