After range splitting, the newly created range takes nearly 2 seconds to accept commands. #1384
Comments
Yeah, we'll have to do something there. As you noticed correctly, the group takes a while to time out. We already have something in place to have a Raft group which starts up with a single replica elect a leader (itself) automatically, but for the range split that doesn't cut it. |
The |
It should be pretty easy to force a campaign after a split, either on the node that ran |
That sounds like a good plan. |
@tschottdorf , the reason why
|
@bdarnell, currently multiraft group is created lazily at |
There's no need to be lazy when we're splitting the range; we could just create the group immediately after the split (or let |
@bdarnell, this
func (r *raft) isElectionTimeout() bool {
	// if newly created range or in Election, use short timeout.
	if r.lead == None {
		return r.elapsed > r.rand.Int()%r.electionTimeout
	}
	d := r.elapsed - r.electionTimeout
	if d < 0 {
		return false
	}
	return d > r.rand.Int()%r.electionTimeout
}
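To see what the proposed change does to timing, the check can be exercised on a toy model. This is only a sketch: `node`, `newNode`, `ticksUntilCampaign` and the `none` constant are hypothetical stand-ins for the real raft struct, and the tick loop is a simplification of how the library advances `elapsed`.

```go
package main

import (
	"fmt"
	"math/rand"
)

// none is a hypothetical placeholder meaning "no known leader".
const none = 0

// node models only the fields that the timeout check reads.
type node struct {
	lead            uint64
	elapsed         int
	electionTimeout int
	rng             *rand.Rand
}

// newNode builds a toy node with a seeded random source.
func newNode(lead uint64, timeout int, seed int64) *node {
	return &node{lead: lead, electionTimeout: timeout, rng: rand.New(rand.NewSource(seed))}
}

// isElectionTimeout mirrors the snippet above: a leaderless node uses a
// short random timeout, any other node waits at least one full timeout.
func (n *node) isElectionTimeout() bool {
	if n.lead == none {
		return n.elapsed > n.rng.Int()%n.electionTimeout
	}
	d := n.elapsed - n.electionTimeout
	if d < 0 {
		return false
	}
	return d > n.rng.Int()%n.electionTimeout
}

// ticksUntilCampaign advances the node one tick at a time until the
// timeout check fires and returns the number of ticks that elapsed.
func ticksUntilCampaign(n *node) int {
	for n.elapsed = 0; !n.isElectionTimeout(); n.elapsed++ {
	}
	return n.elapsed
}

func main() {
	fresh := newNode(none, 15, 1)
	led := newNode(1, 15, 1)
	fmt.Println("leaderless group campaigns after", ticksUntilCampaign(fresh), "ticks")
	fmt.Println("group with a leader campaigns after", ticksUntilCampaign(led), "ticks")
}
```

In this model a leaderless group campaigns within 1 to `electionTimeout` ticks, while a group that knows a leader waits between `electionTimeout+1` and `2*electionTimeout` ticks. Note that every leaderless replica gets the short timeout, so several of them may campaign at about the same time.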
I don't think we have to worry about the case in which a leader crashes just after the split. If that happens, it's about as bad as any crash. In the regular case, the vote message, when received by a node which hasn't completed the split yet, has two possible outcomes:
@bdarnell is currently revisiting the Raft related races. I imagine he'll have more to say about this shortly. |
Yeah, we don't need to worry about case 1A. A leader crashing always means we need to wait for election timeouts. For 1B, currently a replica that hasn't processed a split can still vote, so campaigning immediately should work. Although once the election has finished and the first

We've talked in this thread about ignoring vote requests from unknown ranges, which would invalidate the previous paragraph, but I don't think we'd need to do that if we had storage keys. If we don't do storage keys and instead ignore vote requests from unknown ranges, then we will have to figure out how that interacts with early campaigns.

For case 2, we definitely don't want to cause every replica to campaign; that just increases the chances of split votes which require multiple election timeouts. If we can't pick one node to campaign then we may not want to change the election process at all. Your modification to |
|
We need to be able to force an election (on one node) after creating a new group (cockroachdb/cockroach#1384), but it is difficult to ensure that our call to Campaign does not race with an election that may be started by raft itself. A redundant call to Campaign should be a no-op instead of a panic. (But the panic in becomeCandidate remains, because we don't want to update the term or change the committed index in this case)
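The behavior described in this commit message can be illustrated with a toy state machine. This is a minimal sketch under stated assumptions, not the raft library's code: the `group` type, its fields and the lowercase `campaign` method are hypothetical, and only the two rules from the message are modeled (a redundant campaign is ignored, while the leader-to-candidate transition still panics).

```go
package main

import "fmt"

type state int

const (
	follower state = iota
	candidate
	leader
)

// group is a hypothetical, heavily reduced model of one Raft group.
type group struct {
	state state
	term  uint64
}

// campaign starts an election unless this replica is already the leader.
// It reports whether an election was actually started.
func (g *group) campaign() bool {
	if g.state == leader {
		// Redundant call: leave term and state untouched instead of panicking.
		return false
	}
	g.becomeCandidate()
	return true
}

// becomeCandidate keeps its guard: a leader must never be demoted to
// candidate here, because that would bump the term.
func (g *group) becomeCandidate() {
	if g.state == leader {
		panic("invalid transition [leader -> candidate]")
	}
	g.term++
	g.state = candidate
}

func main() {
	g := &group{}
	fmt.Println("first campaign started election:", g.campaign())
	g.state = leader // pretend the election was won
	fmt.Println("second campaign started election:", g.campaign())
	fmt.Println("term after both calls:", g.term)
}
```

The point of returning early in `campaign` rather than relaxing `becomeCandidate` is that the caller racing with a raft-initiated election is harmless, while any other path that tries to turn a leader into a candidate is still reported as a bug.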
This minimizes the window of unavailability following a split. Fixes cockroachdb#1384.
When running the following command
an error is printed:
but the range split has succeeded.
According to the log, the election starts about 2 seconds after the Raft group is created (raftTick is 100ms, electionTimeoutTicks is 15, which is defined in store.go), so the newly created range only accepts commands such as InternalResolveIntent, ConditionalPut, etc. after about 2 seconds. Until then, the AdminSplit retry fails with the aforementioned error.
Is there a way to shorten this time?
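The observed delay is consistent with simple arithmetic on the two constants. The sketch below assumes that the election timeout is randomized between one and two election timeouts, which is how the `isElectionTimeout` check quoted in the comments behaves when a leader is known; `electionWindow` is a hypothetical helper written for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Values quoted in the report; the names mirror the issue text.
const (
	raftTick             = 100 * time.Millisecond
	electionTimeoutTicks = 15
)

// electionWindow returns the earliest and latest time at which a replica
// would start campaigning, assuming a timeout randomized between one and
// two election timeouts (an assumption, see the lead-in).
func electionWindow(tick time.Duration, ticks int) (earliest, latest time.Duration) {
	earliest = time.Duration(ticks) * tick
	latest = 2 * earliest
	return earliest, latest
}

func main() {
	earliest, latest := electionWindow(raftTick, electionTimeoutTicks)
	fmt.Printf("new range campaigns between %v and %v after creation\n", earliest, latest)
}
```

With a 100ms tick and 15 ticks the window is 1.5 to 3 seconds, so a delay of about 2 seconds falls inside the expected range.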