raft learner limit increase proposal #13148
Comments
I'm ignorant here: can you help me understand why this is necessary? Other than the additional log replication load, I feel like the whole point of learners is that they don't affect anything important. So it should be safe to have lots of them (on an unloaded cluster, and for small values of 'lots'), added and removed fairly arbitrarily, as far as I can see? Do we care how the existing learners are doing before adding or removing other learners?
The concern is the stress on the leader from replicating the log to learners. If you add learners without bound, the leader will take a performance hit: think of a 6GB state file replicated N times. One approach I see is to ensure the current learners are promotable before adding another. If you want to add learners without bound for experimental purposes, that should probably sit behind an unsafe flag.
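To make the gating idea concrete, here is a minimal sketch in Go. It is not etcd's implementation: the `Learner` type, the `readyPercent` threshold, and the `canAddLearner` helper are all hypothetical names chosen for illustration, and "promotable" is assumed to mean that the learner's replicated log index is within a fixed percentage of the leader's.

```go
package main

import "fmt"

// Learner is a hypothetical snapshot of one learner's replication progress.
type Learner struct {
	ID         uint64
	MatchIndex uint64 // highest log index known to be replicated to this learner
}

// readyPercent is an assumed threshold: a learner counts as promotable once
// it has replicated at least this percentage of the leader's log.
const readyPercent = 90

// isPromotable reports whether the learner is close enough to the leader.
func isPromotable(l Learner, leaderIndex uint64) bool {
	return l.MatchIndex*100 >= leaderIndex*readyPercent
}

// canAddLearner allows a new learner only if the configured limit is not
// reached and every existing learner is already promotable.
func canAddLearner(existing []Learner, leaderIndex uint64, maxLearners int) error {
	if len(existing) >= maxLearners {
		return fmt.Errorf("learner limit reached: %d of %d", len(existing), maxLearners)
	}
	for _, l := range existing {
		if !isPromotable(l, leaderIndex) {
			return fmt.Errorf("learner %d is not yet promotable (match index %d, leader index %d)",
				l.ID, l.MatchIndex, leaderIndex)
		}
	}
	return nil
}
```

With a leader at index 100 and a limit of 3, a single learner at index 95 would let another learner be added, while a learner at index 50 would block the addition until it catches up.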
This all needs testing. If you're interested in doing a performance review, that would help with validation.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
I have been working on the performance review and will update when it is completed.
In the initial learner implementation, @jingyih noted the need for an API to make this configurable. While that would be ideal, we already have many stateless runtime configurations that can affect the way the cluster works. Can we rely on the admin to ensure cluster-wide configuration at runtime? My feeling is that a dynamic change of this configuration is probably not needed at this time. Thoughts? cc @jingyih @ptabor @hasbro17 @serathius @chaochn47 [1] #10730 (comment)
@hexfusion by dynamic API, do you mean something like
Just wondering what happens when the etcd instances in a cluster have differing learner limit configurations. Not sure if there is already something that handles a difference in a cluster-wide configuration flag for a new member.
If learners exist in the membership and their count is greater than the limit defined at runtime, the member will panic.
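The check described above could look roughly like the following Go sketch. The `Member` type and the `validateLearnerLimit` and `mustValidateLearnerLimit` helpers are hypothetical names for illustration, not etcd's actual code; the only behavior taken from the comment is that a member panics when the learner count in the membership exceeds its locally configured limit.

```go
package main

import "fmt"

// Member is a hypothetical, trimmed-down view of a cluster member.
type Member struct {
	ID        uint64
	IsLearner bool
}

// countLearners returns how many members in the membership are learners.
func countLearners(members []Member) int {
	n := 0
	for _, m := range members {
		if m.IsLearner {
			n++
		}
	}
	return n
}

// validateLearnerLimit returns an error if the membership already holds more
// learners than this member's configured limit allows.
func validateLearnerLimit(members []Member, maxLearners int) error {
	if n := countLearners(members); n > maxLearners {
		return fmt.Errorf("membership has %d learners, exceeding the configured limit of %d", n, maxLearners)
	}
	return nil
}

// mustValidateLearnerLimit panics on a violated limit, mirroring the
// "member will panic" behavior described in the thread.
func mustValidateLearnerLimit(members []Member, maxLearners int) {
	if err := validateLearnerLimit(members, maxLearners); err != nil {
		panic(err)
	}
}
```

In this sketch, a member started with a limit of 1 against a membership that already contains two learners would fail fast instead of running with an inconsistent configuration.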
Today we limit the raft learner count to 1 [1], a limit that was rightly set in 3.4 [2] as a gate against the leader stress of log replication. I think the time has come to consider lifting this limit. Perhaps, as a gate to additional learners, we can scale learners only if the existing learners are promotable (in sync with the leader). I am open to other ideas, but I think that for the feature to evolve, users need to be able to explore learners further, even if this is under an experimental flag.
cc @jingyih @gyuho @xiang90 @ptabor
[1] etcd/server/etcdserver/api/membership/cluster.go, lines 339 to 340 in 46b49a6
[2] #10730 (comment)