Allow raft TrailingLogs to be configured. #6186
Conversation
This fixes pathological cases where the write throughput and snapshot size are both so large that more than 10k log entries are written in the time it takes to restore the snapshot from disk. In this case, followers that restart can never catch up with leader replication again: they enter a loop of downloading a full snapshot and restoring it, only to find that the snapshot is already out of date because the leader has truncated its logs, so a new snapshot is sent, and so on. In general, if you need to adjust this, you are probably abusing Consul for purposes outside its design envelope and should reconsider your usage to reduce data size and/or write volume.
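As a minimal sketch, here is how such a tunable might be set in an agent config file, assuming the key is named `raft_trailing_logs` to match the Raft `TrailingLogs` option; the value shown is purely illustrative, not a recommendation:

```hcl
# Illustrative only: retain more Raft log entries after each snapshot so a
# follower that spends a long time restoring a large snapshot can still
# catch up from the leader's log instead of looping on full snapshot installs.
raft_trailing_logs = 50000
```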
This all looks good. Should we also set the previous 10k limit here:
consul/agent/config/default.go, line 233 in 7753b97:

```go
leader_lease_timeout = "` + raft.LeaderLeaseTimeout.String() + `"
```
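For illustration, a hypothetical sketch of what such a default could look like in default.go, following the style of the excerpt above (the variable name is an assumption, and per the reply below this approach was not adopted):

```go
package config

import (
	"strconv"

	"github.com/hashicorp/raft"
)

// Hypothetical: surface Raft's built-in TrailingLogs default (10240) in the
// generated default config string, alongside leader_lease_timeout above.
var defaultRaftTuning = `
	raft_trailing_logs = ` + strconv.FormatUint(raft.DefaultConfig().TrailingLogs, 10) + `
`
```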
This should probably be documented with a warning. Here's a potential one based on your original comment:
It seems unlikely that knowing this setting exists would encourage people to get into the situation where they need it. If they're already in that situation, posting the warning with alternative courses of action may help get them out of it. By the way, I think @mkeeler mentioned that we should also improve the logging around installing snapshots as well. That way users can diagnose the situation if it does come up.
@mkeeler I looked, but right now we don't default any of the other raft configs; we just conditionally set them only if they are non-zero in the Consul config, so we accept Raft's defaults implicitly. I.e. I did the same as we do for …
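As a rough illustration of that conditional pattern (the function and parameter names are assumptions for illustration, not the exact PR diff):

```go
package agent

import "github.com/hashicorp/raft"

// applyRaftTuning illustrates the conditional-override pattern described
// above: a tunable from the Consul config is applied only when non-zero,
// so the value already set by raft.DefaultConfig() wins implicitly otherwise.
func applyRaftTuning(raftConf *raft.Config, trailingLogs uint64) {
	if trailingLogs != 0 {
		raftConf.TrailingLogs = trailingLogs
	}
}
```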
I added docs. I also realised the other raft tunables were documented as CLI flags out of habit but are actually not valid CLI flags, so I've moved them, preserving the old anchor names too.
LGTM, just one small comment inline
Makes sense as part of 1.5.3 👍
Notes
This is the minimal change that allows that situation to be recovered from. There are nicer solutions we could add later with Raft library changes, e.g. dynamically deciding how much to truncate based on current recovery progress (see the speculative sketch below), but those changes are much more involved to test and reason about, as well as to backport to existing versions. This provides a low-risk, simple change we can potentially backport to older versions in case users are stuck on those and hitting this situation in production with no other way to resolve it.
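Purely as a speculative sketch of that dynamic idea (none of this is in the PR; the inputs and the headroom factor are assumptions):

```go
package main

// dynamicTrailingLogs sizes the retained log tail from observed recovery
// progress instead of a fixed count: if a follower wrote a restore's worth
// of time catching up, keep at least that many entries with 2x headroom,
// never dropping below a configured floor.
func dynamicTrailingLogs(entriesWrittenDuringLastRestore, floor uint64) uint64 {
	if n := 2 * entriesWrittenDuringLastRestore; n > floor {
		return n
	}
	return floor
}
```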
Questions
Should we document this config? If we do, I think we should warn against its use like the other raft tunables, but this one seems especially like a crutch to work around deployments that are using Consul in cases it's not well suited for. On the other hand, if you are in that situation, not being able to self-discover that a solution exists also seems kind of silly.