Terminate tikv server instead of killing it when evict leader timeout or is skipped #1432
Comments
@BusyJay Could you confirm killing TiKV with |
Yes, we have tests that randomly kill TiKV. Alternatively, we can raise the evict leader timeout to infinite, which can be considered safer. |
Due to tikv/tikv#9624, it may not be safe to |
An infinite timeout is not good in practice: it could block the upgrade process, and the user would have to interrupt the command by hand, leaving the cluster in a hybrid state with both old and new versions of components running. The cluster cannot recover from that state, because retrying the upgrade command will still block at leader eviction. We can indeed increase the default timeout for evicting leaders, but that doesn't really solve the problem. |
Better than nothing. |
OK |
Feature Request
Is your feature request related to a problem? Please describe:
Due to tikv/tikv#10353, if a TiKV instance still holds leaders when it is shut down gracefully, transactions may be corrupted.
Describe the feature you'd like:
TiKV is working on a patch to fix the problem, but for older versions, SIGKILL must be used to terminate TiKV immediately when leader eviction times out or is skipped, to avoid the potential risk. So there are two TODOs:
systemctl kill -s KILL service
to stop TiKV.
Why the feature is needed:
See above.
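A minimal sketch of the requested behavior, not the actual implementation: pick the stop command based on whether leader eviction finished. The function names, the `evict_ok` flag, and the service name in the usage note are hypothetical. Using a graceful `systemctl stop` once eviction has succeeded is an assumption drawn from the description above, which only identifies the risk when leaders remain.

```python
import subprocess
from typing import Callable, List


def choose_stop_command(service: str, evict_ok: bool) -> List[str]:
    """Return the systemctl command used to stop a TiKV service.

    evict_ok is True only if leader eviction completed before the timeout.
    A timed-out or skipped eviction must be passed as False.
    """
    if evict_ok:
        # Assumption: no leaders remain, so a graceful stop is acceptable.
        return ["systemctl", "stop", service]
    # Leaders may remain: send SIGKILL so TiKV exits immediately
    # instead of going through a graceful shutdown.
    return ["systemctl", "kill", "-s", "KILL", service]


def stop_tikv(service: str, evict_ok: bool,
              run: Callable = subprocess.run):
    """Run the chosen command; `run` is injectable so it can be stubbed."""
    return run(choose_stop_command(service, evict_ok), check=True)
```

For example, `choose_stop_command("tikv-20160.service", evict_ok=False)` yields the `systemctl kill -s KILL` form quoted above, with the hypothetical unit name in place of `service`.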
Describe alternatives you've considered:
Ignore the problem, in which case upgrading a cluster, especially a large one, risks corrupting data.
Teachability, Documentation, Adoption, Migration Strategy: