ZOOKEEPER-4643: Committed txns may be improperly truncated if follower crashes right after updating currentEpoch but before persisting txns to disk #2028

Closed


AlphaCanisMajoris
Contributor

See ZOOKEEPER-4643 for details on the symptom, an example trace, the diagnosis, and a possible fix.

To avoid the issues of ZOOKEEPER-4643, one possible fix is to guarantee that a follower updates its currentEpoch file only after it has synced the leader's history (persisted the pending transactions to disk) when receiving NEWLEADER in the SYNC phase.

The solution in this patch builds on the fixes for ZOOKEEPER-4646 & ZOOKEEPER-4685, which guarantee that a follower syncs the leader's history (logs the pending transactions to disk) before replying with an ACK of NEWLEADER.

Overall, when a follower receives the NEWLEADER message, it will persist the pending transactions to disk first, then update the currentEpoch file, and finally reply with an ACK of NEWLEADER. This specific order ensures that issues such as ZOOKEEPER-4643, ZOOKEEPER-4646 & ZOOKEEPER-4685 are avoided.
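
The ordering described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not the code in this patch: the class `NewLeaderOrderingSketch`, its fields, and the `stepsBeforeCrash` parameter are hypothetical stand-ins for the follower's transaction log, its currentEpoch file, and a crash injected partway through NEWLEADER handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration; these are not ZooKeeper's own classes.
class NewLeaderOrderingSketch {
    final List<Long> persistedZxids = new ArrayList<>(); // stand-in for the on-disk txn log
    final List<Long> pendingZxids = new ArrayList<>();   // txns received during SYNC, memory only
    final List<String> steps = new ArrayList<>();        // steps that completed, in order
    long currentEpoch;                                   // stand-in for the currentEpoch file

    NewLeaderOrderingSketch(long currentEpoch) {
        this.currentEpoch = currentEpoch;
    }

    void queue(long zxid) {
        pendingZxids.add(zxid);
    }

    // Handles NEWLEADER. stepsBeforeCrash is how many steps complete before a
    // simulated crash (3 means no crash). Returns true only if the ACK was sent.
    boolean onNewLeader(long newEpoch, int stepsBeforeCrash) {
        if (stepsBeforeCrash >= 1) {
            // Step 1: persist the pending transactions first.
            persistedZxids.addAll(pendingZxids);
            pendingZxids.clear();
            steps.add("persist-txns");
        }
        if (stepsBeforeCrash >= 2) {
            // Step 2: only now record the new epoch.
            currentEpoch = newEpoch;
            steps.add("update-currentEpoch");
        }
        if (stepsBeforeCrash >= 3) {
            // Step 3: acknowledge NEWLEADER last.
            steps.add("ack-newleader");
            return true;
        }
        pendingZxids.clear(); // simulated crash: in-memory state is lost
        return false;
    }

    // The unsafe state: the epoch is already updated but the synced txns are
    // not all on disk.
    boolean epochAheadOfLog(long newEpoch, int syncedTxnCount) {
        return currentEpoch == newEpoch && persistedZxids.size() < syncedTxnCount;
    }

    public static void main(String[] args) {
        for (int crashPoint = 0; crashPoint <= 3; crashPoint++) {
            NewLeaderOrderingSketch f = new NewLeaderOrderingSketch(1);
            f.queue(0x100000001L);
            f.queue(0x100000002L);
            boolean acked = f.onNewLeader(2, crashPoint);
            System.out.println("steps=" + f.steps + " acked=" + acked
                    + " epochAheadOfLog=" + f.epochAheadOfLog(2, 2));
        }
    }
}
```

In this model, whichever step the simulated crash interrupts, the follower never ends up with the new epoch recorded while the synced transactions are missing from the log. If step 2 ran before step 1, a crash between them would leave exactly that state.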
