Fix the potential data loss for clusters with only one member #14394

Closed
wants to merge 1 commit into from

Commits on Aug 29, 2022

  1. fix the potential data loss for clusters with only one member

    For a cluster with only one member, raft always sends identical
    unstable entries and committed entries to etcdserver, and etcd
    responds to the client once it finishes (in fact, only partially
    finishes) the apply workflow.
    
    When the client receives the response, it does not mean that etcd has
    already persisted the data to both BoltDB and the WAL, because:
       1. etcd commits the BoltDB transaction periodically instead of on each request;
       2. etcd saves WAL entries in parallel with applying the committed entries.
    Accordingly, data can be lost if etcd crashes immediately after
    responding to the client and before BoltDB and the WAL have
    successfully saved the data to disk.
    Note that this issue can only happen for clusters with only one member.
    
    For clusters with multiple members, this isn't an issue, because etcd
    does not commit and apply the data before it has been replicated to a
    majority of members. When the client receives the response, the data
    must have been applied, which in turn means it must have been committed.
    Note: for clusters with multiple members, raft never sends identical
    unstable entries and committed entries to etcdserver.
    
    Signed-off-by: Benjamin Wang <wachao@vmware.com>
    ahrtr committed Aug 29, 2022
    3243706
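
The crash window described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Entry`, `Ready`, `node`, `processParallel`, `processOrdered`, and `lostWrites` are hypothetical names invented for illustration and are not the etcd or raft API. The "parallel" path is modeled as its worst-case interleaving (the client is acknowledged before the WAL save completes), and the "ordered" path shows one possible way to close the window, which is not necessarily the approach taken in this PR.

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for a raft log entry.
type Entry struct {
	Index uint64
	Data  string
}

// Ready is a simplified, hypothetical model of one batch handed from raft
// to the server. In a one-member cluster both slices hold the same entries.
type Ready struct {
	Entries          []Entry // unstable entries that still need to reach the WAL
	CommittedEntries []Entry // entries the server is allowed to apply
}

// node tracks what is durable versus what clients have been told.
type node struct {
	wal   []Entry  // entries that reached "disk"
	acked []uint64 // indexes already acknowledged to clients
}

// processParallel models the worst-case interleaving of "apply in parallel
// with saving the WAL": the client is acknowledged first, and the process
// may crash before the WAL save happens.
func (n *node) processParallel(rd Ready, crashBeforeSave bool) {
	for _, e := range rd.CommittedEntries {
		n.acked = append(n.acked, e.Index)
	}
	if crashBeforeSave {
		return // simulated crash: nothing was persisted
	}
	n.wal = append(n.wal, rd.Entries...)
}

// processOrdered persists the unstable entries before acknowledging anything.
// A crash after the save can lose the acknowledgement, but never acked data.
func (n *node) processOrdered(rd Ready, crashAfterSave bool) {
	n.wal = append(n.wal, rd.Entries...)
	if crashAfterSave {
		return // simulated crash: data is durable, client was not acked
	}
	for _, e := range rd.CommittedEntries {
		n.acked = append(n.acked, e.Index)
	}
}

// lostWrites returns every acknowledged index that is missing from the WAL.
func lostWrites(n *node) []uint64 {
	durable := make(map[uint64]bool, len(n.wal))
	for _, e := range n.wal {
		durable[e.Index] = true
	}
	var lost []uint64
	for _, idx := range n.acked {
		if !durable[idx] {
			lost = append(lost, idx)
		}
	}
	return lost
}

func main() {
	e := Entry{Index: 1, Data: "put foo bar"}
	// One-member cluster: unstable and committed entries are identical.
	rd := Ready{Entries: []Entry{e}, CommittedEntries: []Entry{e}}

	a := &node{}
	a.processParallel(rd, true)
	fmt.Println("parallel, crash before WAL save, lost:", lostWrites(a))

	b := &node{}
	b.processOrdered(rd, true)
	fmt.Println("ordered, crash after WAL save, lost:", lostWrites(b))
}
```

In this model the parallel path reports an acknowledged write that never reached the WAL, while the ordered path reports none. The model deliberately ignores the periodic BoltDB commit, which the commit message lists as a separate source of the same window.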