Inconsistent data in an etcd cluster #9630
Comments
Is the replaced node (the new node) the one with a different data size?
What do you mean by "replacing" a node? Was it done via Also, have you ever run
Btw, the TLS fix was released yesterday in https://github.com/coreos/etcd/releases/tag/v3.2.19.
We replaced the 3 nodes at the same time. I know this is a very bad idea, and if the cluster had failed to come up when the 3 new nodes started, I think that would have made sense. But the cluster elected a leader successfully and started serving requests.
No, Terraform just shut down the three nodes and created new instances with the same data disks and IP addresses. I will run the defrag command. (Yes, we plan to use the new release as soon as possible.)
I see. So, before the replacement happened, were the data sizes consistent?
Yes, please try it, and let us know if it still returns different numbers.
I just did a defrag:
So the data sizes are still different, and some keys are still missing when accessing the 3rd node. Before the replacement the cluster was working fine; I did not look into etcd at the time, but I assume everything was OK.
@lbernail As long as Raft index (the last column in Can you also check
for each node? The header's revision should be the same across the cluster if the members have the same data.
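To make that cross-member comparison mechanical, here is a minimal Go sketch that reports every field on which the members disagree. `MemberStatus` and `findDivergence` are hypothetical names, and the values are assumed to be collected per endpoint, e.g. from `etcdctl endpoint status -w json` (Raft term, Raft index, and the response header's revision). Comparing Raft term and Raft index alongside the revision is our own assumption of a useful extra signal. On a cluster that is still taking writes these numbers can differ transiently, so take the snapshot while writes are paused, or repeat the check.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// MemberStatus is a hypothetical record of the values reported for one
// member (e.g. collected from `etcdctl endpoint status -w json`).
type MemberStatus struct {
	Endpoint  string
	RaftTerm  uint64
	RaftIndex uint64
	Revision  int64 // response header revision
}

// findDivergence returns one message per field on which the members
// disagree; an empty slice means every member reported the same values.
func findDivergence(statuses []MemberStatus) []string {
	fields := []struct {
		name  string
		value func(MemberStatus) string
	}{
		{"raft term", func(s MemberStatus) string { return fmt.Sprint(s.RaftTerm) }},
		{"raft index", func(s MemberStatus) string { return fmt.Sprint(s.RaftIndex) }},
		{"revision", func(s MemberStatus) string { return fmt.Sprint(s.Revision) }},
	}

	var problems []string
	for _, f := range fields {
		// Group endpoints by the value they reported for this field.
		byValue := map[string][]string{}
		for _, s := range statuses {
			v := f.value(s)
			byValue[v] = append(byValue[v], s.Endpoint)
		}
		if len(byValue) <= 1 {
			continue // all members agree on this field
		}
		groups := make([]string, 0, len(byValue))
		for v, eps := range byValue {
			groups = append(groups, v+"@"+strings.Join(eps, ","))
		}
		sort.Strings(groups) // deterministic output despite map iteration order
		problems = append(problems, f.name+" differs: "+strings.Join(groups, " "))
	}
	return problems
}
```

For example, two members `a` and `b` that report the same term and index but revisions 50 and 51 produce a single message, `revision differs: 50@a 51@b`.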
Thank you. Would the database size issue explain the data inconsistency between nodes?
Strange. All revisions are different. Are they still receiving writes? Also, it seems the WAL files were not kept during the migration? The Raft term is only 2, which means there was only one leader election.
The Kubernetes cluster is still up (in a pretty bad state), so it is still reading from and writing to etcd. The disks were supposed to be identical on the new instances, and the Raft term is low because the cluster had only been up for a few hours. After more investigation, it turns out we had a disk issue: we use one EBS disk for data and one for the WAL, and when the disks were reattached they were swapped (the WAL disk was attached as the data disk and the data disk as the WAL disk). This is due to how NVMe drives are listed on the new c5/m5 instances on Ubuntu, and we have just fixed it. So what actually happened is that we had a 3-node cluster, stopped the nodes, and restarted them in this state:
In this situation, we would have expected the cluster not to come up at all (which would make sense), but it came up in the bad state described above, serving inconsistent data depending on which node we reached.
You probably reused the old cluster token and exactly the same configuration. etcd really cannot distinguish this setup from a brand-new cluster setup. The two new nodes might form a new cluster by themselves and disrupt the old one. I also think the one "old" node has some issues, since its term went back to 2. Without more information, it is hard for us to debug this situation. If you can reproduce it reliably, we can look into it further.
We have a 3-node etcd cluster that we use as the backend for a Kubernetes cluster, and on one of the nodes the data is inconsistent with the others:
Member list
Status
Data inconsistency
OK Node
Inconsistent Node: key is missing
Possible cause
We manage our cluster with Terraform, and we upgraded it. The upgrade involved replacing the etcd instances, but we kept the data and WAL directories (on EBS drives on AWS), and the new nodes had the same IPs as the original ones and the same etcd version. However, etcd was probably not shut down cleanly.
etcd version: we were using a custom build from the 3.2 branch because 3.2.19 had not been released yet and we needed this PR: #9570.
Our etcd was built from this commit: https://github.com/roboll/etcd/commit/d45053c068950a5672a22d1192249313dbcbca26 with Go 1.10 (binary available here: https://github.com/roboll/etcd/releases/tag/v3.2.19-datadog). Even though this is not an official release, we believe this should not have happened.
We are keeping the cluster in this state to be able to diagnose what happened. We are happy to send more details.