This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

fail to remove etcd member (xxx): etcdserver: re-configuration failed due to not enough started members #975

Closed
hongchaodeng opened this issue Apr 21, 2017 · 4 comments · Fixed by #979

hongchaodeng (Member) commented Apr 21, 2017

Seeing this error from the Jenkins job with a 1.6 cluster:
https://jenkins-etcd.prod.coreos.systems/job/etcd-operator-master-k8s-1-6-regression/39/consoleText

It's in the recovery test, which removes one member. The operator logs showed that after one pod was deleted, the operator kept failing to remove the member.
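The "not enough started members" error comes from etcd's strict reconfiguration check: a membership change is rejected unless the members that would remain started still form a quorum of the resulting cluster. A minimal sketch of that rule (a simplified model for illustration, not etcd's actual code; `canRemove` and its parameters are hypothetical names):

```go
package main

import "fmt"

// quorum returns the majority size for a cluster of n members.
func quorum(n int) int { return n/2 + 1 }

// canRemove models etcd's strict reconfig check: removing a member is
// only accepted if the started members remaining afterwards are still
// a quorum of the new (smaller) cluster.
func canRemove(clusterSize, startedMembers int, removedIsStarted bool) bool {
	newSize := clusterSize - 1
	started := startedMembers
	if removedIsStarted {
		started--
	}
	return started >= quorum(newSize)
}

func main() {
	// The failure mode in this issue: a 3-member cluster where only
	// "0001" is actually up ("0000" was deleted, and "0002" shows
	// Running without a live etcd server). Removing "0000" is rejected.
	fmt.Println(canRemove(3, 1, false)) // false -> "not enough started members"

	// If "0002" were actually serving, the removal would be accepted.
	fmt.Println(canRemove(3, 2, false)) // true
}
```

Under this model, the operator's repeated `MemberRemove` attempts can never succeed until "0002" actually joins and starts serving.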

@hongchaodeng (Member Author)

An update on my findings:

  • It seems etcd pod "0002" showed "Running", but the etcd server wasn't up. There are no logs for it.
  • After "0000" was deleted, "0001" kept doing leader election.

@hongchaodeng (Member Author)

This is happening very frequently in 1.6 testing (it is actually the only failing test).

@hongchaodeng (Member Author)

Now I have more logs from pod "0002". It kept failing with:

pkg/netutil: failed resolving host test-etcd-9zd22-0000.test-etcd-9zd22.e2e-etcd-operator-master-k8s-1-6-regression-74.svc.cluster.local:2380 (lookup test-etcd-9zd22-0000.test-etcd-9zd22.e2e-etcd-operator-master-k8s-1-6-regression-74.svc.cluster.local on 10.43.240.10:53: no such host)

"0000" should have died. But "02" doesn't seem to form a quorum with "01".

hongchaodeng (Member Author) commented Apr 21, 2017

Filed an etcd issue: etcd-io/etcd#7798
