etcd errors in automated tests #6066

Closed
csrwng opened this issue Nov 24, 2015 · 5 comments
Labels: kind/test-flake (categorizes issue or PR as related to test flakes), priority/P2

Comments


csrwng commented Nov 24, 2015

In at least a couple of merge runs, the test dies because etcd somehow fails to start:

https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/4133/consoleFull
https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/4135/consoleFull

The latter run has the following in the openshift.log:

2015-11-24 15:06:59.500716 I | etcdserver: setting up the initial cluster version to 2.1.0
2015-11-24 15:07:00.500903 E | etcdserver: error updating cluster version (etcdserver: request timed out)
2015-11-24 15:07:03.100718 E | etcdserver: publish error: etcdserver: request timed out, possibly due to previous leader failure
2015-11-24 15:07:04.319832 E | etcdhttp: etcdserver: request timed out, possibly due to previous leader failure
2015-11-24 15:07:04.500879 I | etcdserver: setting up the initial cluster version to 2.1.0
2015-11-24 15:07:05.501054 E | etcdserver: error updating cluster version (etcdserver: request timed out)
2015-11-24 15:07:06.536885 N | etcdserver: set the initial cluster version to 2.1.0
2015-11-24 15:07:06.536949 N | etcdserver: updated the cluster version from 2.1.0 to 2.1.0
2015-11-24 15:07:06.537012 I | etcdserver: published {Name:openshift.local ClientURLs:[https://127.0.0.1:24001]} to cluster efffb52923ea33f1
F1124 15:07:06.537668   27786 controller.go:83] Unable to perform initial IP allocation check: unable to persist the updated service IP allocations: serviceipallocation "" cannot be updated: another caller has already initialized the resource

csrwng commented Nov 24, 2015


ncdc commented Nov 24, 2015

I've seen a handful of test runs, going back months, where either etcd doesn't start or one of the /healthz checks times out. No idea why, since it happens infrequently (or at least it did).

danmcp added the kind/test-flake and priority/P2 labels Dec 1, 2015

eparis commented Jan 11, 2016


liggitt commented Jan 12, 2016

@eparis not sure what that link was supposed to be to... looks like a successful travis run


deads2k commented Jan 12, 2016

@csrwng I think this happens because the allocation actually succeeded but etcd reported a failure (the request timed out), so the client tried again and instantly failed because the resource had already been initialized.

Duping on #6447, reopen if you disagree or hit it again.

deads2k closed this as completed Jan 12, 2016