etcd errors in automated tests #6066

csrwng · 2015-11-24T21:09:23Z

In at least a couple of merge runs, the test dies because etcd fails to start:

https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/4133/consoleFull
https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/4135/consoleFull

The latter run has the following in the openshift.log:

2015-11-24 15:06:59.500716 I | etcdserver: setting up the initial cluster version to 2.1.0
2015-11-24 15:07:00.500903 E | etcdserver: error updating cluster version (etcdserver: request timed out)
2015-11-24 15:07:03.100718 E | etcdserver: publish error: etcdserver: request timed out, possibly due to previous leader failure
2015-11-24 15:07:04.319832 E | etcdhttp: etcdserver: request timed out, possibly due to previous leader failure
2015-11-24 15:07:04.500879 I | etcdserver: setting up the initial cluster version to 2.1.0
2015-11-24 15:07:05.501054 E | etcdserver: error updating cluster version (etcdserver: request timed out)
2015-11-24 15:07:06.536885 N | etcdserver: set the initial cluster version to 2.1.0
2015-11-24 15:07:06.536949 N | etcdserver: updated the cluster version from 2.1.0 to 2.1.0
2015-11-24 15:07:06.537012 I | etcdserver: published {Name:openshift.local ClientURLs:[https://127.0.0.1:24001]} to cluster efffb52923ea33f1
F1124 15:07:06.537668   27786 controller.go:83] Unable to perform initial IP allocation check: unable to persist the updated service IP allocations: serviceipallocation "" cannot be updated: another caller has already initialized the resource


csrwng · 2015-11-24T21:10:07Z

@ncdc @derekwaynecarr @bparees

ncdc · 2015-11-24T21:12:43Z

I've seen a handful of test runs, going back months, where either etcd doesn't start or one of the /healthz checks times out. No idea why, since it happens infrequently (or at least it did).

eparis · 2016-01-11T16:27:34Z

#6581
https://travis-ci.org/openshift/origin/jobs/100947397

liggitt · 2016-01-12T14:08:48Z

@eparis not sure what that link was supposed to point to... it looks like a successful Travis run

deads2k · 2016-01-12T19:24:52Z

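@csrwng I think this happens because the allocation actually succeeded, but etcd reported a failure (the request timed out). The client then retried and failed immediately, because the resource had already been initialized by its own first attempt.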

Duping on #6447, reopen if you disagree or hit it again.
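
For illustration, here is a minimal sketch of the sequence described above. It is not the origin code: `FlakyStore`, `naive_init`, and `tolerant_init` are hypothetical names, and the store is an in-memory stand-in for etcd that commits a write but loses the reply. The second function shows one possible way a caller could tolerate the ambiguous timeout; it is an assumption, not necessarily what the linked issue ended up doing.

```python
class Timeout(Exception):
    """The store did not confirm the write in time, so the outcome is unknown."""


class AlreadyInitialized(Exception):
    """The key already exists, so a create-only write is rejected."""


class FlakyStore:
    """Hypothetical stand-in for etcd: commits a write but may lose the reply."""

    def __init__(self, lost_replies):
        self.data = {}
        self.lost_replies = lost_replies  # how many committed writes report a timeout

    def create_if_absent(self, key, value):
        if key in self.data:
            raise AlreadyInitialized(key)
        self.data[key] = value  # the write is committed here
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise Timeout(key)  # but the caller is told it failed


def naive_init(store, key, value, attempts=3):
    """Retry on timeout; treat 'already initialized' as fatal."""
    for _ in range(attempts):
        try:
            store.create_if_absent(key, value)
            return "ok"
        except Timeout:
            continue
        except AlreadyInitialized:
            return "fatal: already initialized"
    return "fatal: timed out"


def tolerant_init(store, key, value, attempts=3):
    """After a timeout, accept 'already initialized' if the stored value is ours."""
    saw_timeout = False
    for _ in range(attempts):
        try:
            store.create_if_absent(key, value)
            return "ok"
        except Timeout:
            saw_timeout = True
        except AlreadyInitialized:
            if saw_timeout and store.data.get(key) == value:
                return "ok"
            return "fatal: already initialized"
    return "fatal: timed out"


if __name__ == "__main__":
    print("naive:   ", naive_init(FlakyStore(1), "serviceipallocation", "range-a"))
    print("tolerant:", tolerant_init(FlakyStore(1), "serviceipallocation", "range-a"))
```

With one lost reply, the naive caller hits the same kind of "already initialized" failure seen in the log, while the tolerant caller recognizes its own earlier write.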

danmcp added the kind/test-flake and priority/P2 labels Dec 1, 2015

danmcp assigned pweil- Dec 1, 2015

deads2k closed this as completed Jan 12, 2016
