
Jenkins flake on pinging /healthz #6031

Closed
stevekuznetsov opened this issue Nov 23, 2015 · 12 comments
Labels
component/kubernetes kind/test-flake Categorizes issue or PR as related to test flakes. priority/P2

Comments

@stevekuznetsov
Contributor

Seen here:

ERROR: gave up waiting for https://127.0.0.1:28443/healthz

!!! Error in hack/test-cmd.sh:362
    'return 1' exited with status 1
Call stack:
    1: hack/test-cmd.sh:362 main(...)
Exiting with status 1
@miminar

miminar commented Nov 27, 2015

Can also be seen in #5141, #5578, #5819, #5932, #5971, #5997, #6020, #6063.

@php-coder
Contributor

In my case the extended tests failed with the same error. I found the problem: another instance of openshift was already running:

[INFO] Scan of OpenShift related processes already up via ps -ef    | grep openshift : 
root      8392 16944  0 13:25 pts/0    00:00:00 sudo /data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift start --public-master=localhost --volume-dir=/opt/openshift
root      8396  8392 22 13:25 pts/0    00:00:08 /data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/openshift start --public-master=localhost --volume-dir=/opt/openshift
vagrant   8668  8476  0 13:26 pts/0    00:00:00 grep openshift
[INFO] Starting OpenShift server
[INFO] OpenShift server start at: 
Tue Dec  1 13:26:19 UTC 2015
ERROR: gave up waiting for https://10.0.2.15:8443/healthz

!!! Error in ./test/extended/../../hack/util.sh:364
    'return 1' exited with status 1
Call stack:
    1: ./test/extended/../../hack/util.sh:364 start_os_server(...)
    2: ./test/extended/core.sh:118 main(...)
Exiting with status 1

HTH
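For the local scenario described above, the failure could be caught before the new server even starts its `/healthz` wait. A minimal sketch of such a pre-start guard (the function name and messages are illustrative, not part of Origin's actual `hack/` scripts); `pgrep -f` matches against the full command line, so a path that merely contains "openshift" is not enough to trigger it:

```shell
# Hypothetical pre-start guard: abort early if an `openshift start`
# process is already running, instead of letting the new server fail
# its /healthz wait minutes later.
check_no_openshift() {
  if pgrep -f 'openshift start' >/dev/null 2>&1; then
    echo "ERROR: an openshift server is already running" >&2
    return 1
  fi
  echo "no leftover openshift server found"
}
```

Note that this avoids the `ps -ef | grep openshift` pattern from the log above, which also matches the `grep` process itself.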

@stevekuznetsov
Contributor Author

This is a common problem when running locally, yes. I do not think that this is the issue on Jenkins, however.

@knobunc
Contributor

knobunc commented Dec 17, 2015

Hitting this too :-(

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/7931/consoleText

[INFO] Scan of OpenShift related processes already up via ps -ef | grep openshift :
ec2-user 26025 26024 0 15:40 ? 00:00:00 /bin/bash /data/src/github.com/openshift/origin/hack/update-generated-swagger-spec.sh _output/verify-generated-swagger-spec
ec2-user 26120 26025 0 15:40 ? 00:00:00 grep openshift
[INFO] Starting OpenShift server
[INFO] OpenShift server start at:
Thu Dec 17 15:40:37 EST 2015
ERROR: gave up waiting for https://127.0.0.1:38443/healthz

!!! Error in /data/src/github.com/openshift/origin/hack/../hack/util.sh:364
'return 1' exited with status 1
Call stack:
1: /data/src/github.com/openshift/origin/hack/../hack/util.sh:364 start_os_master(...)
2: /data/src/github.com/openshift/origin/hack/update-generated-swagger-spec.sh:55 main(...)
Exiting with status 1
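Worth noting about the log above: the "OpenShift related processes" the scan found are just the `update-generated-swagger-spec.sh` script itself (its path contains "openshift") and the `grep` process, so no stale server was actually running — the real failure is the `/healthz` wait timing out. A minimal sketch of that kind of wait loop (function name, defaults, and messages here are illustrative, not the real `hack/util.sh` code):

```shell
# Poll a /healthz URL until it answers or we run out of attempts.
# -k: the test server uses a self-signed cert; -f: treat HTTP errors
# as failures; -s: suppress progress output.
wait_for_healthz() {
  local url=$1 max_tries=${2:-30} delay=${3:-1}
  local i
  for ((i = 1; i <= max_tries; i++)); do
    if curl -k -s -f "$url" >/dev/null; then
      echo "healthz OK after ${i} tries"
      return 0
    fi
    sleep "$delay"
  done
  echo "ERROR: gave up waiting for ${url}" >&2
  return 1
}
```

A loop like this can only report "gave up waiting"; it cannot distinguish a slow-starting server from one that failed to bind, which is why the later comments push for more debugging output around server init.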

@liggitt
Contributor

liggitt commented Dec 19, 2015

@liggitt
Contributor

liggitt commented Jan 14, 2016

@stevekuznetsov
Contributor Author

@pweil- are you still looking into this? The LDAP extended run hits this pretty often, and there never seem to be any other OpenShift-related processes up at the time. It seems like a deeper failure of API server initialization.

@pweil-
Contributor

pweil- commented Mar 16, 2016

No, not at the moment. The timeout was updated and more debugging was added, so we closed my debug PR #6171 (comment).

@stevekuznetsov
Contributor Author

Hm. We're not seeing it terribly often on the PR job, since runs there often fail earlier in the verify/unit-test phase, but the issue still shows up. On the LDAP job there is very little that can flake before this point, so we see it more often. It doesn't seem as though the fixes you mention did much to alleviate the problem.

@stevekuznetsov
Contributor Author

Seen this three times in a row on the LDAP job.

@stevekuznetsov
Contributor Author

@pweil- continuing to see this -- the LDAP job is up to around 20% failure rate due to this issue alone.

@smarterclayton
Contributor

The LDAP job needs to get david's etcd fix thing.


8 participants