
"failed to propose on members [https://127.0.0.1:24001]" #6447

Closed
deads2k opened this issue Dec 21, 2015 · 11 comments


deads2k commented Dec 21, 2015

This bug happens because the etcd server can apply the write successfully while the fsync to its WAL is extremely slow. The request then times out and the server replies with a 500 even though the action was taken. The etcd client retries the call automatically, and the retry fails because the action has already been taken (see the sketch after the list below).

This can manifest as:

  1. unexpected error: namespaces "hammer-project" already exists
  2. etcdhttp: got unexpected response error (etcdserver: request timed out)
  3. Unable to initialize namespaces: unable to persist the updated namespace UID allocations: uidallocation "" cannot be updated: another caller has already initialized the resource
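
For illustration, here's a minimal Go sketch of that sequence against the etcd v2 client API. The endpoint mirrors the logs below, the key path is made up, and the explicit second Set call stands in for the retry the client library performs automatically, so treat this as an assumption-laden illustration rather than the real client code path:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/client"
)

func main() {
	c, err := client.New(client.Config{
		Endpoints:               []string{"https://127.0.0.1:24001"},
		HeaderTimeoutPerRequest: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	kapi := client.NewKeysAPI(c)

	// First attempt: the server can apply the create but stall on the WAL
	// fsync, so the request times out and comes back as a 500 even though
	// the key now exists.
	opts := &client.SetOptions{PrevExist: client.PrevNoExist}
	_, err = kapi.Set(context.Background(), "/namespaces/hammer-project", "{}", opts)
	if err != nil {
		// Stand-in for the client's automatic retry: the same create now
		// collides with the key the first attempt already wrote and fails
		// with "Key already exists" (errorCode 105).
		_, err = kapi.Set(context.Background(), "/namespaces/hammer-project", "{}", opts)
		if cErr, ok := err.(client.Error); ok && cErr.Code == client.ErrorCodeNodeExist {
			log.Printf("retry failed, action was already taken: %v", cErr)
		}
	}
}
```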

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/8038/consoleText

FAILURE after 12.429s: hack/../test/cmd/builds.sh:92: executing 'oc start-build --from-webhook=https://127.0.0.1:28443/oapi/v1/namespaces/cmd-builds/buildconfigs/ruby-sample-build/webhooks/secret101/generic' expecting success: the command returned the wrong error code
There was no output from the command.
Standard error from the command:
error: server rejected our request 500
remote: {
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Internal error occurred: could not generate a build: 501: All the given peers are not reachable (failed to propose on members [https://127.0.0.1:24001] twice [last error: Unexpected HTTP status code]) [0]",
  "reason": "InternalError",
  "details": {
    "causes": [
      {
        "message": "could not generate a build: 501: All the given peers are not reachable (failed to propose on members [https://127.0.0.1:24001] twice [last error: Unexpected HTTP status code]) [0]"
      }
    ]
  },
  "code": 500
}
!!! Error in hack/../test/cmd/../../hack/cmd_util.sh:193
    'return 1' exited with status 1
Call stack:
    1: hack/../test/cmd/../../hack/cmd_util.sh:193 os::cmd::expect_success(...)
    2: hack/../test/cmd/builds.sh:92 main(...)
Exiting with status 1
!!! Error in hack/test-cmd.sh:286
    '${test}' exited with status 1
Call stack:
    1: hack/test-cmd.sh:286 main(...)
Exiting with status 1
[FAIL] !!!!! Test Failed !!!!

See #6065 for more details.

deads2k added the priority/P2 and kind/test-flake labels Dec 21, 2015

deads2k commented Dec 21, 2015

Another occurrence here:

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/8043/consoleText

In project test on server https://172.30.0.1:443

2 warnings identified, use 'oc status -v' to see details.
Error from server: 501: All the given peers are not reachable (failed to propose on members [https://172.18.6.21:4001] twice [last error: Unexpected HTTP status code]) [0]
!!! Error in hack/../test/end-to-end/core.sh:160
    'oc delete pod cli-with-token' exited with status 1
Call stack:
    1: hack/../test/end-to-end/core.sh:160 main(...)
Exiting with status 1

[FAIL] !!!!! Test Failed !!!!

stevekuznetsov commented

flake here:

FAILURE after 63.617s: hack/../test/cmd/builds.sh:108: executing 'oc process -f examples/sample-app/application-template-dockerbuild.json -l build=docker | oc create -f -' expecting success: the command returned the wrong error code
Standard output from the command:
imagestream "origin-ruby-sample" created
deploymentconfig "frontend" created
service "database" created
deploymentconfig "database" created
Standard error from the command:
Error from server: Timeout: request did not complete within allowed duration
Error from server: 501: All the given peers are not reachable (failed to propose on members [https://127.0.0.1:24001] twice [last error: Unexpected HTTP status code]) [0]
Error from server: imageStream "ruby-22-centos7" already exists
Error from server: buildconfig "ruby-sample-build" already exists
[FAIL] !!!!! Test Failed !!!!

0xmichalis commented

Another

Error from server: 501: All the given peers are not reachable (failed to propose on members [https://172.18.4.116:4001] twice [last error: Unexpected HTTP status code]) [0]
!!! Error in hack/../test/end-to-end/core.sh:164
    'oc delete pod cli-with-token-2' exited with status 1
Call stack:
    1: hack/../test/end-to-end/core.sh:164 main(...)
Exiting with status 1

https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/4448/consoleFull

aveshagarwal commented

When I run ./hack/test-end-to-end.sh with the latest origin, I see a similar issue:

[INFO] Running a CLI command in a container using the service account
Waiting for pod test/cli-with-token to be running, status is Pending, pod ready: false
Waiting for pod test/cli-with-token to be running, status is Pending, pod ready: false
Error attaching, falling back to logs: error executing remote command: Error executing command in container: container not found ("cli-with-token")
F0108 20:00:54.584136 1 helpers.go:96] Error in configuration: default cluster has no server defined
!!! Error in ./hack/../test/end-to-end/core.sh:160
    '[ "$(cat ${LOG_DIR}/cli-with-token.log | grep 'Using in-cluster configuration')" ]' exited with status 1
Call stack:
    1: ./hack/../test/end-to-end/core.sh:160 main(...)
Exiting with status 1
!!! Error in ./hack/test-end-to-end.sh:51
    '${OS_ROOT}/test/end-to-end/core.sh' exited with status 1
Call stack:
    1: ./hack/test-end-to-end.sh:51 main(...)
Exiting with status 1

[FAIL] !!!!! Test Failed !!!!


deads2k commented Jan 11, 2016

It looks like this is being caused by sudden latency in disk IO. See #6542 (comment) for details.

Since this seems to be an environmental problem, we're currently working around it by using a ramdisk. That frees the merge and test queue for now.
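
As a rough check of the disk-latency theory, something like this Go sketch times the write-plus-fsync cycle the etcd WAL performs on every proposal; sustained spikes here would line up with the request timeouts. The probe path and entry size are assumptions for illustration, not part of the actual diagnosis:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"time"
)

func main() {
	// Probe whatever filesystem backs etcd's data dir; /var/lib/etcd is an
	// assumed location and needs to exist and be writable.
	f, err := os.CreateTemp("/var/lib/etcd", "fsync-probe-*")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	entry := make([]byte, 4096) // roughly one small WAL record
	for i := 0; i < 10; i++ {
		start := time.Now()
		if _, err := f.Write(entry); err != nil {
			log.Fatal(err)
		}
		if err := f.Sync(); err != nil { // the fsync etcd blocks on
			log.Fatal(err)
		}
		fmt.Printf("write+fsync %2d: %v\n", i, time.Since(start))
	}
}
```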

smarterclayton commented

Something's back

I0130 13:13:02.433088    5645 decoder.go:141] decoding stream as JSON
2016-01-30 13:13:10.435005 E | etcdhttp: got unexpected response error (etcdserver: request timed out)
E0130 13:13:10.435279    5645 etcd.go:95] etcd failure response: HTTP/1.1 500 Internal Server Error
Content-Length: 100
Content-Type: application/json
Date: Sat, 30 Jan 2016 18:13:10 GMT
X-Etcd-Cluster-Id: 3a66c6e8db3c8d30
X-Etcd-Index: 0

{"errorCode":300,"message":"Raft Internal Error","cause":"etcdserver: request timed out","index":0}

deads2k assigned smarterclayton and unassigned deads2k Feb 1, 2016

deads2k commented Feb 1, 2016

@smarterclayton this happened after the CI flow changed. I wonder if it's exceeding the pre-allocated space now. Since I'm rebasing and @liggitt already tried, do you want a go?

smarterclayton commented

Jordan's looking at it, but it's very likely the builds were moved.


smarterclayton commented

I think this is resolved now.

ironcladlou added a commit to ironcladlou/openshift-ansible that referenced this issue Oct 7, 2016
Master startup can fail when EC2 transparently reallocates the block storage, causing etcd writes to temporarily fail. Retry failures blindly just once to allow time for this transient condition to resolve and for systemd to restart the master (which will eventually succeed).

etcd-io/etcd#3864
openshift/origin#6065
openshift/origin#6447
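
The actual openshift-ansible change lives in the playbook and service unit rather than in Go, but the "retry blindly just once" idea it describes reduces to something like this hypothetical helper (all names and the pause duration are assumptions for illustration):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryOnce runs op and, on failure, retries exactly once after a pause,
// giving a transient condition (such as an EC2 block-storage reallocation)
// time to clear.
func retryOnce(op func() error, pause time.Duration) error {
	if err := op(); err == nil {
		return nil
	}
	time.Sleep(pause)
	return op()
}

func main() {
	attempts := 0
	err := retryOnce(func() error {
		attempts++
		if attempts == 1 {
			return errors.New("etcdserver: request timed out") // transient failure
		}
		return nil
	}, 100*time.Millisecond)
	fmt.Println("attempts:", attempts, "err:", err)
}
```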