Retry failed master startup once #2572
Master startup can fail when EC2 transparently reallocates the block storage, causing etcd writes to fail temporarily. Retry a failed start blindly, just once, to allow time for this transient condition to resolve and for systemd to restart the master (which will eventually succeed). etcd-io/etcd#3864 openshift/origin#6065 openshift/origin#6447
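For discussion, here is a minimal sketch of what a blind single retry could look like as an Ansible task. It is not the actual diff in this PR. The service name `origin-master`, the registered variable `start_result`, and the 60 second delay are assumptions chosen for illustration; only the `service` module and the `register`/`until`/`retries`/`delay` task keywords are standard Ansible.

```yaml
# Hypothetical sketch, not the change made in this PR.
- name: Start and enable master (retry once on failure)
  service:
    name: origin-master        # assumed service name
    enabled: yes
    state: started
  register: start_result       # assumed variable name
  # Re-run the task until it succeeds, allowing one retry.
  until: start_result is succeeded
  retries: 1
  # Assumed pause before the retry, giving the transient etcd write
  # failure time to clear and systemd time to restart the master.
  delay: 60
```

The retry is deliberately unconditional: the task does not inspect why the first start failed, it only waits and tries again. If the second attempt also fails, the play fails as it would have without the retry.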
Still testing this, but wanted to get it ready for discussion. |
aos-ci-test |
Will we also want to do this with the controllers service for HA installs? |
1bc6d43 - State: success - All Test Contexts: aos-ci-jenkins/OS_unit_tests - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-2-unit-tests-355/1bc6d4390661fe18bebbc020b2c7b25972e80b41.txt |
1bc6d43 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.4_NOT_containerized, aos-ci-jenkins/OS_3.4_NOT_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_NOT_containerized,OSE_VER=3.4,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster,TargetBranch=master,nodes=openshift-ansible-slave-374/1bc6d4390661fe18bebbc020b2c7b25972e80b41.txt |
1bc6d43 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.3_NOT_containerized, aos-ci-jenkins/OS_3.3_NOT_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_NOT_containerized,OSE_VER=3.3,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster,TargetBranch=master,nodes=openshift-ansible-slave-374/1bc6d4390661fe18bebbc020b2c7b25972e80b41.txt |
1bc6d43 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.3_containerized, aos-ci-jenkins/OS_3.3_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_containerized,OSE_VER=3.3,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster-containerized,TargetBranch=master,nodes=openshift-ansible-slave-374/1bc6d4390661fe18bebbc020b2c7b25972e80b41.txt |
1bc6d43 - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.4_containerized, aos-ci-jenkins/OS_3.4_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_containerized,OSE_VER=3.4,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster-containerized,TargetBranch=master,nodes=openshift-ansible-slave-374/1bc6d4390661fe18bebbc020b2c7b25972e80b41.txt |
Yeah, I think we should probably do this for all the cases where we're
On Oct 7, 2016 5:05 PM, "Andrew Butcher" notifications@github.com wrote:
I'll get those in my PR. |