Adding additional checks to the upgrade path #5014
Conversation
@juanvallejo it seems to me the etcd checks can be used as-is. They're both intended for already-installed clusters. Any thoughts on what would be different about an upgrade?
bot, retest this
I was not sure if I was missing any special cases during an upgrade that would need to be handled in the etcd checks. I cannot think of any, so I was hoping I could get enough people to review this and either confirm that nothing else is needed, or catch whatever we're missing.
Force-pushed from aaf6dfa to 8f3a6f6
[test]
The patch is pretty simple, but I still have doubts about the implications of certain checks in the upgrade flow.
@@ -11,3 +11,6 @@
 checks:
 - disk_availability
 - memory_availability
+- docker_image_availability
+- etcd_imagedata_size
Should we block an upgrade if "etcd has too much image data"? That may not be a good idea.
Checks can be disabled...
but I guess the question is really "are we running health checks generally, or just looking for issues we think might impact the upgrade?"
To the extent that an upgrade might exercise data in etcd, the etcd checks probably make sense either way. You don't want your upgrade to fail partway through because etcd ran out of space; even if it was probably going to fill up imminently anyway, you at least don't want to have to deal with that in the middle of an upgrade.
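For what it's worth, a minimal sketch of how a user could opt out of particular checks, assuming the openshift_disable_check variable supported by the openshift_health_checker role (the value is a comma-separated list of check names; the file placement here is only an example):

# group_vars/OSEv3.yml -- hypothetical inventory vars file
# Skip the etcd checks for this run; any check name known to the role can be listed.
openshift_disable_check: etcd_imagedata_size,etcd_volume

The same value can also be passed on the ansible-playbook command line with -e, like any other variable.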
@@ -11,3 +11,6 @@
 checks:
 - disk_availability
 - memory_availability
+- docker_image_availability
+- etcd_imagedata_size
+- etcd_volume
Similarly, this might not be so good. If an upgrade is actually needed to reduce etcd usage, running this check prior to the upgrade could block getting the cluster to a healthier state.
@rhcarvalho Went ahead and removed
re[test]
Force-pushed from 8033106 to 2ece672
flaked on openshift/origin#8571
flake openshift/origin#10162
Force-pushed from 2ece672 to 094b46c
aos-ci-test
try a [merge]
853 [merge] again
openshift/origin#15769 again, which should be fixed now. I'm not sure what this is; there doesn't seem to be a flake issue for it:
Looks like it might be a logging flake, i.e. the logging aggregation didn't start up / catch up fast enough? I can say [merge] again, or @sdodson can save some queue time and merge this since other tests seem fine.
[test]
Force-pushed from c010b7d to f2bf837
aos-ci-test
[merge]
Force-pushed from f1fda9f to 6709b8a
[test]: there have been some fixes to the logging job.
aos-ci-test
[merge]
The test failure seems real; where do we get python-etcd from? We should add it to openshift-ansible.spec as a requirement if it's required on the local host.
So, it is available by default in the "updates" repo on Fedora, but I don't think there is a repo for it on RHEL yet. If it is easier, I could remove the cc @sosiouxme
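For reference, the spec change suggested above would amount to a single dependency line. This is only a sketch, assuming python-etcd stays the package name once a RHEL repo exists and that the checks really do need it on the host where openshift-ansible runs:

# openshift-ansible.spec (hypothetical placement, alongside the existing Requires lines)
Requires: python-etcd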
Force-pushed from 6709b8a to 9dc723c
Hmm, looks like aos-ci-test results are missing. Plus possibly some flakes in https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/944/.
aos-ci-test
cc @brenton
re-[merge]
Previous merge flake was openshift/origin#16005, plus a yum issue. re-[merge]
Evaluated for openshift ansible merge up to 9dc723c
continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/979/) (Base Commit: 4acf08d) (PR Branch Commit: 9dc723c)
[test]
Evaluated for openshift ansible test up to 9dc723c
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/610/) (Base Commit: 5aaa24b) (PR Branch Commit: 9dc723c)
@sosiouxme or @rhcarvalho tests seem to be passing, mind tagging once more?
@juanvallejo I think we're only merging bug fixes this week
/lgtm
I think for the new CI we also need to:
/retest
/test tox
CI fail
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue |
Depends on #4960
TODO:
- upgrade playbook context on etcd_volume check

cc @sosiouxme @rhcarvalho
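As for the "upgrade playbook context" TODO, here is a hedged sketch of what the upgrade pre-check play could look like. The r_openshift_health_checker_playbook_context variable name, the oo_all_hosts group, and the idea that a check such as etcd_volume could inspect that context to decide whether to run are assumptions about how the openshift_health_checker role is wired, not something this PR itself establishes:

# Hypothetical upgrade pre-check play (YAML sketch, names as assumed above)
- name: Run health checks relevant to an upgrade
  hosts: oo_all_hosts
  vars:
    r_openshift_health_checker_playbook_context: upgrade
  roles:
  - openshift_health_checker
  post_tasks:
  - name: Run the selected checks
    action: openshift_health_check
    args:
      checks:
      - disk_availability
      - memory_availability
      - docker_image_availability
      - etcd_imagedata_size
      # etcd_volume intentionally omitted per the review above; the TODO is about
      # letting that check recognize the upgrade context instead of listing it here.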