-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
openshift_checks: refactor find_ansible_mount #4944
openshift_checks: refactor find_ansible_mount #4944
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, clearly removes duplication. Some comments for possible improvements or making sure the intentions are clear, no changes necessary.
@@ -34,7 +46,7 @@ def test_cannot_determine_available_disk(ansible_mounts, extra_words): | |||
with pytest.raises(OpenShiftCheckException) as excinfo: | |||
DiskAvailability(fake_execute_module, task_vars).run() | |||
|
|||
for word in 'determine disk availability'.split() + extra_words: | |||
for word in extra_words: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're changing this, we could as well drop the extra_
prefix, and since they are not necessarily "words", perhaps s/extra_words/chunks
, or something that better explains what it is? Naming... 🚲
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure :)
It's a band-aid anyway, the real goal is to base on #4913 and have the tests look for "named" exceptions.
@@ -56,7 +57,7 @@ def test_cannot_determine_available_mountpath(ansible_mounts, extra_words): | |||
with pytest.raises(OpenShiftCheckException) as excinfo: | |||
check.run() | |||
|
|||
for word in 'determine valid etcd mountpath'.split() + extra_words: | |||
for word in ['Unable to determine mount point'] + extra_words: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure, the original was meant to have just some keywords, and look for them in no particular order, the new code looks for an exact sentence/chunk. The trouble of this kinds of tests is that a change in the message requires a change in the test, making it more maintenance burden. No objections if that was your intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the trouble, and while splitting into words makes the test less sensitive to whitespace changes, it's still the case that changing the wording is likely to require a change to the tests; and at the same time, it makes the test a little less accurate - you could potentially match a similar nearby message that had all the right words. I did find one false positive like this. So, it was my intent.
|
||
mount_point = path | ||
while mount_point not in mount_for_path: | ||
if mount_point in ["/", ""]: # "/" not in ansible_mounts??? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the comment mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incredulity that ansible_mounts
would ever come back without /
mapped :) i guess it's not a very illuminating comment.
} | ||
|
||
mount_point = path | ||
while mount_point not in mount_for_path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while + condition + if + break == while + new_condition
Let's see...
while mount_point not in mount_for_path and mount_point not in ["/", ""]:
mount_point = os.path.dirname(mount_point)
Simplifying:
while mount_point not in list(mount_for_path.keys()) + ["/", ""]:
mount_point = os.path.dirname(mount_point)
Since the condition is re-evaluated in every iteration, and to make it shorter:
# NOTE: we add '/' and '' because those are the base paths that
# os.path.dirname might return, preventing an infinite loop.
mount_points = set(mount_for_path.keys()) | {'/', ''}
while mount_point not in mount_points:
mount_point = os.path.dirname(mount_point)
Hmm, the single condition still sort of asks for an explanation why we need the extra paths :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does make it read better. Thanks.
|
||
return free_bytes | ||
raise OpenShiftCheckException( | ||
'Unable to retrieve disk availability for "{}" due to a bug.\n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, as a user, I do not like a software telling me about it's own bugs, specially without any clue as to what the bug is and what can I do about it...
Suggestion: s/due to a bug//, s/Ansible// (do not blame the tool :P)
'Unable to compute disk availability for "{}". Missing key "size_available" ... got "{}".... You can retry running the check, or inspect the output of ansible -m setup HOSTNAME
and make sure Ansible can report... etc.'
The user should understand from the message:
- What happened at a high level: "the disk availability check failed because it was unable to compute disk availability"
- Some indication of why it failed: "we have information about total_size, etc etc for the mount path /foo/bar, but we don't have 'size_available'" -- and we should hope this actually never happens.
- What can the user do next -- hmm, in this case maybe retry, if not there is the ansible -m setup... or report a bug...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I would like the user to know what's going on at this level of detail. I get a little lazy when it seems like a really unlikely thing to encounter and complicated to explain accurately if it did happen (because if this is broken, then enough has changed that you probably need to re-evaluate your assumptions), and so stick to simple "something went wrong, and it's not your fault" messages. But you're right, a little further thought can help craft a more helpful error message, both for the user and for the developer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Communicating the errors is not easy. I did feel I got lengthy above... Oh well, better than perfect in one unlike case is to be consistent :-)
The short and sweet "it is not your fault" style is also good. We're somehow trying to fight the lengthy default Ansible output with something clear and helpful.
In this particular case we agree that things need to be really broken for the error condition to manifest! How about the concise message here, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too late, already spent time expanding it.
|
||
def is_active(self): | ||
etcd_hosts = self.get_var("groups", "etcd", default=[]) or self.get_var("groups", "masters", default=[]) or [] | ||
is_etcd_host = self.get_var("ansible_ssh_host") in etcd_hosts | ||
return super(EtcdVolume, self).is_active() and is_etcd_host | ||
|
||
def run(self): | ||
mount_info = self._etcd_mount_info() | ||
mount_info = self.find_ansible_mount(self.etcd_mount_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -12,7 +12,7 @@ class EtcdImageDataSize(OpenShiftCheck): | |||
tags = ["etcd"] | |||
|
|||
def run(self): | |||
etcd_mountpath = self._get_etcd_mountpath(self.get_var("ansible_mounts")) | |||
etcd_mountpath = self.find_ansible_mount("/var/lib/etcd") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I see this same path in two different checks :)
@@ -15,7 +16,7 @@ def test_cannot_determine_available_disk(ansible_mounts, extra_words): | |||
with pytest.raises(OpenShiftCheckException) as excinfo: | |||
EtcdVolume(fake_execute_module, task_vars).run() | |||
|
|||
for word in 'Unable to find etcd storage mount point'.split() + extra_words: | |||
for word in ['Unable to determine mount point'] + extra_words: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as for the other test file, note the difference word-by-word x whole sentence. If that's understood, no problem with the change.
aos-ci-test |
aos-ci-test |
bot, retest this |
bot, restest this please |
bot, retest this please |
1 similar comment
bot, retest this please |
Hmm, how do we report f25 flakes like on https://s3.amazonaws.com/aos-ci/ghprb/openshift/openshift-ansible/969447f26fb1dc95fe854298f7121049b4cc3705.0.1502186778686079484/index.html ? |
Not sure we have a way... /cc @jlebon |
bot, retest this please |
[test] |
Reuse the code for finding the ansible_mounts mount for a path.
aos-ci-test |
Evaluated for openshift ansible test up to 3c71d00 |
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/423/) (Base Commit: 566731d) (PR Branch Commit: 3c71d00) |
^^^ schroedinger tests ... they succeeded and failed at the same time. Until the bot looked at them, then they collapsed to failed state. Really don't know what to make of that. The logs look like nothing went wrong. |
aos-ci-test |
[merge] |
yum flakes openshift/origin#10162 openshift/origin#8571 i imagine the tests aren't worth much in that state, can just wait until it's resolved. |
[merge] again |
Evaluated for openshift ansible merge up to 3c71d00 |
continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/834/) (Base Commit: 7c7b91c) (PR Branch Commit: 3c71d00) |
https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/834/ appears to have been Jenkins running out of space.
So this test probably still didn't get very far. I don't think there's a flake issue for this one and don't really expect it to be a recurring thing... @sdodson would you mind either re-running the merge or just merging? The aos-ci-tests all passed long ago, just can't seem to get Rosie on board. |
Reuse the code for finding the ansible_mounts mount for a path.