openshift_checks: refactor find_ansible_mount #4944

sosiouxme · 2017-07-29T15:06:05Z

Reuse the code for finding the ansible_mounts mount for a path.

rhcarvalho

Looking good, clearly removes duplication. Some comments for possible improvements or to make sure the intentions are clear; no changes necessary.

rhcarvalho · 2017-07-31T13:30:41Z

roles/openshift_health_checker/test/disk_availability_test.py

@@ -34,7 +46,7 @@ def test_cannot_determine_available_disk(ansible_mounts, extra_words):
    with pytest.raises(OpenShiftCheckException) as excinfo:
        DiskAvailability(fake_execute_module, task_vars).run()

-    for word in 'determine disk availability'.split() + extra_words:
+    for word in extra_words:


Since you're changing this, we might as well drop the extra_ prefix, and since they are not necessarily "words", perhaps s/extra_words/chunks, or something that better explains what it is? Naming... 🚲

Sure :)

It's a band-aid anyway, the real goal is to base on #4913 and have the tests look for "named" exceptions.
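A minimal sketch of what testing against a "named" exception could look like. Everything here is hypothetical: `NamedCheckError`, the `MountNotFound` name, and the helper functions are stand-ins for illustration and are not the API introduced by #4913.

```python
class NamedCheckError(Exception):
    """Hypothetical exception that carries a stable name next to its message."""

    def __init__(self, name, msg):
        super().__init__(msg)
        self.name = name


def find_mount_or_fail(path, mounts):
    """Return the mount entry for path, or raise a named error (illustrative only)."""
    if path not in mounts:
        raise NamedCheckError(
            "MountNotFound",
            'Unable to determine mount point for "{}".'.format(path),
        )
    return mounts[path]


def raised_name(func, *args):
    """Call func and return the name of the NamedCheckError it raises, if any."""
    try:
        func(*args)
    except NamedCheckError as err:
        return err.name
    return None
```

A test written this way asserts on the name rather than on the wording, so the message can be edited without touching the test.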

rhcarvalho · 2017-07-31T13:33:54Z

roles/openshift_health_checker/test/etcd_imagedata_size_test.py

@@ -56,7 +57,7 @@ def test_cannot_determine_available_mountpath(ansible_mounts, extra_words):
    with pytest.raises(OpenShiftCheckException) as excinfo:
        check.run()

-    for word in 'determine valid etcd mountpath'.split() + extra_words:
+    for word in ['Unable to determine mount point'] + extra_words:


Just to be sure: the original was meant to have just some keywords and look for them in no particular order, while the new code looks for an exact sentence/chunk. The trouble with this kind of test is that a change in the message requires a change in the test, which adds maintenance burden. No objections if that was your intention.

I see the trouble, and while splitting into words makes the test less sensitive to whitespace changes, it's still the case that changing the wording is likely to require a change to the tests; and at the same time, it makes the test a little less accurate - you could potentially match a similar nearby message that had all the right words. I did find one false positive like this. So, it was my intent.
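To illustrate the false-positive risk, here is a small sketch comparing the two matching styles. It assumes the tests check substring containment against the exception text; the `unrelated` message is made up for the example and does not come from the code base.

```python
def matches_all_words(message, phrase):
    """Old style: every word of the phrase appears somewhere, in any order."""
    return all(word in message for word in phrase.split())


def matches_chunk(message, chunk):
    """New style: the chunk must appear verbatim."""
    return chunk in message


expected = "Unable to determine mount point"

# Hypothetical unrelated message that happens to contain all the same words.
unrelated = (
    "Unable to reach host at this point; "
    "cannot determine which mount to check"
)

word_match = matches_all_words(unrelated, expected)
chunk_match = matches_chunk(unrelated, expected)
```

Here `word_match` is true even though the message is about something else, while `chunk_match` is false.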

rhcarvalho · 2017-08-01T07:59:37Z

roles/openshift_health_checker/openshift_checks/__init__.py

+
+        mount_point = path
+        while mount_point not in mount_for_path:
+            if mount_point in ["/", ""]:  # "/" not in ansible_mounts???


What does the comment mean?

incredulity that ansible_mounts would ever come back without / mapped :) i guess it's not a very illuminating comment.

rhcarvalho · 2017-08-01T08:31:51Z

roles/openshift_health_checker/openshift_checks/__init__.py

+        }
+
+        mount_point = path
+        while mount_point not in mount_for_path:


while + condition + if + break == while + new_condition

Let's see...

    while mount_point not in mount_for_path and mount_point not in ["/", ""]:
        mount_point = os.path.dirname(mount_point)

Simplifying:

    while mount_point not in list(mount_for_path.keys()) + ["/", ""]:
        mount_point = os.path.dirname(mount_point)

Since the condition is re-evaluated in every iteration, and to make it shorter:

    # NOTE: we add '/' and '' because those are the base paths that
    # os.path.dirname might return, preventing an infinite loop.
    mount_points = set(mount_for_path.keys()) | {'/', ''}
    while mount_point not in mount_points:
        mount_point = os.path.dirname(mount_point)

Hmm, the single condition still sort of asks for an explanation why we need the extra paths :-)

It does make it read better. Thanks.
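For reference, a self-contained sketch of the lookup discussed in this thread. The function name `find_mount` and the shape of the input (a list of dicts with a "mount" key, as in the ansible_mounts fact) are assumptions for illustration; this is not the exact method from the PR.

```python
import os.path


def find_mount(path, ansible_mounts):
    """Return the mount entry that contains path, or None if there is none."""
    mount_for_path = {mnt["mount"]: mnt for mnt in ansible_mounts}

    # '/' and '' are the base paths os.path.dirname eventually returns,
    # so including them guarantees the loop terminates.
    stop_points = set(mount_for_path) | {"/", ""}

    mount_point = path
    while mount_point not in stop_points:
        mount_point = os.path.dirname(mount_point)

    # None when we reached '/' or '' without finding a matching mount.
    return mount_for_path.get(mount_point)
```

With mounts for "/" and "/var", a lookup of "/var/lib/etcd" walks up through "/var/lib" and stops at "/var".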

rhcarvalho · 2017-08-01T08:44:08Z

roles/openshift_health_checker/openshift_checks/disk_availability.py

-
-        return free_bytes
+            raise OpenShiftCheckException(
+                'Unable to retrieve disk availability for "{}" due to a bug.\n'


Hmm, as a user, I do not like software telling me about its own bugs, especially without any clue as to what the bug is and what I can do about it...
Suggestion: s/due to a bug//, s/Ansible// (do not blame the tool :P)

'Unable to compute disk availability for "{}". Missing key "size_available" ... got "{}".... You can retry running the check, or inspect the output of ansible -m setup HOSTNAME and make sure Ansible can report... etc.'

The user should understand from the message:

What happened at a high level: "the disk availability check failed because it was unable to compute disk availability"

Some indication of why it failed: "we have information about total_size, etc etc for the mount path /foo/bar, but we don't have 'size_available'" -- and we should hope this actually never happens.

What can the user do next -- hmm, in this case maybe retry, if not there is the ansible -m setup... or report a bug...

I agree, I would like the user to know what's going on at this level of detail. I get a little lazy when it seems like a really unlikely thing to encounter and complicated to explain accurately if it did happen (because if this is broken, then enough has changed that you probably need to re-evaluate your assumptions), and so stick to simple "something went wrong, and it's not your fault" messages. But you're right, a little further thought can help craft a more helpful error message, both for the user and for the developer.

Communicating errors is not easy. I did feel I got lengthy above... Oh well, better than being perfect in one unlikely case is to be consistent :-)
The short and sweet "it is not your fault" style is also good. We're somehow trying to fight the lengthy default Ansible output with something clear and helpful.

In this particular case we agree that things need to be really broken for the error condition to manifest! How about the concise message here, then?

Too late, already spent time expanding it.
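A rough sketch of an error message following the three points above (what happened, why, what to do next). `CheckError` and `require_size_available` are hypothetical stand-ins, and the wording is only an example, not the message that was merged.

```python
class CheckError(Exception):
    """Stand-in for the project's check exception (hypothetical)."""


def require_size_available(path, mount):
    """Return mount['size_available'] or raise an error that explains itself."""
    try:
        return mount["size_available"]
    except KeyError:
        raise CheckError(
            # What happened, at a high level.
            'Unable to compute disk availability for "{path}".\n'
            # Some indication of why.
            'The mount information is missing the key "size_available"; '
            'got: {mount}\n'
            # What the user can do next.
            'You can retry running the check, or inspect the output of '
            '`ansible -m setup HOSTNAME` to make sure the mount data is reported.'
            .format(path=path, mount=mount)
        )
```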

rhcarvalho · 2017-08-01T08:49:13Z

roles/openshift_health_checker/openshift_checks/etcd_volume.py


    def is_active(self):
        etcd_hosts = self.get_var("groups", "etcd", default=[]) or self.get_var("groups", "masters", default=[]) or []
        is_etcd_host = self.get_var("ansible_ssh_host") in etcd_hosts
        return super(EtcdVolume, self).is_active() and is_etcd_host

    def run(self):
-        mount_info = self._etcd_mount_info()
+        mount_info = self.find_ansible_mount(self.etcd_mount_path)


rhcarvalho · 2017-08-01T08:49:47Z

roles/openshift_health_checker/openshift_checks/etcd_imagedata_size.py

@@ -12,7 +12,7 @@ class EtcdImageDataSize(OpenShiftCheck):
    tags = ["etcd"]

    def run(self):
-        etcd_mountpath = self._get_etcd_mountpath(self.get_var("ansible_mounts"))
+        etcd_mountpath = self.find_ansible_mount("/var/lib/etcd")


👍

I see this same path in two different checks :)

rhcarvalho · 2017-08-01T08:51:20Z

roles/openshift_health_checker/test/etcd_volume_test.py

@@ -15,7 +16,7 @@ def test_cannot_determine_available_disk(ansible_mounts, extra_words):
    with pytest.raises(OpenShiftCheckException) as excinfo:
        EtcdVolume(fake_execute_module, task_vars).run()

-    for word in 'Unable to find etcd storage mount point'.split() + extra_words:
+    for word in ['Unable to determine mount point'] + extra_words:


Same as for the other test file: note the difference between word-by-word and whole-sentence matching. If that's understood, no problem with the change.

sosiouxme · 2017-08-01T14:56:11Z

aos-ci-test

openshift-bot · 2017-08-01T16:57:17Z

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 80ad75a (logs)

openshift-bot · 2017-08-01T17:00:46Z

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 80ad75a (logs)

sosiouxme · 2017-08-07T18:16:34Z

aos-ci-test

openshift-bot · 2017-08-07T19:13:02Z

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 969447f (logs)

openshift-bot · 2017-08-07T19:15:38Z

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 969447f (logs)

sosiouxme · 2017-08-07T19:27:09Z

bot, retest this

sosiouxme · 2017-08-07T20:04:35Z

bot, restest this please

rhcarvalho · 2017-08-08T10:05:42Z

bot, retest this please

sosiouxme · 2017-08-08T12:51:23Z

bot, retest this please

sosiouxme · 2017-08-08T12:52:32Z

Hmm, how do we report f25 flakes like on https://s3.amazonaws.com/aos-ci/ghprb/openshift/openshift-ansible/969447f26fb1dc95fe854298f7121049b4cc3705.0.1502186778686079484/index.html ?

rhcarvalho · 2017-08-08T14:03:37Z

Hmm, how do we report f25 flakes like on https://s3.amazonaws.com/aos-ci/ghprb/openshift/openshift-ansible/969447f26fb1dc95fe854298f7121049b4cc3705.0.1502186778686079484/index.html ?

Not sure we have a way... /cc @jlebon

sosiouxme · 2017-08-08T15:06:33Z

bot, retest this please

rhcarvalho · 2017-08-08T15:20:56Z

[test]

Reuse the code for finding the ansible_mounts mount for a path.

sosiouxme · 2017-08-08T19:43:18Z

aos-ci-test

openshift-bot · 2017-08-08T19:47:45Z

Evaluated for openshift ansible test up to 3c71d00

openshift-bot · 2017-08-08T21:15:33Z

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/423/) (Base Commit: 566731d) (PR Branch Commit: 3c71d00)

openshift-bot · 2017-08-08T23:41:38Z

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 3c71d00 (logs)

openshift-bot · 2017-08-08T23:43:23Z

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 3c71d00 (logs)

openshift-bot · 2017-08-08T23:44:24Z

error: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 3c71d00 (logs)

openshift-bot · 2017-08-08T23:44:25Z

error: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 3c71d00 (logs)

sosiouxme · 2017-08-09T01:05:35Z

^^^ schroedinger tests ... they succeeded and failed at the same time. Until the bot looked at them, then they collapsed to failed state.

Really don't know what to make of that. The logs look like nothing went wrong.

sosiouxme · 2017-08-09T01:05:43Z

aos-ci-test

openshift-bot · 2017-08-09T02:33:14Z

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 3c71d00 (logs)

openshift-bot · 2017-08-09T02:34:50Z

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 3c71d00 (logs)

sosiouxme · 2017-08-09T12:05:21Z

[merge]

sosiouxme · 2017-08-09T20:04:05Z

yum flakes openshift/origin#10162 openshift/origin#8571
epel mirror is borked?

i imagine the tests aren't worth much in that state, can just wait until it's resolved.

sosiouxme · 2017-08-10T18:00:02Z

[merge] again

openshift-bot · 2017-08-10T18:07:33Z

Evaluated for openshift ansible merge up to 3c71d00

openshift-bot · 2017-08-11T03:31:30Z

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/834/) (Base Commit: 7c7b91c) (PR Branch Commit: 3c71d00)

sosiouxme · 2017-08-11T17:54:25Z

https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/834/ appears to have been Jenkins running out of space.

For example

FATAL: Unable to produce a script file
java.io.IOException: No space left on device
	at java.io.UnixFileSystem.createFileExclusively(Native Method)
	at java.io.File.createTempFile(File.java:2024)
	at hudson.FilePath$17.invoke(FilePath.java:1372)
Caused: java.io.IOException: Failed to create a temporary directory in /tmp
	at hudson.FilePath$17.invoke(FilePath.java:1374)
	at hudson.FilePath$17.invoke(FilePath.java:1362)
	at hudson.FilePath.act(FilePath.java:996)
	at hudson.FilePath.act(FilePath.java:974)
	at hudson.FilePath.createTextTempFile(FilePath.java:1362)
Caused: java.io.IOException: Failed to create a temp file on /var/lib/jenkins/jobs/test_pull_request_openshift_ansible_extended_conformance_install_with_status_check/workspace

So this test probably still didn't get very far. I don't think there's a flake issue for this one and don't really expect it to be a recurring thing... @sdodson would you mind either re-running the merge or just merging? The aos-ci-tests all passed long ago, just can't seem to get Rosie on board.

sosiouxme requested review from rhcarvalho and juanvallejo July 29, 2017 15:06

rhcarvalho approved these changes Aug 1, 2017

juanvallejo approved these changes Aug 1, 2017

openshift-bot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Aug 7, 2017

openshift-bot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Aug 7, 2017

openshift-bot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Aug 8, 2017

openshift_checks: refactor find_ansible_mount

3c71d00

Reuse the code for finding the ansible_mounts mount for a path.

openshift-bot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Aug 8, 2017

sdodson merged commit 8daf3d9 into openshift:master Aug 11, 2017

sosiouxme deleted the 20170728-refactor-ansible-mounts branch August 15, 2017 22:19