
Reduce openshift_facts dependencies. #4739

Merged
merged 5 commits into openshift:master on Aug 8, 2017

Conversation

kwoodson
Contributor

@kwoodson kwoodson commented Jul 11, 2017

The summary of this PR is to remove openshift_facts/tasks and move them to initialize_facts.yml. playbooks/common/openshift-cluster/initialize_facts.yml is included as part of playbooks/common/openshift-cluster/std_include.yml, and ALL entry point playbooks should flow through this playbook so that the following occur:

  • groups are initialized and our oo_ groups are defined and populated
  • openshift_version is properly determined and set
  • openshift_facts are set on hosts upfront before any other roles/playbooks run.

If you are seeing issues with your playbooks outside of common, please ensure that the playbooks include the playbooks/common/openshift-cluster/std_include.yml playbook.
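
For illustration, a minimal sketch of what such an entry point playbook might look like (the playbook name, host targeting, and debug task are assumptions for illustration, not part of this PR):

---
# my_entry_point.yml -- hypothetical example playbook
# Pull in the standard initialization first so that the oo_ groups,
# openshift_version, and openshift_facts are all populated before
# anything else runs.
- include: playbooks/common/openshift-cluster/std_include.yml

# Subsequent plays can rely on openshift_facts already being set on every host.
- name: Do the actual work
  hosts: oo_all_hosts
  tasks:
  - debug:
      msg: "{{ openshift.common.hostname | default('facts not set') }}"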

@kwoodson kwoodson requested a review from mtnbikenc July 11, 2017 20:50
@kwoodson kwoodson self-assigned this Jul 11, 2017

# TODO: Should this role be refactored into health_checks??
- name: Run openshift_sanitize_inventory to set variables
include_role:
Member

Can you put both roles into the roles property and move the tasks under post_tasks? As long as we can use roles instead of include_role, I find the plays more transparent.

Contributor Author

@ingvagabund, include_role is the future, as per our very own cluster lifecycle meeting with the Ansible consultant.

Member

I will always be opposed to that as long as the original roles property of a play can provide the same functionality. Though, it is my own opinion and not a blocker for the PR :).

Contributor Author

@ingvagabund, heh, I agree that at first the new style appears different, but I can assure you the benefits are there. We discussed this very topic with James from Ansible, and he told us that when you use include_role, the fixed pre_tasks, roles, tasks, and post_tasks ordering of a play goes away. The flow of the playbook moves top to bottom, and the flexibility increases because you can include only the tasks from a role that you want. With roles: you get the entire role, whether that is desired or not.

I can find the meeting recording and send it to you. I think it's in the architecture notes.
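
As a rough illustration of the difference (a sketch; the tasks_from file name is hypothetical), an include_role-based play reads strictly top to bottom and can pull in just part of a role:

- name: Initialize host facts
  hosts: oo_all_hosts
  tasks:
  # Tasks and included roles run in the order written here, rather than
  # in the fixed pre_tasks -> roles -> tasks -> post_tasks order.
  - name: Run openshift_sanitize_inventory to set variables
    include_role:
      name: openshift_sanitize_inventory

  # include_role can also pull in a single tasks file instead of the whole role.
  - name: Run only the fact-gathering tasks from openshift_facts (hypothetical tasks file)
    include_role:
      name: openshift_facts
      tasks_from: gather.yml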


# TODO: Should this be moved into health checks??
# Seems as though any check that happens with a corresponding fail should move into health_checks
- name: Validate python version - ans_dist is fedora and python is v3
Member

Maybe move both Validate python version - ans_dist is fedora and python is v3 and Validate python version - ans_dist not Fedora and python must be v2 into a new role called init-checks? Since both tasks can be moved around freely, the new role could be added to the roles list.

Contributor Author

@kwoodson kwoodson Jul 12, 2017

@ingvagabund, this is an interesting idea. I'd prefer to have fewer role dependencies. If the preflight health_checks are already doing this exact type of checking, then I'd prefer that we just do these types of checks inside health_checks. That seems like the best place for them.

Member

health_checks, by its name, checks the health of the cluster. Given that it can be anything (with a proper definition of what health is), I agree with you.

# TODO: Should this be moved into health checks??
# Seems as though any check that happens with a corresponding fail should move into health_checks
# Fail as early as possible if Atomic and old version of Docker
- block:
Member

Good candidate for the init-checks role.

Contributor Author

Same as above. Any fail/msg type checks are candidates for health or preflight. I'd prefer to limit the number of dependency calls. The attempt here is to do them up front during the initial startup and then never do them again. If they go into a role dependency, then we have a tendency to include them in meta/main.yml for multiple roles.

We can definitely have the discussion about whether health_checks or placing these types of checks into a role is the better approach. Maybe that's a discussion for the architecture team. I can see an argument for both sides, but I'd prefer to have checks in one place so that in the future we know exactly where to put code like this.

@sdodson sdodson requested a review from abutcher July 12, 2017 13:20
@kwoodson kwoodson requested review from ashcrow and tbielawa July 12, 2017 13:31
@kwoodson
Contributor Author

Currently failing to start atomic-openshift-node with this error:

 master has not created a default cluster network, network plugin "redhat/openshift-ovs-subnet" can not start

@ashcrow
Member

ashcrow commented Jul 12, 2017

I like this move overall. It makes a lot of sense!

@sdodson
Member

sdodson commented Jul 12, 2017 via email

@kwoodson
Contributor Author

@sdodson, atomic-openshift-sdn-ovs-3.6.74-1.git.0.e6d1637.el7.x86_64

@kwoodson
Contributor Author

@sdodson @abutcher pointed me in the right direction. The master-config.yaml was getting a bad value for the CIDR. This was caused by an invalid input parameter from my template.

The install was successful with these changes.

Member

@mtnbikenc mtnbikenc left a comment

Overall I'm in favor of this approach. The intent here is to run these initialization tasks once at the beginning of a config or upgrade while allowing the use of the openshift_facts module throughout the rest of the run. Additionally, we are further declaring a common entry point and method for running all standard playbooks. A few changes are requested below.

# TODO: Should this be moved into health checks??
# Seems as though any check that happens with a corresponding fail should move into health_checks
# Fail as early as possible if Atomic and old version of Docker
- block:
Member

This is more stylistic, but for blocks my preference is for putting the when: condition at the top of the block. Such as,

- when: 
  - l_is_atomic | bool
  block:

shell: 'CURLY="{"; docker version --format "$CURLY{json .Server.Version}}"'
register: l_atomic_docker_version
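
Put together, the restructured block might look roughly like this (a sketch combining the snippets above; the task name is illustrative):

- when:
  - l_is_atomic | bool
  block:
  - name: Get Docker version on Atomic hosts
    shell: 'CURLY="{"; docker version --format "$CURLY{json .Server.Version}}"'
    register: l_atomic_docker_version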

- assert:
Member

Add a task name.

Contributor Author

@mtnbikenc, Does assert take a name?

Contributor Author

Tested it; will add.
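
For reference, a named assert here might look like this (a sketch; the task name, minimum version, and message are illustrative):

- name: Fail if this is an Atomic host running an unsupported Docker version
  assert:
    that:
    - l_atomic_docker_version.stdout | replace('"', '') | version_compare('1.12', '>=')
    msg: Installation on Atomic Host requires Docker 1.12 or later.
  when:
  - l_is_atomic | bool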

when:
- l_is_atomic | bool

- block:
Member

Same comments about when: before block:.

---
required_packages:
- iproute
- python3-dbus
Member

Need to add back the requirement for python3-dbus on Fedora.

Contributor Author

I added this back in.

- l_is_atomic | bool
- r_openshift_facts_ran is not defined

- name: Load variables
Member

This task didn't make it over to the new file. Therefore the logic for Fedora is missing and will not install the proper required_packages.
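
For context, the task in question follows the usual include_vars/first_found pattern for picking a distribution-specific vars file (a sketch; the vars file names are illustrative):

- name: Load variables
  include_vars: "{{ item }}"
  with_first_found:
  # e.g. vars/Fedora.yml would define required_packages including python3-dbus
  - "vars/{{ ansible_distribution }}.yml"
  - "vars/default.yml"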

Contributor Author

I missed that small discrepancy between Fedora and the default.

Contributor Author

Added the install section back in. Good catch.

@kwoodson
Contributor Author

aos-ci-test

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 345f5c2 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 345f5c2 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" for 345f5c2 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" for 345f5c2 (logs)

Member

@mtnbikenc mtnbikenc left a comment

LGTM, just needs a squash.

@kwoodson kwoodson force-pushed the openshift_facts_refactor branch from e1bafb5 to 5f7d9c4 on July 17, 2017 19:26
@kwoodson
Contributor Author

@mtnbikenc, squashed!

@kwoodson
Contributor Author

aos-ci-test

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 5f7d9c4 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" for 5f7d9c4 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 5f7d9c4 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" for 5f7d9c4 (logs)

@kwoodson kwoodson changed the title from "[WIP] Reduce openshift_facts dependencies." to "Reduce openshift_facts dependencies." on Jul 18, 2017
@kwoodson
Contributor Author

@sdodson, I'd love to get this merged at some point. Any thoughts?

@sdodson
Member

sdodson commented Jul 18, 2017

I think, given how bad we were in the past with the proliferation of calls to openshift_facts, we should hold this until after 3.6 is forked so that we get multiple weeks of soak time. We'll merge this as soon as we fork 3.6.

@mtnbikenc
Member

Now that 3.6 is branched, can we move forward with merging this?
[test]

@sdodson
Member

sdodson commented Aug 7, 2017

[test]

@ashcrow
Member

ashcrow commented Aug 7, 2017

The error here seems off. It's failing testing because the rest of the tests have not returned yet.

@abutcher
Member

abutcher commented Aug 7, 2017

bot, retest this please

@sdodson
Member

sdodson commented Aug 7, 2017

@sosiouxme or @rhcarvalho Can either of you help debug what's going on with these integration tests?

@sosiouxme
Member

@sdodson https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/409/ looks like all the yum interactions failed, see e.g. "Error with yum repository configuration: Cannot find a valid baseurl for repo" - probably a yum bobble at exactly the time this was running?

@kwoodson
Contributor Author

kwoodson commented Aug 7, 2017

@sosiouxme, thanks. I'm trying to figure out why this is happening. It could be related to the PR but I find it unlikely. If so, then I need to be able to fix it. Any advice on testing locally or fixing would be great.

@ashcrow
Member

ashcrow commented Aug 7, 2017

If it is yum flaking out it's doing it a lot more than I'd expect ☹️. I tend to agree with @kwoodson that the problem isn't something specific to this PR.

@sosiouxme
Member

@kwoodson Yeah, I think that was a flake, but when making fairly fundamental changes like this it's good to run the integration tests locally so they don't surprise you at the end of a test or merge.

https://github.com/openshift/openshift-ansible/blob/master/test/integration/README.md describes how to do that; let me know if you get stuck. I can try running against this PR myself just to see if it was a flake... they should only take 5-10 minutes to run locally.

@kwoodson
Contributor Author

kwoodson commented Aug 7, 2017

@sosiouxme, I ran the test locally and it failed the same way. I'm not sure where to go from here, as my PR didn't touch the repos.

Thanks for helping out.

@ashcrow
Member

ashcrow commented Aug 7, 2017

Looking at the errors again it almost reads as if the expected errors are NOT occurring, hence failure.

@sdodson sdodson force-pushed the openshift_facts_refactor branch from 9e59727 to 5494fb3 on August 7, 2017 19:53
@sdodson
Member

sdodson commented Aug 7, 2017

The commit I've added made things worse. Drop that commit and we're back to openshift being undefined, which just means the integration playbooks need to be updated to include openshift_facts in a meaningful manner.

@ashcrow
Member

ashcrow commented Aug 7, 2017

Example:

		  1. Host:     openshift_ansible_test_72782670407390
		     Play:     Determine openshift_version to configure on first master
		     Task:     openshift_version : fail
		     Message:  Package atomic-openshift not found
--- FAIL: TestPackageUpdateRepoUnreachable (99.15s)
	common.go:54: missing in output: ["check \"package_update\":" "Error getting data from at least one yum repository"]
	common.go:98: 
		$ (cd /data/src/github.com/openshift/openshift-ansible/test/integration/openshift_health_checker/preflight && ansible-playbook -i /dev/null playbooks/package_update_repo_unreachable.yml)
		...
		localhost                  : ok=25   changed=2    unreachable=0    failed=0   
		openshift_ansible_test_32116651946969 : ok=31   changed=5    unreachable=0    failed=1 

which is from:

func TestPackageUpdateRepoUnreachable(t *testing.T) {
        PlaybookTest{
                Path:     "playbooks/package_update_repo_unreachable.yml",
                ExitCode: 2,
                Output: []string{
                        "check \"package_update\":",
                        "Error getting data from at least one yum repository",
                },
        }.Run(t)
}

And the tests use:

// Run runs the PlaybookTest.
func (p PlaybookTest) Run(t *testing.T) {
        // A PlaybookTest is intended to be run in parallel with other tests.
        t.Parallel()

        cmd := exec.Command("ansible-playbook", "-i", "/dev/null", p.Path)
        cmd.Env = append(os.Environ(), "ANSIBLE_FORCE_COLOR=1")
        b, err := cmd.CombinedOutput()

        // Check exit code.
        if (err == nil) && (p.ExitCode != 0) {
                p.checkExitCode(t, 0, p.ExitCode, cmd, b)
        }
        if (err != nil) && (p.ExitCode == 0) {
                got, ok := getExitCode(err)
                if !ok {
                        t.Logf("unexpected error (%T): %[1]v", err)
                        p.logCmdAndOutput(t, cmd, b)
                        t.FailNow()
                }
                p.checkExitCode(t, got, p.ExitCode, cmd, b)
        }

        // Check output contents.
        var missing []string
        for _, s := range p.Output {
                if !bytes.Contains(b, []byte(s)) {
                        missing = append(missing, s)
                }
        }
        if len(missing) > 0 {
                t.Logf("missing in output: %q", missing)
                p.logCmdAndOutput(t, cmd, b)
                t.FailNow()
        }
}

@ashcrow
Member

ashcrow commented Aug 7, 2017

Yeah, this is a test case looking for a failure and not getting the string result it expects.

@sosiouxme
Member

The integration tests run actual playbooks. They probably need to be updated to work with this. I'll take a look.

@kwoodson
Contributor Author

kwoodson commented Aug 7, 2017

@sosiouxme, this change involves updating the dependencies and how we initialize a playbook run. I have added the std_include.yml to the bottom of the setup_container.yml. This is what is resulting in the error:

	common.go:54: missing in output: ["check \"package_update\":" "Could not perform a yum update." "break-yum-update-1.0-2.noarch requires package-that-does-not-exist"]

@sosiouxme
Member

Sorry it's taken me a while to get my head back into this. So, at least locally the integration tests are failing because openshift_version is crapping out before the test can get to what it's testing. Looking through the changes to see why this is...

@sosiouxme
Member

The integration tests all start without openshift repos. Those are enabled as needed for the test cases, in their playbooks. So openshift_version needs to happen after that or it will fail. Either update every playbook, or enable a repo in the container setup just to get past the initialization then disable it... I'll try some things.
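
One way to do the latter is to flip a local repo on in the container setup just long enough for initialization to query yum, then flip it back off so each test case still controls its own repo state. A minimal sketch (the repo id and file path are assumptions for illustration):

# In the container setup, before the standard initialization runs:
- name: Temporarily enable the local repo so openshift_version can query yum
  ini_file:
    dest: /etc/yum.repos.d/local.repo
    section: local-ose
    option: enabled
    value: 1

# ... include playbooks/common/openshift-cluster/std_include.yml here ...

- name: Disable the repo again so individual test playbooks control repo state
  ini_file:
    dest: /etc/yum.repos.d/local.repo
    section: local-ose
    option: enabled
    value: 0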

@kwoodson
Contributor Author

kwoodson commented Aug 7, 2017

@sosiouxme, I tracked it down to your previous statement.
On this PR branch:

[ose-3.2]
name=ose-3.2
baseurl=file:///mnt/localrepo/ose-3.2
enabled=0
gpgcheck=0

vs. Master branch

[ose-3.2]
name=ose-3.2
baseurl=file:///mnt/localrepo/ose-3.2
enabled = 1
gpgcheck=0

@sosiouxme
Member

I added a commit to handle the integration tests. It actually helped me take out some boilerplate...

@openshift-bot

Evaluated for openshift ansible merge up to 8a7f40a

@openshift-bot

Evaluated for openshift ansible test up to 8a7f40a

@openshift-bot

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/415/) (Base Commit: 085e3eb) (PR Branch Commit: 8a7f40a)

@openshift-bot

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/808/) (Base Commit: 085e3eb) (PR Branch Commit: 8a7f40a)

@sdodson sdodson merged commit 0569c50 into openshift:master Aug 8, 2017
@kwoodson kwoodson deleted the openshift_facts_refactor branch September 18, 2017 14:28