
Reduce openshift_facts dependencies. #4739

Merged
merged 5 commits into openshift:master on Aug 8, 2017

Conversation

kwoodson
Contributor

@kwoodson kwoodson commented Jul 11, 2017

The summary of this PR is to remove openshift_facts/tasks and move them to initialize_facts.yml. playbooks/common/openshift-cluster/initialize_facts.yml is included as part of playbooks/common/openshift-cluster/std_include.yml, and ALL entry point playbooks should flow through this playbook so that the following occur:

  • groups are initialized and our oo_ groups are defined and populated
  • openshift_version is properly determined and set
  • openshift_facts are set on hosts upfront before any other roles/playbooks run.

If you are seeing issues with your playbooks outside of common, please ensure that the playbooks include the playbooks/common/openshift-cluster/std_include.yml playbook.
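
For illustration, a minimal sketch of what such an entry point playbook might look like (the playbook name, host targeting, and debug task are assumptions for illustration, not part of this PR):

---
# my_entry_point.yml -- hypothetical example playbook
# Pull in the standard initialization first so that the oo_ groups,
# openshift_version, and openshift_facts are all populated before
# anything else runs.
- include: playbooks/common/openshift-cluster/std_include.yml

# Subsequent plays can rely on openshift_facts already being set on every host.
- name: Do the actual work
  hosts: oo_all_hosts
  tasks:
  - debug:
      msg: "{{ openshift.common.hostname | default('facts not set') }}"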

@kwoodson kwoodson requested a review from mtnbikenc July 11, 2017 20:50
@kwoodson kwoodson self-assigned this Jul 11, 2017

# TODO: Should this role be refactored into health_checks??
- name: Run openshift_sanitize_inventory to set variables
include_role:
Member

Can you put both roles into the roles property and move the tasks under post_tasks? As long as we can use roles instead of include_role, I find the plays more transparent.

Contributor Author

@ingvagabund, include_role is the future, as per our very own cluster lifecycle meeting with the Ansible consultant.

Member

I will always be opposed to that as long as the original roles property of a play can provide the same functionality. Though, it is my own opinion and not a blocker for the PR :).

Contributor Author

@ingvagabund, heh, I agree that at first the new style appears different, but I can assure you the benefits are there. We discussed this very topic with James from Ansible, and he told us that when you use include_role, the fixed pre_tasks, roles, tasks, and post_tasks ordering of a play goes away. The flow of the playbook moves top to bottom, and the flexibility increases because you can include only the tasks from a role that you want. With roles: you get the entire role, whether that is desired or not.

I can find the meeting recording and send it to you. I think it's in the architecture notes.
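
As a rough illustration of the difference (a sketch; the tasks_from file name is hypothetical), an include_role-based play reads strictly top to bottom and can pull in just part of a role:

- name: Initialize host facts
  hosts: oo_all_hosts
  tasks:
  # Tasks and included roles run in the order written here, rather than
  # in the fixed pre_tasks -> roles -> tasks -> post_tasks order.
  - name: Run openshift_sanitize_inventory to set variables
    include_role:
      name: openshift_sanitize_inventory

  # include_role can also pull in a single tasks file instead of the whole role.
  - name: Run only the fact-gathering tasks from openshift_facts (hypothetical tasks file)
    include_role:
      name: openshift_facts
      tasks_from: gather.yml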


# TODO: Should this be moved into health checks??
# Seems as though any check that happens with a corresponding fail should move into health_checks
- name: Validate python version - ans_dist is fedora and python is v3
Member

Maybe move both Validate python version - ans_dist is fedora and python is v3 and Validate python version - ans_dist not Fedora and python must be v2 into a new role called init-checks? Since both tasks can be moved around freely, the new role could be added to the roles list.

Contributor Author

@kwoodson kwoodson Jul 12, 2017

@ingvagabund, this is an interesting idea. I'd prefer to have fewer role dependencies. If the preflight health_checks are already doing this exact type of checking, then I'd prefer that we just do these types of checks inside health_checks. That seems like the best place for them.

Member

health_checks, by its name, checks the health of the cluster. Given that it can be anything (with a proper definition of what health is), I agree with you.

# TODO: Should this be moved into health checks??
# Seems as though any check that happens with a corresponding fail should move into health_checks
# Fail as early as possible if Atomic and old version of Docker
- block:
Member

Good candidate for the init-checks role.

Contributor Author

Same as above. Any fail/msg type checks are candidates for health or preflight. I'd prefer to limit the number of dependency calls. The attempt here is to do them up front during the initial startup and then never do them again. If they go into a role dependency, then we have a tendency to include them in meta/main.yml for multiple roles.

We can definitely have the discussion about whether health_checks or placing these types of checks into a role is the better approach. Maybe that's a discussion for the architecture team. I can see an argument for both sides, but I'd prefer to have checks in one place so that in the future we know exactly where to put code like this.

@sdodson sdodson requested a review from abutcher July 12, 2017 13:20
@kwoodson kwoodson requested review from ashcrow and tbielawa July 12, 2017 13:31
@kwoodson
Contributor Author

Currently failing to start atomic-openshift-node with this error:

 master has not created a default cluster network, network plugin "redhat/openshift-ovs-subnet" can not start

@ashcrow
Member

ashcrow commented Jul 12, 2017

I like this move overall. It makes a lot of sense!

@sdodson
Member

sdodson commented Jul 12, 2017 via email

@kwoodson
Contributor Author

@sdodson, atomic-openshift-sdn-ovs-3.6.74-1.git.0.e6d1637.el7.x86_64

@kwoodson
Contributor Author

@sdodson @abutcher pointed me in the right direction. The master-config.yaml was getting a bad value for the CIDR. This was caused by an invalid input parameter from my template.

The install was successful with these changes.

Member

@mtnbikenc mtnbikenc left a comment

Overall I'm in favor of this approach. The intent here is to run these initialization tasks once at the beginning of a config or upgrade while allowing the use of the openshift_facts module throughout the rest of the run. Additionally, we are further declaring a common entry point and method for running all standard playbooks. A few changes are requested below.

# TODO: Should this be moved into health checks??
# Seems as though any check that happens with a corresponding fail should move into health_checks
# Fail as early as possible if Atomic and old version of Docker
- block:
Member

This is more stylistic, but for blocks my preference is for putting the when: condition at the top of the block. Such as,

- when: 
  - l_is_atomic | bool
  block:

shell: 'CURLY="{"; docker version --format "$CURLY{json .Server.Version}}"'
register: l_atomic_docker_version
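
Put together, the restructured block might look roughly like this (a sketch combining the snippets above; the task name is illustrative):

- when:
  - l_is_atomic | bool
  block:
  - name: Get Docker version on Atomic hosts
    shell: 'CURLY="{"; docker version --format "$CURLY{json .Server.Version}}"'
    register: l_atomic_docker_version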

- assert:
Member

Add a task name.

Contributor Author

@mtnbikenc, Does assert take a name?

Contributor Author

Tested it; will add.
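
For reference, a named assert here might look like this (a sketch; the task name, minimum version, and message are illustrative):

- name: Fail if this is an Atomic host running an unsupported Docker version
  assert:
    that:
    - l_atomic_docker_version.stdout | replace('"', '') | version_compare('1.12', '>=')
    msg: Installation on Atomic Host requires Docker 1.12 or later.
  when:
  - l_is_atomic | bool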

when:
- l_is_atomic | bool

- block:
Member

Same comments about when: before block:.

---
required_packages:
- iproute
- python3-dbus
Member

Need to add back the requirement for python3-dbus on Fedora.

Contributor Author

I added this back in.

- l_is_atomic | bool
- r_openshift_facts_ran is not defined

- name: Load variables
Member

This task didn't make it over to the new file. Therefore the logic for Fedora is missing and will not install the proper required_packages.
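
For context, the task in question follows the usual include_vars/first_found pattern for picking a distribution-specific vars file (a sketch; the vars file names are illustrative):

- name: Load variables
  include_vars: "{{ item }}"
  with_first_found:
  # e.g. vars/Fedora.yml would define required_packages including python3-dbus
  - "vars/{{ ansible_distribution }}.yml"
  - "vars/default.yml"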

Contributor Author

I missed that small discrepancy between Fedora and the default.

Contributor Author

Added the install section back in. Good catch.

@kwoodson
Contributor Author

aos-ci-test

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 345f5c2 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 345f5c2 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" for 345f5c2 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" for 345f5c2 (logs)

Member

@mtnbikenc mtnbikenc left a comment

LGTM, just needs a squash.

@kwoodson kwoodson force-pushed the openshift_facts_refactor branch from e1bafb5 to 5f7d9c4 on July 17, 2017 19:26
@kwoodson
Contributor Author

@mtnbikenc, squashed!

@kwoodson
Contributor Author

aos-ci-test

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 5f7d9c4 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" for 5f7d9c4 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 5f7d9c4 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" for 5f7d9c4 (logs)

@kwoodson kwoodson changed the title from "[WIP] Reduce openshift_facts dependencies." to "Reduce openshift_facts dependencies." on Jul 18, 2017
@kwoodson
Contributor Author

@sdodson, I'd love to get this merged at some point. Any thoughts?

@sdodson
Member

sdodson commented Jul 18, 2017

I think, given how bad we were in the past with the proliferation of calls to openshift_facts, we should hold this until after 3.6 is forked so that we get multiple weeks of soak time. We'll merge this as soon as we fork 3.6.

@mtnbikenc
Member

Now that 3.6 is branched, can we move forward with merging this?
[test]

@sdodson
Member

sdodson commented Aug 7, 2017

[test]

@ashcrow
Member

ashcrow commented Aug 7, 2017

The error here seems off. It's failing testing because the rest of the tests have not returned yet.

@abutcher
Member

abutcher commented Aug 7, 2017

bot, retest this please

@sdodson
Member

sdodson commented Aug 7, 2017

@sosiouxme or @rhcarvalho Can either of you help debug what's going on with these integration tests?

@sosiouxme
Member

@sdodson https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/409/ looks like all the yum interactions failed, see e.g. "Error with yum repository configuration: Cannot find a valid baseurl for repo" - probably a yum bobble at exactly the time this was running?

@kwoodson
Contributor Author

kwoodson commented Aug 7, 2017

@sosiouxme, thanks. I'm trying to figure out why this is happening. It could be related to the PR but I find it unlikely. If so, then I need to be able to fix it. Any advice on testing locally or fixing would be great.

@ashcrow
Member

ashcrow commented Aug 7, 2017

If it is yum flaking out it's doing it a lot more than I'd expect ☹️. I tend to agree with @kwoodson that the problem isn't something specific to this PR.

@sosiouxme
Member

@kwoodson Yeah, I think that was a flake, but when making fairly fundamental changes like this it's good to run the integration tests locally so they don't surprise you at the end of a test or merge.

https://github.com/openshift/openshift-ansible/blob/master/test/integration/README.md describes how to do that; let me know if you get stuck. I can try running against this PR myself just to see if it was a flake... they should only take 5-10 minutes to run locally.

@kwoodson
Contributor Author

kwoodson commented Aug 7, 2017

@sosiouxme, I ran the test locally and it failed the same way. I'm not sure where to go from here, as my PR didn't touch the repos.

Thanks for helping out.

@ashcrow
Member

ashcrow commented Aug 7, 2017

Looking at the errors again it almost reads as if the expected errors are NOT occurring, hence failure.

@sdodson sdodson force-pushed the openshift_facts_refactor branch from 9e59727 to 5494fb3 on August 7, 2017 19:53
@sdodson
Member

sdodson commented Aug 7, 2017

The commit I've added made things worse. Drop that commit and we're back to openshift being undefined, which just means the integration playbooks need to be updated to include openshift_facts in a meaningful manner.

@ashcrow
Member

ashcrow commented Aug 7, 2017

Example:

		  1. Host:     openshift_ansible_test_72782670407390
		     Play:     Determine openshift_version to configure on first master
		     Task:     openshift_version : fail
		     Message:  Package atomic-openshift not found
--- FAIL: TestPackageUpdateRepoUnreachable (99.15s)
	common.go:54: missing in output: ["check \"package_update\":" "Error getting data from at least one yum repository"]
	common.go:98: 
		$ (cd /data/src/github.com/openshift/openshift-ansible/test/integration/openshift_health_checker/preflight && ansible-playbook -i /dev/null playbooks/package_update_repo_unreachable.yml)
		...
		localhost                  : ok=25   changed=2    unreachable=0    failed=0   
		openshift_ansible_test_32116651946969 : ok=31   changed=5    unreachable=0    failed=1 

which is from:

func TestPackageUpdateRepoUnreachable(t *testing.T) {
        PlaybookTest{
                Path:     "playbooks/package_update_repo_unreachable.yml",
                ExitCode: 2,
                Output: []string{
                        "check \"package_update\":",
                        "Error getting data from at least one yum repository",
                },
        }.Run(t)
}

And the tests use:

// Run runs the PlaybookTest.
func (p PlaybookTest) Run(t *testing.T) {
        // A PlaybookTest is intended to be run in parallel with other tests.
        t.Parallel()

        cmd := exec.Command("ansible-playbook", "-i", "/dev/null", p.Path)
        cmd.Env = append(os.Environ(), "ANSIBLE_FORCE_COLOR=1")
        b, err := cmd.CombinedOutput()

        // Check exit code.
        if (err == nil) && (p.ExitCode != 0) {
                p.checkExitCode(t, 0, p.ExitCode, cmd, b)
        }
        if (err != nil) && (p.ExitCode == 0) {
                got, ok := getExitCode(err)
                if !ok {
                        t.Logf("unexpected error (%T): %[1]v", err)
                        p.logCmdAndOutput(t, cmd, b)
                        t.FailNow()
                }
                p.checkExitCode(t, got, p.ExitCode, cmd, b)
        }

        // Check output contents.
        var missing []string
        for _, s := range p.Output {
                if !bytes.Contains(b, []byte(s)) {
                        missing = append(missing, s)
                }
        }
        if len(missing) > 0 {
                t.Logf("missing in output: %q", missing)
                p.logCmdAndOutput(t, cmd, b)
                t.FailNow()
        }
}

@ashcrow
Member

ashcrow commented Aug 7, 2017

Yeah, this is a test case looking for a failure and not getting the string result it expects.

@sosiouxme
Member

The integration tests run actual playbooks. They probably need to be updated to work with this. I'll take a look.

@kwoodson
Contributor Author

kwoodson commented Aug 7, 2017

@sosiouxme, this change involves updating the dependencies and how we initialize a playbook run. I have added the std_include.yml to the bottom of the setup_container.yml. This is what is resulting in the error:

	common.go:54: missing in output: ["check \"package_update\":" "Could not perform a yum update." "break-yum-update-1.0-2.noarch requires package-that-does-not-exist"]

@sosiouxme
Member

Sorry it's taken me a while to get my head back into this. So, at least locally the integration tests are failing because openshift_version is crapping out before the test can get to what it's testing. Looking through the changes to see why this is...

@sosiouxme
Member

The integration tests all start without openshift repos. Those are enabled as needed for the test cases, in their playbooks. So openshift_version needs to happen after that or it will fail. Either update every playbook, or enable a repo in the container setup just to get past the initialization then disable it... I'll try some things.
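
One way to do the latter is to flip a local repo on in the container setup just long enough for initialization to query yum, then flip it back off so each test case still controls its own repo state. A minimal sketch (the repo id and file path are assumptions for illustration):

# In the container setup, before the standard initialization runs:
- name: Temporarily enable the local repo so openshift_version can query yum
  ini_file:
    dest: /etc/yum.repos.d/local.repo
    section: local-ose
    option: enabled
    value: 1

# ... include playbooks/common/openshift-cluster/std_include.yml here ...

- name: Disable the repo again so individual test playbooks control repo state
  ini_file:
    dest: /etc/yum.repos.d/local.repo
    section: local-ose
    option: enabled
    value: 0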

@kwoodson
Contributor Author

kwoodson commented Aug 7, 2017

@sosiouxme, I tracked it down to your previous statement.
On this PR branch:

[ose-3.2]
name=ose-3.2
baseurl=file:///mnt/localrepo/ose-3.2
enabled=0
gpgcheck=0

vs. Master branch

[ose-3.2]
name=ose-3.2
baseurl=file:///mnt/localrepo/ose-3.2
enabled = 1
gpgcheck=0

@sosiouxme
Member

I added a commit to handle the integration tests. It actually helped me take out some boilerplate...

@openshift-bot

Evaluated for openshift ansible merge up to 8a7f40a

@openshift-bot

Evaluated for openshift ansible test up to 8a7f40a

@openshift-bot

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/415/) (Base Commit: 085e3eb) (PR Branch Commit: 8a7f40a)

@openshift-bot

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/808/) (Base Commit: 085e3eb) (PR Branch Commit: 8a7f40a)

@sdodson sdodson merged commit 0569c50 into openshift:master Aug 8, 2017
@kwoodson kwoodson deleted the openshift_facts_refactor branch September 18, 2017 14:28