Reduce openshift_facts dependencies. #4739
Conversation
# TODO: Should this role be refactored into health_checks??
- name: Run openshift_sanitize_inventory to set variables
  include_role:
Can you put both roles into the roles property and move the tasks under post_tasks? As long as we can use roles instead of include_role, I find the plays more transparent.
@ingvagabund, include_role is the future, as per our very own cluster lifecycle meeting with the Ansible consultant.
I will always be opposed to that as long as the original roles property of a play can provide the same functionality. Though, it is my own opinion and not a blocker for the PR :).
@ingvagabund, heh, I agree that at first the new style appears different, but I can assure you the benefits are there. We discussed this very topic with James from Ansible, and he was telling us that when you use include_role, the ordering imposed by the playbook sections pre_tasks, roles, tasks, and post_tasks goes away. The flow of the playbook moves top to bottom, and the flexibility increases because you can include only the tasks from a role that you desire. With roles: you get the entire role, whether that is desired or not.

I can find the meeting recording and send it to you. I think it's in the architecture notes.
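As an illustration of the difference being discussed (a minimal sketch, not code from this PR; some_role is a hypothetical role name):

# Classic style: everything in some_role runs during the roles phase,
# between pre_tasks and post_tasks, whether or not all of it is needed.
- hosts: localhost
  pre_tasks:
    - debug:
        msg: runs before the role
  roles:
    - some_role
  post_tasks:
    - debug:
        msg: runs after the role

# include_role style: the role (or just one of its task files) is invoked
# inline, so the play reads strictly top to bottom.
- hosts: localhost
  tasks:
    - debug:
        msg: runs first
    - include_role:
        name: some_role
        tasks_from: main
    - debug:
        msg: runs last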
# TODO: Should this be moved into health checks??
# Seems as though any check that happens with a corresponding fail should move into health_checks
- name: Validate python version - ans_dist is fedora and python is v3
Maybe move both Validate python version - ans_dist is fedora and python is v3 and Validate python version - ans_dist not Fedora and python must be v2 into a new role called init-checks? Given both tasks can be moved up and down, the new role can be added to the roles list.
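For illustration, the suggestion would amount to something like this (a sketch only; the file layout and play are assumptions, not code from this PR):

# roles/init-checks/tasks/main.yml (hypothetical) would hold both
# "Validate python version" tasks, and each play would then pull the
# role in through its roles list:
- hosts: all
  roles:
    - init-checks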
@ingvagabund, this is an interesting idea. I'd prefer to have fewer role dependencies. If the preflight health_checks are already doing this exact type of checking, then I'd prefer we just did these types of checks inside of health_checks. Seems like the best place for them.
health_checks, by its name, checks the health of the cluster. Given that it can be anything (with a proper definition of what health is), I agree with you.
# TODO: Should this be moved into health checks??
# Seems as though any check that happens with a corresponding fail should move into health_checks
# Fail as early as possible if Atomic and old version of Docker
- block:
Good candidate for the init-checks role.
Same as above. Any fail - msg type checks are candidates for health or preflight. I'd prefer to limit the number of dependency calls. The attempt here is to do them upfront in the initial startup and then never do them again. If they go into a role dependency then we have a tendency to include them in meta/main.yml for multiple roles.

We can definitely have the discussion about whether health_checks or placing these types of checks into a role is superior. Maybe a discussion for the architecture team. I can see an argument for both sides, but I'd prefer to have checks in one place so that in the future we know exactly where to place code like this.
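For context, a generic example of the fail - msg style of check being discussed (the variable and message here are illustrative, not taken from this PR):

- name: Fail early when a required inventory variable is missing
  fail:
    msg: openshift_deployment_type must be set in the inventory.
  when: openshift_deployment_type is not defined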
Currently failing to start atomic-openshift-node with this error:

master has not created a default cluster network, network plugin "redhat/openshift-ovs-subnet" can not start
I like this move overall. It makes a lot of sense!
Is atomic-openshift-sdn-ovs being installed?
@sdodson, atomic-openshift-sdn-ovs-3.6.74-1.git.0.e6d1637.el7.x86_64
Overall I'm in favor of this approach. The intent here is to run these initialization tasks once at the beginning of a config or upgrade while allowing the use of the openshift_facts module throughout the rest of the run. Additionally, we are further declaring a common entry point and method for running all standard playbooks. A few changes are requested below.
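To make the run-once intent concrete, the guard looks roughly like this (a sketch; r_openshift_facts_ran appears in the diff below, but the package task and the set_fact that records completion are assumptions):

- name: Ensure required packages are installed
  package:
    name: "{{ item }}"
    state: present
  with_items: "{{ required_packages }}"
  when: r_openshift_facts_ran is not defined

- name: Record that initialization already ran on this host
  set_fact:
    r_openshift_facts_ran: True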
# TODO: Should this be moved into health checks??
# Seems as though any check that happens with a corresponding fail should move into health_checks
# Fail as early as possible if Atomic and old version of Docker
- block:
This is more stylistic, but for blocks my preference is for putting the when: condition at the top of the block. Such as:

- when:
    - l_is_atomic | bool
  block:
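Applied to the block in question, that style would look roughly like this (assembled from the diff context below; the task name is an assumption):

- when:
    - l_is_atomic | bool
  block:
    - name: Get Docker version on Atomic Host
      shell: 'CURLY="{"; docker version --format "$CURLY{json .Server.Version}}"'
      register: l_atomic_docker_version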
    shell: 'CURLY="{"; docker version --format "$CURLY{json .Server.Version}}"'
    register: l_atomic_docker_version

- assert:
Add a task name.
@mtnbikenc, Does assert take a name?
tested it, will add.
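For reference, assert accepts a name like any other task; something along these lines (the name, version, and message are illustrative, not the exact change that was pushed):

- name: Verify Atomic Host has a new enough Docker
  assert:
    that:
      - l_atomic_docker_version.stdout | replace('"', '') | version_compare('1.12', '>=')
    msg: Installation on Atomic Host requires Docker 1.12 or newer.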
  when:
    - l_is_atomic | bool

- block:
Same comments about when: before block:.
---
required_packages:
- iproute
- python3-dbus
Need to add back the requirement for python3-dbus on Fedora.
I added this back in.
roles/openshift_facts/tasks/main.yml
    - l_is_atomic | bool
    - r_openshift_facts_ran is not defined

- name: Load variables
This task didn't make it over to the new file. Therefore the logic for Fedora is missing and will not install the proper required_packages.
I missed that small discrepancy between Fedora and the default.
Added the install section back in. Good catch.
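The kind of task being restored typically looks like this (a sketch; the exact file names are assumptions): load a distribution-specific vars file first so Fedora picks up python3-dbus, then fall back to the default.

- name: Load variables
  include_vars: "{{ item }}"
  with_first_found:
    - "vars/{{ ansible_distribution }}.yml"
    - "vars/default.yml"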
aos-ci-test
LGTM, just needs a squash.
Force-pushed from e1bafb5 to 5f7d9c4.
@mtnbikenc, squashed!

aos-ci-test

@sdodson, I'd love to get this merged at some point. Any thoughts?

I think given how bad we were in the past with proliferation of calls to openshift_facts we should hold this until after 3.6 is forked so that we get multiple weeks of soak time. We'll merge this as soon as we fork 3.6.

Now that 3.6 is branched, can we move forward with merging this?

[test]

The error here seems off. It's failing testing because the rest of the tests have not returned yet.

bot, retest this please

@sosiouxme or @rhcarvalho Can either of you help debug what's going on with these integration tests?

@sdodson https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/409/ looks like all the yum interactions failed, see e.g. "Error with yum repository configuration: Cannot find a valid baseurl for repo" - probably a yum bobble at exactly the time this was running?

@sosiouxme, thanks. I'm trying to figure out why this is happening. It could be related to the PR but I find it unlikely. If so, then I need to be able to fix it. Any advice on testing locally or fixing would be great.

If it is yum flaking out it's doing it a lot more than I'd expect.

@kwoodson yeah i think that was a flake but when making fairly fundamental changes like this it's good to run the integration tests locally so they don't surprise you at the end of a test or merge. https://github.com/openshift/openshift-ansible/blob/master/test/integration/README.md describes how to do that; let me know if you get stuck. I can try running against this PR myself just to see if it was a flake... they should only take 5-10 minutes to run locally.
@sosiouxme, I ran the test locally and it failed the same way. I'm not sure where to go from here as my PR didn't touch the repos. Thanks for helping out.
Looking at the errors again it almost reads as if the expected errors are NOT occurring, hence failure.
Force-pushed from 9e59727 to 5494fb3.
The commit I've added made things worse. Drop that commit and we're back to
Example:

which is from:

func TestPackageUpdateRepoUnreachable(t *testing.T) {
    PlaybookTest{
        Path:     "playbooks/package_update_repo_unreachable.yml",
        ExitCode: 2,
        Output: []string{
            "check \"package_update\":",
            "Error getting data from at least one yum repository",
        },
    }.Run(t)
}

And the tests use:

// Run runs the PlaybookTest.
func (p PlaybookTest) Run(t *testing.T) {
    // A PlaybookTest is intended to be run in parallel with other tests.
    t.Parallel()
    cmd := exec.Command("ansible-playbook", "-i", "/dev/null", p.Path)
    cmd.Env = append(os.Environ(), "ANSIBLE_FORCE_COLOR=1")
    b, err := cmd.CombinedOutput()
    // Check exit code.
    if (err == nil) && (p.ExitCode != 0) {
        p.checkExitCode(t, 0, p.ExitCode, cmd, b)
    }
    if (err != nil) && (p.ExitCode == 0) {
        got, ok := getExitCode(err)
        if !ok {
            t.Logf("unexpected error (%T): %[1]v", err)
            p.logCmdAndOutput(t, cmd, b)
            t.FailNow()
        }
        p.checkExitCode(t, got, p.ExitCode, cmd, b)
    }
    // Check output contents.
    var missing []string
    for _, s := range p.Output {
        if !bytes.Contains(b, []byte(s)) {
            missing = append(missing, s)
        }
    }
    if len(missing) > 0 {
        t.Logf("missing in output: %q", missing)
        p.logCmdAndOutput(t, cmd, b)
        t.FailNow()
    }
}
Yeah, this is a test case looking for a failure and not getting the string result it expects.

The integration tests run actual playbooks. They probably need to be updated to work with this. I'll take a look.

@sosiouxme, this change involves updating the dependencies and how we initialize a playbook run. I have added the

Sorry it's taken me a while to get my head back into this. So, at least locally the integration tests are failing because

The integration tests all start without openshift repos. Those are enabled as needed for the test cases, in their playbooks. So

@sosiouxme, I tracked it down to your previous statement.

vs. Master branch

I added a commit to handle the integration tests. It actually helped me take out some boilerplate...

Evaluated for openshift ansible merge up to 8a7f40a

Evaluated for openshift ansible test up to 8a7f40a

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/415/) (Base Commit: 085e3eb) (PR Branch Commit: 8a7f40a)

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/808/) (Base Commit: 085e3eb) (PR Branch Commit: 8a7f40a)
The summary of this PR is to remove openshift_facts/tasks and move them to initialize_facts.yml. playbooks/common/openshift-cluster/initialize_facts.yml is included as part of the playbooks/common/openshift-cluster/std_include.yml, and ALL entry point playbooks should flow through this playbook so that the following occur:

If you are seeing issues with your playbooks outside of common, please ensure that the playbooks include the playbooks/common/openshift-cluster/std_include.yml playbook.
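A minimal sketch of such an entry-point playbook (the relative paths and the follow-on play are assumptions about layout, not copied from the PR):

# Run the shared initialization first, then the playbook's own plays.
- include: ../../common/openshift-cluster/std_include.yml

- include: ../../common/openshift-cluster/config.yml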