diff --git a/.dockerignore b/.dockerignore
index 968811df5b8..0a70c5bfa5a 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -1,8 +1,12 @@
 .*
 bin
 docs
+hack
+inventory
 test
 utils
 **/*.md
 *.spec
+*.ini
+*.txt
 setup*
diff --git a/BUILD.md b/BUILD.md
index d9de3f5aafa..1c270db234e 100644
--- a/BUILD.md
+++ b/BUILD.md
@@ -33,35 +33,6 @@ To build a container image of `openshift-ansible` using standalone **Docker**:
     cd openshift-ansible
     docker build -f images/installer/Dockerfile -t openshift-ansible .
 
-### Building on OpenShift
-
-To build an openshift-ansible image using an **OpenShift** [build and image stream](https://docs.openshift.org/latest/architecture/core_concepts/builds_and_image_streams.html) the straightforward command would be:
-
-    oc new-build registry.centos.org/openshift/playbook2image~https://github.com/openshift/openshift-ansible
-
-However: because the `Dockerfile` for this repository is not in the top level directory, and because we can't change the build context to the `images/installer` path as it would cause the build to fail, the `oc new-app` command above will create a build configuration using the *source to image* strategy, which is the default approach of the [playbook2image](https://github.com/openshift/playbook2image) base image. This does build an image successfully, but unfortunately the resulting image will be missing some customizations that are handled by the [Dockerfile](images/installer/Dockerfile) in this repo.
-
-At the time of this writing there is no straightforward option to [set the dockerfilePath](https://docs.openshift.org/latest/dev_guide/builds/build_strategies.html#dockerfile-path) of a `docker` build strategy with `oc new-build`. The alternatives to achieve this are:
-
-- Use the simple `oc new-build` command above to generate the BuildConfig and ImageStream objects, and then manually edit the generated build configuration to change its strategy to `dockerStrategy` and set `dockerfilePath` to `images/installer/Dockerfile`.
-
-- Download and pass the `Dockerfile` to `oc new-build` with the `-D` option:
-
-```
-curl -s https://raw.githubusercontent.com/openshift/openshift-ansible/master/images/installer/Dockerfile |
-  oc new-build -D - \
-      --docker-image=registry.centos.org/openshift/playbook2image \
-      https://github.com/openshift/openshift-ansible
-```
-
-Once a build is started, the progress of the build can be monitored with:
-
-    oc logs -f bc/openshift-ansible
-
-Once built, the image will be visible in the Image Stream created by `oc new-app`:
-
-    oc describe imagestream openshift-ansible
-
 ## Build the Atomic System Container
 
 A system container runs using runC instead of Docker and it is managed
diff --git a/README_CONTAINER_IMAGE.md b/README_CONTAINER_IMAGE.md
index cf3b432df5a..a2151352de6 100644
--- a/README_CONTAINER_IMAGE.md
+++ b/README_CONTAINER_IMAGE.md
@@ -1,6 +1,6 @@
 # Containerized openshift-ansible to run playbooks
 
-The [Dockerfile](images/installer/Dockerfile) in this repository uses the [playbook2image](https://github.com/openshift/playbook2image) source-to-image base image to containerize `openshift-ansible`. The resulting image can run any of the provided playbooks. See [BUILD.md](BUILD.md) for image build instructions.
+The [Dockerfile](images/installer/Dockerfile) in this repository can be used to build a containerized `openshift-ansible`. The resulting image can run any of the provided playbooks. See [BUILD.md](BUILD.md) for image build instructions.
 
 The image is designed to **run as a non-root user**. The container's UID is mapped to the username `default` at runtime. Therefore, the container's environment reflects that user's settings, and the configuration should match that. For example `$HOME` is `/opt/app-root/src`, so ssh keys are expected to be under `/opt/app-root/src/.ssh`. If you ran a container as `root` you would have to adjust the container's configuration accordingly, e.g. by placing ssh keys under `/root/.ssh` instead. Nevertheless, the expectation is that containers will be run as non-root; for example, this container image can be run inside OpenShift under the default `restricted` [security context constraint](https://docs.openshift.org/latest/architecture/additional_concepts/authorization.html#security-context-constraints).
@@ -14,8 +14,6 @@ This provides consistency with other images used by the platform and it's also a
 
 ## Usage
 
-The `playbook2image` base image provides several options to control the behaviour of the containers. For more details on these options see the [playbook2image](https://github.com/openshift/playbook2image) documentation.
-
 At the very least, when running a container you must specify:
 
 1. An **inventory**. This can be a location inside the container (possibly mounted as a volume) with a path referenced via the `INVENTORY_FILE` environment variable. Alternatively you can serve the inventory file from a web server and use the `INVENTORY_URL` environment variable to fetch it, or `DYNAMIC_SCRIPT_URL` to download a script that provides a dynamic inventory.
@@ -52,8 +50,6 @@ Here is a detailed explanation of the options used in the command above:
 
 Further usage examples are available in the [examples directory](examples/) with samples of how to use the image from within OpenShift.
 
-Additional usage information for images built from `playbook2image` like this one can be found in the [playbook2image examples](https://github.com/openshift/playbook2image/tree/master/examples).
-
 ## Running openshift-ansible as a System Container
 
 Building the System Container: See the [BUILD.md](BUILD.md).
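For reference, the inventory options described in the README changes above combine like this at run time; a minimal sketch, assuming the inventory is served over HTTP (the URL is a placeholder, the other values come from this patch):

    docker run -tu `id -u` \
        -v $HOME/.ssh/id_rsa:/opt/app-root/src/.ssh/id_rsa:Z,ro \
        -e INVENTORY_URL=https://example.com/inventory \
        -e PLAYBOOK_FILE=playbooks/byo/openshift_facts.yml \
        openshift/origin-ansible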
diff --git a/images/installer/Dockerfile b/images/installer/Dockerfile
index 915dfe37763..0d977d48f64 100644
--- a/images/installer/Dockerfile
+++ b/images/installer/Dockerfile
@@ -1,10 +1,21 @@
-# Using playbook2image as a base
-# See https://github.com/openshift/playbook2image for details on the image
-# including documentation for the settings/env vars referenced below
-FROM registry.centos.org/openshift/playbook2image:latest
+FROM centos:7
 
 MAINTAINER OpenShift Team <dev@lists.openshift.redhat.com>
 
+USER root
+
+# Add origin repo for including the oc client
+COPY images/installer/origin-extra-root /
+
+# install ansible and deps
+RUN INSTALL_PKGS="python-lxml pyOpenSSL python2-cryptography openssl java-1.8.0-openjdk-headless python2-passlib httpd-tools openssh-clients origin-clients" \
+ && yum install -y --setopt=tsflags=nodocs $INSTALL_PKGS \
+ && EPEL_PKGS="ansible python2-boto" \
+ && yum install -y epel-release \
+ && yum install -y --setopt=tsflags=nodocs $EPEL_PKGS \
+ && rpm -V $INSTALL_PKGS $EPEL_PKGS \
+ && yum clean all
+
 LABEL name="openshift/origin-ansible" \
       summary="OpenShift's installation and configuration tool" \
       description="A containerized openshift-ansible image to let you run playbooks to install, upgrade, maintain and check an OpenShift cluster" \
@@ -12,40 +23,24 @@ LABEL name="openshift/origin-ansible" \
       io.k8s.display-name="openshift-ansible" \
       io.k8s.description="A containerized openshift-ansible image to let you run playbooks to install, upgrade, maintain and check an OpenShift cluster" \
       io.openshift.expose-services="" \
-      io.openshift.tags="openshift,install,upgrade,ansible"
+      io.openshift.tags="openshift,install,upgrade,ansible" \
+      atomic.run="once"
 
-USER root
+ENV USER_UID=1001 \
+    HOME=/opt/app-root/src \
+    WORK_DIR=/usr/share/ansible/openshift-ansible \
+    OPTS="-v"
 
-# Create a symlink to /opt/app-root/src so that files under /usr/share/ansible are accessible.
-# This is required since the system-container uses by default the playbook under
-# /usr/share/ansible/openshift-ansible. With this change we won't need to keep two different
-# configurations for the two images.
-RUN mkdir -p /usr/share/ansible/ && ln -s /opt/app-root/src /usr/share/ansible/openshift-ansible
+# Add image scripts and files for running as a system container
+COPY images/installer/root /
+# Include playbooks, roles, plugins, etc. from this repo
+COPY . ${WORK_DIR}
 
-RUN INSTALL_PKGS="skopeo openssl java-1.8.0-openjdk-headless httpd-tools" && \
-    yum install -y --setopt=tsflags=nodocs $INSTALL_PKGS && \
-    rpm -V $INSTALL_PKGS && \
-    yum clean all
+RUN /usr/local/bin/user_setup \
+ && rm /usr/local/bin/usage.ocp
 
 USER ${USER_UID}
 
-# The playbook to be run is specified via the PLAYBOOK_FILE env var.
-# This sets a default of openshift_facts.yml as it's an informative playbook
-# that can help test that everything is set properly (inventory, sshkeys)
-ENV PLAYBOOK_FILE=playbooks/byo/openshift_facts.yml \
-    OPTS="-v" \
-    INSTALL_OC=true
-
-# playbook2image's assemble script expects the source to be available in
-# /tmp/src (as per the source-to-image specs) so we import it there
-ADD . /tmp/src
-
-# Running the 'assemble' script provided by playbook2image will install
-# dependencies specified in requirements.txt and install the 'oc' client
-# as per the INSTALL_OC environment setting above
-RUN /usr/libexec/s2i/assemble
-
-# Add files for running as a system container
-COPY images/installer/system-container/root /
-
-CMD [ "/usr/libexec/s2i/run" ]
+WORKDIR ${WORK_DIR}
+ENTRYPOINT [ "/usr/local/bin/entrypoint" ]
+CMD [ "/usr/local/bin/run" ]
diff --git a/images/installer/Dockerfile.rhel7 b/images/installer/Dockerfile.rhel7
index f861d4bcf6b..3110f409c4e 100644
--- a/images/installer/Dockerfile.rhel7
+++ b/images/installer/Dockerfile.rhel7
@@ -1,55 +1,46 @@
-FROM openshift3/playbook2image
+FROM rhel7.3:7.3-released
 
 MAINTAINER OpenShift Team <dev@lists.openshift.redhat.com>
 
-# override env vars from base image
-ENV SUMMARY="OpenShift's installation and configuration tool" \
-    DESCRIPTION="A containerized openshift-ansible image to let you run playbooks to install, upgrade, maintain and check an OpenShift cluster"
+USER root
+
+# Playbooks, roles, and their dependencies are installed from packages.
+RUN INSTALL_PKGS="atomic-openshift-utils atomic-openshift-clients python-boto openssl java-1.8.0-openjdk-headless httpd-tools" \
+ && yum repolist > /dev/null \
+ && yum-config-manager --enable rhel-7-server-ose-3.6-rpms \
+ && yum-config-manager --enable rhel-7-server-rh-common-rpms \
+ && yum install -y --setopt=tsflags=nodocs $INSTALL_PKGS \
+ && rpm -q $INSTALL_PKGS \
+ && yum clean all
 
 LABEL name="openshift3/ose-ansible" \
-      summary="$SUMMARY" \
-      description="$DESCRIPTION" \
+      summary="OpenShift's installation and configuration tool" \
+      description="A containerized openshift-ansible image to let you run playbooks to install, upgrade, maintain and check an OpenShift cluster" \
       url="https://github.com/openshift/openshift-ansible" \
       io.k8s.display-name="openshift-ansible" \
-      io.k8s.description="$DESCRIPTION" \
+      io.k8s.description="A containerized openshift-ansible image to let you run playbooks to install, upgrade, maintain and check an OpenShift cluster" \
       io.openshift.expose-services="" \
       io.openshift.tags="openshift,install,upgrade,ansible" \
       com.redhat.component="aos3-installation-docker" \
       version="v3.6.0" \
       release="1" \
-      architecture="x86_64"
-
-# Playbooks, roles and their dependencies are installed from packages.
-# Unlike in Dockerfile, we don't invoke the 'assemble' script here
-# because all content and dependencies (like 'oc') is already
-# installed via yum.
-USER root
-RUN INSTALL_PKGS="atomic-openshift-utils atomic-openshift-clients python-boto skopeo openssl java-1.8.0-openjdk-headless httpd-tools" && \
-    yum repolist > /dev/null && \
-    yum-config-manager --enable rhel-7-server-ose-3.6-rpms && \
-    yum-config-manager --enable rhel-7-server-rh-common-rpms && \
-    yum install -y $INSTALL_PKGS && \
-    yum clean all
-
-# The symlinks below are a (hopefully temporary) hack to work around the fact that this
-# image is based on python s2i which uses the python27 SCL instead of system python,
-# and so the system python modules we need would otherwise not be in the path.
-RUN ln -s /usr/lib/python2.7/site-packages/{boto,passlib} /opt/app-root/lib64/python2.7/
-
-USER ${USER_UID}
+      architecture="x86_64" \
+      atomic.run="once"
 
-# The playbook to be run is specified via the PLAYBOOK_FILE env var.
-# This sets a default of openshift_facts.yml as it's an informative playbook
-# that can help test that everything is set properly (inventory, sshkeys).
-# As the playbooks are installed via packages instead of being copied to
-# $APP_HOME by the 'assemble' script, we set the WORK_DIR env var to the
-# location of openshift-ansible.
-ENV PLAYBOOK_FILE=playbooks/byo/openshift_facts.yml \
-    ANSIBLE_CONFIG=/usr/share/atomic-openshift-utils/ansible.cfg \
+ENV USER_UID=1001 \
+    HOME=/opt/app-root/src \
     WORK_DIR=/usr/share/ansible/openshift-ansible \
+    ANSIBLE_CONFIG=/usr/share/atomic-openshift-utils/ansible.cfg \
     OPTS="-v"
 
-# Add files for running as a system container
-COPY system-container/root /
+# Add image scripts and files for running as a system container
+COPY root /
+
+RUN /usr/local/bin/user_setup \
+ && mv /usr/local/bin/usage{.ocp,}
+
+USER ${USER_UID}
 
-CMD [ "/usr/libexec/s2i/run" ]
+WORKDIR ${WORK_DIR}
+ENTRYPOINT [ "/usr/local/bin/entrypoint" ]
+CMD [ "/usr/local/bin/run" ]
diff --git a/images/installer/system-container/README.md b/images/installer/README_CONTAINER_IMAGE.md
similarity index 63%
rename from images/installer/system-container/README.md
rename to images/installer/README_CONTAINER_IMAGE.md
index fbcd47c4af5..bc1ebb4a81a 100644
--- a/images/installer/system-container/README.md
+++ b/images/installer/README_CONTAINER_IMAGE.md
@@ -1,17 +1,34 @@
-# System container installer
+ORIGIN-ANSIBLE IMAGE INSTALLER
+===============================
+
+Contains Dockerfile information for building an openshift/origin-ansible image
+based on `centos:7` or `rhel7.3:7.3-released`.
+
+Read additional setup information for this image at: https://hub.docker.com/r/openshift/origin-ansible/
+
+Read additional information about the `openshift/origin-ansible` image at: https://github.com/openshift/openshift-ansible/blob/master/README_CONTAINER_IMAGE.md
+
+Also contains necessary components for running the installer using an Atomic System Container.
+
+
+System container installer
+==========================
 
 These files are needed to run the installer using an [Atomic System container](http://www.projectatomic.io/blog/2016/09/intro-to-system-containers/).
+These files can be found under `root/exports`:
 
 * config.json.template - Template of the configuration file used for running containers.
 
-* manifest.json - Used to define various settings for the system container, such as the default values to use for the installation.
-
-* run-system-container.sh - Entrypoint to the container.
+* manifest.json - Used to define various settings for the system container, such as the default values to use for the installation.
 
 * service.template - Template file for the systemd service.
 
 * tmpfiles.template - Template file for systemd-tmpfiles.
 
+These files can be found under `root/usr/local/bin`:
+
+* run-system-container.sh - Entrypoint to the container.
+
 ## Options
 
 These options may be set via the ``atomic`` ``--set`` flag.
 For defaults see ``root/exports/manifest.json``
@@ -28,4 +45,4 @@ These options may be set via the ``atomic`` ``--set`` flag.
 For defaults see ``root/exports/manifest.json``
 
 * ANSIBLE_CONFIG - Full path for the ansible configuration file to use inside the container
 
-* INVENTORY_FILE - Full path for the inventory to use from the host
+* INVENTORY_FILE - Full path for the inventory to use from the host
\ No newline at end of file
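For illustration, the ``--set`` options listed in this README are passed when installing the system container; a minimal sketch, not the project's documented invocation (the inventory path is a placeholder, the config path comes from Dockerfile.rhel7 above):

    atomic install --system \
        --set INVENTORY_FILE=/etc/ansible/hosts \
        --set ANSIBLE_CONFIG=/usr/share/atomic-openshift-utils/ansible.cfg \
        openshift/origin-ansible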
diff --git a/images/installer/origin-extra-root/etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-PaaS b/images/installer/origin-extra-root/etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-PaaS
new file mode 100644
index 00000000000..fcbaaca0e22
--- /dev/null
+++ b/images/installer/origin-extra-root/etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-PaaS
@@ -0,0 +1,20 @@
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+Version: GnuPG v2.0.22 (GNU/Linux)
+
+mQENBFc8iwUBCADadBGYmA2nFvq79/5uxUQOiPqC/QflWcPX1B6SQKniUhyqaSes
+gNMJsPppKRV4NZKITcL8lZ90+Gds0fmL3b5xz1r5Rfm3ilSItEqeGlLIJZBvANyx
+rAT3q8EgkkVRyhZPseUMZj04O8OKnt1jrHakVkOp0lJClqhZ+bs/7yLRmaLXTcum
++ouqUKzQoAEDnqe9nJmmJhC6n2vg7o0PCo/9qOf/scQbv4FNoJfmkcVLRmwmqzgh
+bGj6QaOgij3sl94pZ3HFop4f+eU0kNbyt9J18fKI8X0DdHkDW8kO1UwwHT2ibJ1t
+mBaUsE1zZ0DvfyFad1xXAgm+SIlJgdpPvPNLABEBAAG0WUNlbnRPUyBQYWFTIFNJ
+RyAoaHR0cHM6Ly93aWtpLmNlbnRvcy5vcmcvU3BlY2lhbEludGVyZXN0R3JvdXAv
+UGFhUykgPHNlY3VyaXR5QGNlbnRvcy5vcmc+iQE5BBMBAgAjBQJXPIsFAhsDBwsJ
+CAcDAgEGFQgCCQoLBBYCAwECHgECF4AACgkQw0xb1C8pfsyT2gf9FqJoc8oZ+T5A
+8cZslMyCWziPi0o7kd/Rw91T7dkV+VIC+sFlVga7fkPEAiD8U7JFE+a1IlcjfGuY
+my4S6UH8K5zL36CRg2MF112HE5TWoBxF3KZg9nOJQ2NLapJowaP8uITYG4vlgV3g
+GJD2OC191tjcqmelFnhAN0EBdxrRrBJ7tr3OCtL6bJ6NPQ0bXPI2Fjbm7SbxTfpE
+ggEU8R7WZQApYgl8zRfyS12SfpFV8ZU+lIBmJaU1qaY4/BmNgG6e7clmq8xVZQLg
+ZH9qi9+HPh+80+8/WhJUddlVXc2g6c4VjnnFpZfsrMdTAFuEsrjkyaxqeBjXCgbb
+pzGjTg0LXg==
+=CVSF
+-----END PGP PUBLIC KEY BLOCK-----
diff --git a/images/installer/origin-extra-root/etc/yum.repos.d/centos-openshift-origin.repo b/images/installer/origin-extra-root/etc/yum.repos.d/centos-openshift-origin.repo
new file mode 100644
index 00000000000..d18994fdd90
--- /dev/null
+++ b/images/installer/origin-extra-root/etc/yum.repos.d/centos-openshift-origin.repo
@@ -0,0 +1,7 @@
+
+[centos-openshift-origin]
+name=CentOS OpenShift Origin
+baseurl=http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin/
+enabled=1
+gpgcheck=1
+gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-PaaS
diff --git a/images/installer/system-container/root/exports/config.json.template b/images/installer/root/exports/config.json.template
similarity index 100%
rename from images/installer/system-container/root/exports/config.json.template
rename to images/installer/root/exports/config.json.template
diff --git a/images/installer/system-container/root/exports/manifest.json b/images/installer/root/exports/manifest.json
similarity index 100%
rename from images/installer/system-container/root/exports/manifest.json
rename to images/installer/root/exports/manifest.json
diff --git a/images/installer/system-container/root/exports/service.template b/images/installer/root/exports/service.template
similarity index 100%
rename from images/installer/system-container/root/exports/service.template
rename to images/installer/root/exports/service.template
diff --git a/images/installer/system-container/root/exports/tmpfiles.template b/images/installer/root/exports/tmpfiles.template
similarity index 100%
rename from images/installer/system-container/root/exports/tmpfiles.template
rename to images/installer/root/exports/tmpfiles.template
diff --git a/images/installer/root/usr/local/bin/entrypoint b/images/installer/root/usr/local/bin/entrypoint
new file mode 100755
index 00000000000..777bf3f11c2
--- /dev/null
+++ b/images/installer/root/usr/local/bin/entrypoint
@@ -0,0 +1,17 @@
+#!/bin/bash -e
+#
+# This file serves as the main entrypoint to the openshift-ansible image.
+#
+# For more information see the documentation:
+# https://github.com/openshift/openshift-ansible/blob/master/README_CONTAINER_IMAGE.md
+
+
+# Patch /etc/passwd file with the current user info.
+# The current user's entry must be correctly defined in this file in order for
+# the `ssh` command to work within the created container.
+
+if ! whoami &>/dev/null; then
+  echo "${USER:-default}:x:$(id -u):$(id -g):Default User:$HOME:/sbin/nologin" >> /etc/passwd
+fi
+
+exec "$@"
diff --git a/images/installer/root/usr/local/bin/run b/images/installer/root/usr/local/bin/run
new file mode 100755
index 00000000000..9401ea118b8
--- /dev/null
+++ b/images/installer/root/usr/local/bin/run
@@ -0,0 +1,46 @@
+#!/bin/bash -e
+#
+# This file serves as the default command to the openshift-ansible image.
+# Runs a playbook with inventory as specified by environment variables.
+#
+# For more information see the documentation:
+# https://github.com/openshift/openshift-ansible/blob/master/README_CONTAINER_IMAGE.md
+
+# SOURCE and HOME DIRECTORY: /opt/app-root/src
+
+if [[ -z "${PLAYBOOK_FILE}" ]]; then
+  echo
+  echo "PLAYBOOK_FILE must be provided."
+  exec /usr/local/bin/usage
+fi
+
+INVENTORY="$(mktemp)"
+if [[ -v INVENTORY_FILE ]]; then
+  # Make a copy so that ALLOW_ANSIBLE_CONNECTION_LOCAL below
+  # does not attempt to modify the original
+  cp -a ${INVENTORY_FILE} ${INVENTORY}
+elif [[ -v INVENTORY_URL ]]; then
+  curl -o ${INVENTORY} ${INVENTORY_URL}
+elif [[ -v DYNAMIC_SCRIPT_URL ]]; then
+  curl -o ${INVENTORY} ${DYNAMIC_SCRIPT_URL}
+  chmod 755 ${INVENTORY}
+else
+  echo
+  echo "One of INVENTORY_FILE, INVENTORY_URL or DYNAMIC_SCRIPT_URL must be provided."
+  exec /usr/local/bin/usage
+fi
+INVENTORY_ARG="-i ${INVENTORY}"
+
+if [[ "$ALLOW_ANSIBLE_CONNECTION_LOCAL" = false ]]; then
+  sed -i s/ansible_connection=local// ${INVENTORY}
+fi
+
+if [[ -v VAULT_PASS ]]; then
+  VAULT_PASS_FILE=.vaultpass
+  echo ${VAULT_PASS} > ${VAULT_PASS_FILE}
+  VAULT_PASS_ARG="--vault-password-file ${VAULT_PASS_FILE}"
+fi
+
+cd ${WORK_DIR}
+
+exec ansible-playbook ${INVENTORY_ARG} ${VAULT_PASS_ARG} ${OPTS} ${PLAYBOOK_FILE}
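To make the `run` wiring above concrete: with `INVENTORY_FILE=/tmp/inventory`, `PLAYBOOK_FILE=playbooks/byo/openshift_facts.yml`, no vault password, and the image default `OPTS="-v"`, the script copies the inventory to a temporary file and the final `exec` resolves to roughly the following (the mktemp name is illustrative):

    cd /usr/share/ansible/openshift-ansible
    exec ansible-playbook -i /tmp/tmp.k3VtXw -v playbooks/byo/openshift_facts.yml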
diff --git a/images/installer/system-container/root/usr/local/bin/run-system-container.sh b/images/installer/root/usr/local/bin/run-system-container.sh
similarity index 100%
rename from images/installer/system-container/root/usr/local/bin/run-system-container.sh
rename to images/installer/root/usr/local/bin/run-system-container.sh
diff --git a/images/installer/root/usr/local/bin/usage b/images/installer/root/usr/local/bin/usage
new file mode 100755
index 00000000000..3518d7f19d4
--- /dev/null
+++ b/images/installer/root/usr/local/bin/usage
@@ -0,0 +1,33 @@
+#!/bin/bash -e
+cat <<"EOF"
+
+The origin-ansible image provides several options to control the behaviour of the containers.
+For more details on these options see the documentation:
+
+  https://github.com/openshift/openshift-ansible/blob/master/README_CONTAINER_IMAGE.md
+
+At a minimum, when running a container using this image you must provide:
+
+* ssh keys so that Ansible can reach your hosts. These should be mounted as a volume under
+  /opt/app-root/src/.ssh
+* An inventory file. This can be mounted inside the container as a volume and specified with the
+  INVENTORY_FILE environment variable. Alternatively you can serve the inventory file from a web
+  server and use the INVENTORY_URL environment variable to fetch it.
+* The playbook to run. This is set using the PLAYBOOK_FILE environment variable.
+
+Here is an example of how to run a containerized origin-ansible with
+the openshift_facts playbook, which collects and displays facts about your
+OpenShift environment. The inventory and ssh keys are mounted as volumes
+(the latter requires setting the uid in the container and SELinux label
+in the key file via :Z so they can be accessed) and the PLAYBOOK_FILE
+environment variable is set to point to the playbook within the image:
+
+docker run -tu `id -u` \
+       -v $HOME/.ssh/id_rsa:/opt/app-root/src/.ssh/id_rsa:Z,ro \
+       -v /etc/ansible/hosts:/tmp/inventory:Z,ro \
+       -e INVENTORY_FILE=/tmp/inventory \
+       -e OPTS="-v" \
+       -e PLAYBOOK_FILE=playbooks/byo/openshift_facts.yml \
+       openshift/origin-ansible
+
+EOF
diff --git a/images/installer/root/usr/local/bin/usage.ocp b/images/installer/root/usr/local/bin/usage.ocp
new file mode 100755
index 00000000000..50593af6e95
--- /dev/null
+++ b/images/installer/root/usr/local/bin/usage.ocp
@@ -0,0 +1,33 @@
+#!/bin/bash -e
+cat <<"EOF"
+
+The ose-ansible image provides several options to control the behaviour of the containers.
+For more details on these options see the documentation:
+
+  https://github.com/openshift/openshift-ansible/blob/master/README_CONTAINER_IMAGE.md
+
+At a minimum, when running a container using this image you must provide:
+
+* ssh keys so that Ansible can reach your hosts. These should be mounted as a volume under
+  /opt/app-root/src/.ssh
+* An inventory file. This can be mounted inside the container as a volume and specified with the
+  INVENTORY_FILE environment variable. Alternatively you can serve the inventory file from a web
+  server and use the INVENTORY_URL environment variable to fetch it.
+* The playbook to run. This is set using the PLAYBOOK_FILE environment variable.
+
+Here is an example of how to run a containerized ose-ansible with
+the openshift_facts playbook, which collects and displays facts about your
+OpenShift environment. The inventory and ssh keys are mounted as volumes
+(the latter requires setting the uid in the container and SELinux label
+in the key file via :Z so they can be accessed) and the PLAYBOOK_FILE
+environment variable is set to point to the playbook within the image:
+
+docker run -tu `id -u` \
+       -v $HOME/.ssh/id_rsa:/opt/app-root/src/.ssh/id_rsa:Z,ro \
+       -v /etc/ansible/hosts:/tmp/inventory:Z,ro \
+       -e INVENTORY_FILE=/tmp/inventory \
+       -e OPTS="-v" \
+       -e PLAYBOOK_FILE=playbooks/byo/openshift_facts.yml \
+       openshift3/ose-ansible
+
+EOF
diff --git a/images/installer/root/usr/local/bin/user_setup b/images/installer/root/usr/local/bin/user_setup
new file mode 100755
index 00000000000..b76e60a4d97
--- /dev/null
+++ b/images/installer/root/usr/local/bin/user_setup
@@ -0,0 +1,17 @@
+#!/bin/sh
+set -x
+
+# ensure $HOME exists and is accessible by group 0 (we don't know what the runtime UID will be)
+mkdir -p ${HOME}
+chown ${USER_UID}:0 ${HOME}
+chmod ug+rwx ${HOME}
+
+# runtime user will need to be able to self-insert in /etc/passwd
+chmod g+rw /etc/passwd
+
+# ensure that the ansible content is accessible
+chmod -R g+r ${WORK_DIR}
+find ${WORK_DIR} -type d -exec chmod g+x {} +
+
+# no need for this script to remain in the image after running
+rm $0
diff --git a/playbooks/byo/openshift-checks/README.md b/playbooks/byo/openshift-checks/README.md
index 4b2ff1f941d..f0f14b26820 100644
--- a/playbooks/byo/openshift-checks/README.md
+++ b/playbooks/byo/openshift-checks/README.md
@@ -39,7 +39,9 @@ against your inventory file. Here is the step-by-step:
    $ cd openshift-ansible
    ```
 
-2. Run the appropriate playbook:
+2. Install the [dependencies](../../../README.md#setup)
+
+3. Run the appropriate playbook:
 
    ```console
    $ ansible-playbook -i <inventory file> playbooks/byo/openshift-checks/pre-install.yml
@@ -57,9 +59,8 @@ against your inventory file. Here is the step-by-step:
    $ ansible-playbook -i <inventory file> playbooks/byo/openshift-checks/certificate_expiry/default.yaml -v
    ```
 
-## Running via Docker image
+## Running in a container
 
 This repository is built into a Docker image including Ansible so that it can
-be run anywhere Docker is available. Instructions for doing so may be found
-[in the README](../../README_CONTAINER_IMAGE.md).
-
+be run anywhere Docker is available, without the need to manually install dependencies.
+Instructions for doing so may be found [in the README](../../../README_CONTAINER_IMAGE.md).
diff --git a/playbooks/common/openshift-checks/health.yml b/playbooks/common/openshift-checks/health.yml
index c7766ff0462..7e83b4aa6d7 100644
--- a/playbooks/common/openshift-checks/health.yml
+++ b/playbooks/common/openshift-checks/health.yml
@@ -1,16 +1,13 @@
 ---
-# openshift_health_checker depends on openshift_version which now requires group eval.
 - include: ../openshift-cluster/evaluate_groups.yml
-  tags:
-  - always
 
 - name: Run OpenShift health checks
   hosts: OSEv3
   roles:
     - openshift_health_checker
   vars:
-    - r_openshift_health_checker_playbook_context: "health"
+    - r_openshift_health_checker_playbook_context: health
   post_tasks:
-    - action: openshift_health_check  # https://github.com/ansible/ansible/issues/20513
+    - action: openshift_health_check
       args:
         checks: ['@health']
diff --git a/playbooks/common/openshift-checks/pre-install.yml b/playbooks/common/openshift-checks/pre-install.yml
index 7ca9f7e8b40..afd4f95e008 100644
--- a/playbooks/common/openshift-checks/pre-install.yml
+++ b/playbooks/common/openshift-checks/pre-install.yml
@@ -1,16 +1,13 @@
 ---
-# openshift_health_checker depends on openshift_version which now requires group eval.
 - include: ../openshift-cluster/evaluate_groups.yml
-  tags:
-  - always
 
 - hosts: OSEv3
   name: run OpenShift pre-install checks
   roles:
    - openshift_health_checker
  vars:
-    - r_openshift_health_checker_playbook_context: "pre-install"
+    - r_openshift_health_checker_playbook_context: pre-install
  post_tasks:
-    - action: openshift_health_check  # https://github.com/ansible/ansible/issues/20513
+    - action: openshift_health_check
      args:
        checks: ['@preflight']
diff --git a/playbooks/common/openshift-cluster/config.yml b/playbooks/common/openshift-cluster/config.yml
index 7224ae712df..31c4b04afa5 100644
--- a/playbooks/common/openshift-cluster/config.yml
+++ b/playbooks/common/openshift-cluster/config.yml
@@ -6,7 +6,7 @@
   roles:
     - openshift_health_checker
   vars:
-    - r_openshift_health_checker_playbook_context: "install"
+    - r_openshift_health_checker_playbook_context: install
   post_tasks:
     - action: openshift_health_check
       args:
diff --git a/roles/openshift_health_checker/action_plugins/openshift_health_check.py b/roles/openshift_health_checker/action_plugins/openshift_health_check.py
index 0390dc82e95..23da5394076 100644
--- a/roles/openshift_health_checker/action_plugins/openshift_health_check.py
+++ b/roles/openshift_health_checker/action_plugins/openshift_health_check.py
@@ -13,6 +13,7 @@ display = Display()
 
 from ansible.plugins.action import ActionBase
+from ansible.module_utils.six import string_types
 
 # Augment sys.path so that we can import checks from a directory relative to
 # this callback plugin.
@@ -37,9 +38,10 @@ def run(self, tmp=None, task_vars=None):
             return result
 
         try:
-            known_checks = self.load_known_checks()
+            known_checks = self.load_known_checks(tmp, task_vars)
             args = self._task.args
-            resolved_checks = resolve_checks(args.get("checks", []), known_checks.values())
+            requested_checks = normalize(args.get('checks', []))
+            resolved_checks = resolve_checks(requested_checks, known_checks.values())
         except OpenShiftCheckException as e:
             result["failed"] = True
             result["msg"] = str(e)
@@ -47,22 +49,19 @@ def run(self, tmp=None, task_vars=None):
 
         result["checks"] = check_results = {}
 
-        user_disabled_checks = [
-            check.strip()
-            for check in task_vars.get("openshift_disable_check", "").split(",")
-        ]
+        user_disabled_checks = normalize(task_vars.get('openshift_disable_check', []))
 
         for check_name in resolved_checks:
             display.banner("CHECK [{} : {}]".format(check_name, task_vars["ansible_host"]))
             check = known_checks[check_name]
 
-            if not check.is_active(task_vars):
+            if not check.is_active():
                 r = dict(skipped=True, skipped_reason="Not active for this host")
             elif check_name in user_disabled_checks:
                 r = dict(skipped=True, skipped_reason="Disabled by user request")
             else:
                 try:
-                    r = check.run(tmp, task_vars)
+                    r = check.run()
                 except OpenShiftCheckException as e:
                     r = dict(
                         failed=True,
@@ -78,7 +77,7 @@ def run(self, tmp=None, task_vars=None):
         result["changed"] = any(r.get("changed", False) for r in check_results.values())
         return result
 
-    def load_known_checks(self):
+    def load_known_checks(self, tmp, task_vars):
         load_checks()
 
         known_checks = {}
@@ -91,7 +90,7 @@ def load_known_checks(self):
                     check_name, cls.__module__, cls.__name__,
                     other_cls.__module__, other_cls.__name__))
 
-            known_checks[check_name] = cls(execute_module=self._execute_module)
+            known_checks[check_name] = cls(execute_module=self._execute_module, tmp=tmp, task_vars=task_vars)
 
         return known_checks
 
@@ -134,3 +133,14 @@ def resolve_checks(names, all_checks):
             resolved.update(tag_to_checks[tag])
 
     return resolved
+
+
+def normalize(checks):
+    """Return a clean list of check names.
+
+    The input may be a comma-separated string or a sequence. Leading and
+    trailing whitespace characters are removed. Empty items are discarded.
+    """
+    if isinstance(checks, string_types):
+        checks = checks.split(',')
+    return [name.strip() for name in checks if name.strip()]
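Note that `normalize()` above lets `openshift_disable_check` be given either as a comma-separated string or as a list; for example, from the command line (the inventory and playbook paths are illustrative, the check names come from this patch):

    ansible-playbook -i hosts playbooks/byo/config.yml \
        -e openshift_disable_check=disk_availability,docker_storage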
diff --git a/roles/openshift_health_checker/callback_plugins/zz_failure_summary.py b/roles/openshift_health_checker/callback_plugins/zz_failure_summary.py
index 443b76ea1b1..d10200719fa 100644
--- a/roles/openshift_health_checker/callback_plugins/zz_failure_summary.py
+++ b/roles/openshift_health_checker/callback_plugins/zz_failure_summary.py
@@ -1,6 +1,6 @@
-'''
-Ansible callback plugin.
-'''
+"""
+Ansible callback plugin to give a nicely formatted summary of failures.
+"""
 
 # Reason: In several locations below we disable pylint protected-access
 # for Ansible objects that do not give us any public way
@@ -16,11 +16,11 @@
 
 
 class CallbackModule(CallbackBase):
-    '''
+    """
     This callback plugin stores task results and summarizes failures.
     The file name is prefixed with `zz_` to make this plugin be loaded
     last by Ansible, thus making its output the last thing that users see.
-    '''
+    """
 
     CALLBACK_VERSION = 2.0
     CALLBACK_TYPE = 'aggregate'
@@ -48,7 +48,7 @@ def v2_playbook_on_stats(self, stats):
             self._print_failure_details(self.__failures)
 
     def _print_failure_details(self, failures):
-        '''Print a summary of failed tasks or checks.'''
+        """Print a summary of failed tasks or checks."""
         self._display.display(u'\nFailure summary:\n')
 
         width = len(str(len(failures)))
@@ -69,7 +69,9 @@ def _print_failure_details(self, failures):
         playbook_context = None
         # re: result attrs see top comment  # pylint: disable=protected-access
         for failure in failures:
-            # get context from check task result since callback plugins cannot access task vars
+            # Get context from check task result since callback plugins cannot access task vars.
+            # NOTE: the context is thus not known unless checks run. Failures prior to checks running
+            # don't have playbook_context in the results. But we only use it now when checks fail.
             playbook_context = playbook_context or failure['result']._result.get('playbook_context')
             failed_checks.update(
                 name
@@ -81,8 +83,11 @@ def _print_failure_details(self, failures):
 
     def _print_check_failure_summary(self, failed_checks, context):
         checks = ','.join(sorted(failed_checks))
-        # NOTE: context is not set if all failures occurred prior to checks task
-        summary = (
+        # The purpose of specifying context is to vary the output depending on what the user was
+        # expecting to happen (based on which playbook they ran). The only use currently is to
+        # vary the message depending on whether the user was deliberately running checks or was
+        # trying to install/upgrade and checks are just included. Other use cases may arise.
+        summary = (  # default to explaining what checks are in the first place
             '\n'
             'The execution of "{playbook}"\n'
             'includes checks designed to fail early if the requirements\n'
@@ -94,27 +99,26 @@ def _print_check_failure_summary(self, failed_checks, context):
             'Some checks may be configurable by variables if your requirements\n'
             'are different from the defaults; consult check documentation.\n'
             'Variables can be set in the inventory or passed on the\n'
-            'command line using the -e flag to ansible-playbook.\n'
+            'command line using the -e flag to ansible-playbook.\n\n'
         ).format(playbook=self._playbook_file, checks=checks)
         if context in ['pre-install', 'health']:
-            summary = (
+            summary = (  # user was expecting to run checks, less explanation needed
                 '\n'
                 'You may choose to configure or disable failing checks by\n'
                 'setting Ansible variables. To disable those above:\n\n'
                 '    openshift_disable_check={checks}\n\n'
                 'Consult check documentation for configurable variables.\n'
                 'Variables can be set in the inventory or passed on the\n'
-                'command line using the -e flag to ansible-playbook.\n'
+                'command line using the -e flag to ansible-playbook.\n\n'
             ).format(checks=checks)
-        # other expected contexts: install, upgrade
         self._display.display(summary)
 
 
 # re: result attrs see top comment  # pylint: disable=protected-access
 def _format_failure(failure):
-    '''Return a list of pretty-formatted text entries describing a failure, including
+    """Return a list of pretty-formatted text entries describing a failure, including
     relevant information about it. Expect that the list of text entries will be joined
-    by a newline separator when output to the user.'''
+    by a newline separator when output to the user."""
     result = failure['result']
     host = result._host.get_name()
     play = _get_play(result._task)
@@ -135,7 +139,7 @@ def _format_failure(failure):
 
 
 def _format_failed_checks(checks):
-    '''Return pretty-formatted text describing checks that failed.'''
+    """Return pretty-formatted text describing checks that failed."""
     failed_check_msgs = []
     for check, body in checks.items():
         if body.get('failed', False):  # only show the failed checks
@@ -150,7 +154,7 @@ def _format_failed_checks(checks):
 # This is inspired by ansible.playbook.base.Base.dump_me.
 # re: play/task/block attrs see top comment  # pylint: disable=protected-access
 def _get_play(obj):
-    '''Given a task or block, recursively tries to find its parent play.'''
+    """Given a task or block, recursively try to find its parent play."""
     if hasattr(obj, '_play'):
         return obj._play
     if getattr(obj, '_parent'):
diff --git a/roles/openshift_health_checker/library/aos_version.py b/roles/openshift_health_checker/library/aos_version.py
old mode 100755
new mode 100644
index 4f43ee751bc..f9babebb996
--- a/roles/openshift_health_checker/library/aos_version.py
+++ b/roles/openshift_health_checker/library/aos_version.py
@@ -1,5 +1,5 @@
 #!/usr/bin/python
-'''
+"""
 Ansible module for yum-based systems determining if multiple releases
 of an OpenShift package are available, and if the release requested
 (if any) is available down to the given precision.
@@ -16,7 +16,7 @@
 like the user to have a helpful error message if we detect things will
 not work out right. Note that if openshift_release is not specified in
 the inventory, the version comparison checks just pass.
-'''
+"""
 
 from ansible.module_utils.basic import AnsibleModule
 # NOTE: because of the dependency on yum (Python 2-only), this module does not
@@ -32,7 +32,7 @@
 
 
 class AosVersionException(Exception):
-    '''Base exception class for package version problems'''
+    """Base exception class for package version problems"""
     def __init__(self, message, problem_pkgs=None):
         Exception.__init__(self, message)
         self.problem_pkgs = problem_pkgs
diff --git a/roles/openshift_health_checker/library/check_yum_update.py b/roles/openshift_health_checker/library/check_yum_update.py
old mode 100755
new mode 100644
diff --git a/roles/openshift_health_checker/library/docker_info.py b/roles/openshift_health_checker/library/docker_info.py
index 7f712bcff1d..0d0ddae8b4b 100644
--- a/roles/openshift_health_checker/library/docker_info.py
+++ b/roles/openshift_health_checker/library/docker_info.py
@@ -1,4 +1,3 @@
-# pylint: disable=missing-docstring
 """
 Ansible module for determining information about the docker host.
 
@@ -13,6 +12,7 @@
 
 
 def main():
+    """Entrypoint for running an Ansible module."""
     client = AnsibleDockerClient()
 
     client.module.exit_json(
diff --git a/roles/openshift_health_checker/library/search_journalctl.py b/roles/openshift_health_checker/library/search_journalctl.py
new file mode 100644
index 00000000000..3631f71c8fc
--- /dev/null
+++ b/roles/openshift_health_checker/library/search_journalctl.py
@@ -0,0 +1,150 @@
+#!/usr/bin/python
+"""Interface to journalctl."""
+
+from time import time
+import json
+import re
+import subprocess
+
+from ansible.module_utils.basic import AnsibleModule
+
+
+class InvalidMatcherRegexp(Exception):
+    """Exception class for invalid matcher regexp."""
+    pass
+
+
+class InvalidLogEntry(Exception):
+    """Exception class for invalid / non-json log entries."""
+    pass
+
+
+class LogInputSubprocessError(Exception):
+    """Exception class for errors that occur while executing a subprocess."""
+    pass
+
+
+def main():
+    """Scan a given list of "log_matchers" for journalctl messages containing given patterns.
+    "log_matchers" is a list of dicts consisting of three keys that help fine-tune log searching:
+    'start_regexp', 'regexp', and 'unit'.
+
+    Sample "log_matchers" list:
+
+    [
+      {
+        'start_regexp': r'Beginning of systemd unit',
+        'regexp': r'the specific log message to find',
+        'unit': 'etcd',
+      }
+    ]
+    """
+    module = AnsibleModule(
+        argument_spec=dict(
+            log_count_limit=dict(type="int", default=500),
+            log_matchers=dict(type="list", required=True),
+        ),
+    )
+
+    timestamp_limit_seconds = time() - 60 * 60  # 1 hour
+
+    log_count_limit = module.params["log_count_limit"]
+    log_matchers = module.params["log_matchers"]
+
+    matched_regexp, errors = get_log_matches(log_matchers, log_count_limit, timestamp_limit_seconds)
+
+    module.exit_json(
+        changed=False,
+        failed=bool(errors),
+        errors=errors,
+        matched=matched_regexp,
+    )
+
+
+def get_log_matches(matchers, log_count_limit, timestamp_limit_seconds):
+    """Return a list of up to log_count_limit matches for each matcher.
+
+    Log entries are only considered if newer than timestamp_limit_seconds.
+    """
+    matched_regexp = []
+    errors = []
+
+    for matcher in matchers:
+        try:
+            log_output = get_log_output(matcher)
+        except LogInputSubprocessError as err:
+            errors.append(str(err))
+            continue
+
+        try:
+            matched = find_matches(log_output, matcher, log_count_limit, timestamp_limit_seconds)
+            if matched:
+                matched_regexp.append(matcher.get("regexp", ""))
+        except InvalidMatcherRegexp as err:
+            errors.append(str(err))
+        except InvalidLogEntry as err:
+            errors.append(str(err))
+
+    return matched_regexp, errors
+
+
+def get_log_output(matcher):
+    """Return an iterator on the logs of a given matcher."""
+    try:
+        cmd_output = subprocess.Popen(list([
+            '/bin/journalctl',
+            '-ru', matcher.get("unit", ""),
+            '--output', 'json',
+        ]), stdout=subprocess.PIPE)
+
+        return iter(cmd_output.stdout.readline, '')
+
+    except subprocess.CalledProcessError as exc:
+        msg = "Could not obtain journalctl logs for the specified systemd unit: {}: {}"
+        raise LogInputSubprocessError(msg.format(matcher.get("unit", ""), str(exc)))
+    except OSError as exc:
+        raise LogInputSubprocessError(str(exc))
+
+
+def find_matches(log_output, matcher, log_count_limit, timestamp_limit_seconds):
+    """Return log messages matched in iterable log_output by a given matcher.
+
+    Ignore any log_output items older than timestamp_limit_seconds.
+    """
+    try:
+        regexp = re.compile(matcher.get("regexp", ""))
+        start_regexp = re.compile(matcher.get("start_regexp", ""))
+    except re.error as err:
+        msg = "A log matcher object was provided with an invalid regular expression: {}"
+        raise InvalidMatcherRegexp(msg.format(str(err)))
+
+    matched = None
+
+    for log_count, line in enumerate(log_output):
+        if log_count >= log_count_limit:
+            break
+
+        try:
+            obj = json.loads(line)
+
+            # don't need to look past the most recent service restart
+            if start_regexp.match(obj["MESSAGE"]):
+                break
+
+            log_timestamp_seconds = float(obj["__REALTIME_TIMESTAMP"]) / 1000000
+            if log_timestamp_seconds < timestamp_limit_seconds:
+                break
+
+            if regexp.match(obj["MESSAGE"]):
+                matched = line
+                break
+
+        except ValueError:
+            msg = "Log entry for systemd unit {} contained invalid json syntax: {}"
+            raise InvalidLogEntry(msg.format(matcher.get("unit"), line))
+
+    return matched
+
+
+if __name__ == '__main__':
+    main()
diff --git a/roles/openshift_health_checker/openshift_checks/__init__.py b/roles/openshift_health_checker/openshift_checks/__init__.py
index 5c9949ced05..85cbc62245d 100644
--- a/roles/openshift_health_checker/openshift_checks/__init__.py
+++ b/roles/openshift_health_checker/openshift_checks/__init__.py
@@ -19,15 +19,21 @@ class OpenShiftCheckException(Exception):
 
 @six.add_metaclass(ABCMeta)
 class OpenShiftCheck(object):
-    """A base class for defining checks for an OpenShift cluster environment."""
+    """
+    A base class for defining checks for an OpenShift cluster environment.
+
+    Expect optional params: method execute_module, dict task_vars, and string tmp.
+    execute_module is expected to have a signature compatible with _execute_module
+    from ansible plugins/action/__init__.py, e.g.:
+    def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None, *args):
+    This is stored so that it can be invoked in subclasses via check.execute_module("name", args)
+    which provides the check's stored task_vars and tmp.
+    """
 
-    def __init__(self, execute_module=None, module_executor=None):
-        if execute_module is module_executor is None:
-            raise TypeError(
-                "__init__() takes either execute_module (recommended) "
-                "or module_executor (deprecated), none given")
-        self.execute_module = execute_module or module_executor
-        self.module_executor = self.execute_module
+    def __init__(self, execute_module=None, task_vars=None, tmp=None):
+        self._execute_module = execute_module
+        self.task_vars = task_vars or {}
+        self.tmp = tmp
 
     @abstractproperty
     def name(self):
@@ -43,13 +49,13 @@ def tags(self):
         """
         return []
 
-    @classmethod
-    def is_active(cls, task_vars):  # pylint: disable=unused-argument
+    @staticmethod
+    def is_active():
         """Returns true if this check applies to the ansible-playbook run."""
         return True
 
     @abstractmethod
-    def run(self, tmp, task_vars):
+    def run(self):
         """Executes a check, normally implemented as a module."""
         return {}
 
@@ -62,6 +68,66 @@ def subclasses(cls):
             for subclass in subclass.subclasses():
                 yield subclass
 
+    def execute_module(self, module_name=None, module_args=None):
+        """Invoke an Ansible module from a check.
+
+        Invoke stored _execute_module, normally copied from the action
+        plugin, with its params and the task_vars and tmp given at
+        check initialization. No positional parameters beyond these
+        are specified. If it's necessary to specify any of the other
+        parameters to _execute_module then that should just be invoked
+        directly (with awareness of changes in method signature per
+        Ansible version).
+
+        So e.g. check.execute_module("foo", dict(arg1=...))
+        Return: result hash from module execution.
+        """
+        if self._execute_module is None:
+            raise NotImplementedError(
+                self.__class__.__name__ +
+                " invoked execute_module without providing the method at initialization."
+            )
+        return self._execute_module(module_name, module_args, self.tmp, self.task_vars)
+
+    def get_var(self, *keys, **kwargs):
+        """Get deeply nested values from task_vars.
+
+        Ansible task_vars structures are Python dicts, often mapping strings to
+        other dicts. This helper makes it easier to get a nested value, raising
+        OpenShiftCheckException when a key is not found or returning a default value
+        provided as a keyword argument.
+        """
+        try:
+            value = reduce(operator.getitem, keys, self.task_vars)
+        except (KeyError, TypeError):
+            if "default" in kwargs:
+                return kwargs["default"]
+            raise OpenShiftCheckException("'{}' is undefined".format(".".join(map(str, keys))))
+        return value
+
+    @staticmethod
+    def get_major_minor_version(openshift_image_tag):
+        """Parse and return the deployed version of OpenShift as a tuple."""
+        if openshift_image_tag and openshift_image_tag[0] == 'v':
+            openshift_image_tag = openshift_image_tag[1:]
+
+        # map major release versions across releases
+        # to a common major version
+        openshift_major_release_version = {
+            "1": "3",
+        }
+
+        components = openshift_image_tag.split(".")
+        if not components or len(components) < 2:
+            msg = "An invalid version of OpenShift was found for this host: {}"
+            raise OpenShiftCheckException(msg.format(openshift_image_tag))
+
+        if components[0] in openshift_major_release_version:
+            components[0] = openshift_major_release_version[components[0]]
+
+        components = tuple(int(x) for x in components[:2])
+        return components
+
 
 LOADER_EXCLUDES = (
     "__init__.py",
@@ -86,20 +152,3 @@ def load_checks(path=None, subpkg=""):
             modules.append(import_module(__package__ + subpkg + "." + name[:-3]))
 
     return modules
-
-
-def get_var(task_vars, *keys, **kwargs):
-    """Helper function to get deeply nested values from task_vars.
-
-    Ansible task_vars structures are Python dicts, often mapping strings to
-    other dicts. This helper makes it easier to get a nested value, raising
-    OpenShiftCheckException when a key is not found or returning a default value
-    provided as a keyword argument.
-    """
-    try:
-        value = reduce(operator.getitem, keys, task_vars)
-    except (KeyError, TypeError):
-        if "default" in kwargs:
-            return kwargs["default"]
-        raise OpenShiftCheckException("'{}' is undefined".format(".".join(map(str, keys))))
-    return value
diff --git a/roles/openshift_health_checker/openshift_checks/disk_availability.py b/roles/openshift_health_checker/openshift_checks/disk_availability.py
index e93e81efab8..2834612948f 100644
--- a/roles/openshift_health_checker/openshift_checks/disk_availability.py
+++ b/roles/openshift_health_checker/openshift_checks/disk_availability.py
@@ -3,7 +3,7 @@
 import os.path
 import tempfile
 
-from openshift_checks import OpenShiftCheck, OpenShiftCheckException, get_var
+from openshift_checks import OpenShiftCheck, OpenShiftCheckException
 
 
 class DiskAvailability(OpenShiftCheck):
@@ -35,22 +35,21 @@ class DiskAvailability(OpenShiftCheck):
         },
     }
 
-    @classmethod
-    def is_active(cls, task_vars):
+    def is_active(self):
         """Skip hosts that do not have recommended disk space requirements."""
-        group_names = get_var(task_vars, "group_names", default=[])
+        group_names = self.get_var("group_names", default=[])
         active_groups = set()
-        for recommendation in cls.recommended_disk_space_bytes.values():
+        for recommendation in self.recommended_disk_space_bytes.values():
             active_groups.update(recommendation.keys())
         has_disk_space_recommendation = bool(active_groups.intersection(group_names))
-        return super(DiskAvailability, cls).is_active(task_vars) and has_disk_space_recommendation
+        return super(DiskAvailability, self).is_active() and has_disk_space_recommendation
 
-    def run(self, tmp, task_vars):
-        group_names = get_var(task_vars, "group_names")
-        ansible_mounts = get_var(task_vars, "ansible_mounts")
+    def run(self):
+        group_names = self.get_var("group_names")
+        ansible_mounts = self.get_var("ansible_mounts")
         ansible_mounts = {mount['mount']: mount for mount in ansible_mounts}
 
-        user_config = get_var(task_vars, "openshift_check_min_host_disk_gb", default={})
+        user_config = self.get_var("openshift_check_min_host_disk_gb", default={})
         try:
             # For backwards-compatibility, if openshift_check_min_host_disk_gb
             # is a number, then it overrides the required config for '/var'.
diff --git a/roles/openshift_health_checker/openshift_checks/docker_image_availability.py b/roles/openshift_health_checker/openshift_checks/docker_image_availability.py
index bde81ad2cce..77180223e29 100644
--- a/roles/openshift_health_checker/openshift_checks/docker_image_availability.py
+++ b/roles/openshift_health_checker/openshift_checks/docker_image_availability.py
@@ -1,6 +1,6 @@
 """Check that required Docker images are available."""
 
-from openshift_checks import OpenShiftCheck, get_var
+from openshift_checks import OpenShiftCheck
 from openshift_checks.mixins import DockerHostMixin
 
 
@@ -22,25 +22,26 @@ class DockerImageAvailability(DockerHostMixin, OpenShiftCheck):
     """Check that required Docker images are available.
 
-    This check attempts to ensure that required docker images are
-    either present locally, or able to be pulled down from available
-    registries defined in a host machine.
+    Determine docker images that an install would require and check that they
+    are either present in the host's docker index, or available for the host to pull
+    with known registries as defined in our inventory file (or defaults).
     """
 
     name = "docker_image_availability"
     tags = ["preflight"]
-    dependencies = ["skopeo", "python-docker-py"]
+    # we use python-docker-py to check local docker for images, and skopeo
+    # to look for images available remotely without waiting to pull them.
+    dependencies = ["python-docker-py", "skopeo"]
 
-    @classmethod
-    def is_active(cls, task_vars):
+    def is_active(self):
         """Skip hosts with unsupported deployment types."""
-        deployment_type = get_var(task_vars, "openshift_deployment_type")
+        deployment_type = self.get_var("openshift_deployment_type")
         has_valid_deployment_type = deployment_type in DEPLOYMENT_IMAGE_INFO
 
-        return super(DockerImageAvailability, cls).is_active(task_vars) and has_valid_deployment_type
+        return super(DockerImageAvailability, self).is_active() and has_valid_deployment_type
 
-    def run(self, tmp, task_vars):
-        msg, failed, changed = self.ensure_dependencies(task_vars)
+    def run(self):
+        msg, failed, changed = self.ensure_dependencies()
         if failed:
             return {
                 "failed": True,
@@ -48,18 +49,18 @@ def run(self, tmp, task_vars):
                 "msg": "Some dependencies are required in order to check Docker image availability.\n" + msg
             }
 
-        required_images = self.required_images(task_vars)
-        missing_images = set(required_images) - set(self.local_images(required_images, task_vars))
+        required_images = self.required_images()
+        missing_images = set(required_images) - set(self.local_images(required_images))
 
         # exit early if all images were found locally
         if not missing_images:
             return {"changed": changed}
 
-        registries = self.known_docker_registries(task_vars)
+        registries = self.known_docker_registries()
         if not registries:
             return {"failed": True, "msg": "Unable to retrieve any docker registries.", "changed": changed}
 
-        available_images = self.available_images(missing_images, registries, task_vars)
+        available_images = self.available_images(missing_images, registries)
         unavailable_images = set(missing_images) - set(available_images)
 
         if unavailable_images:
@@ -74,8 +75,7 @@ def run(self, tmp, task_vars):
 
         return {"changed": changed}
 
-    @staticmethod
-    def required_images(task_vars):
+    def required_images(self):
         """
         Determine which images we expect to need for this host.
         Returns: a set of required images like 'openshift/origin:v3.6'
@@ -92,17 +92,17 @@ def required_images(task_vars):
         Registry is not included in constructed images. It may be in oreg_url or etcd image.
         """
         required = set()
-        deployment_type = get_var(task_vars, "openshift_deployment_type")
-        host_groups = get_var(task_vars, "group_names")
+        deployment_type = self.get_var("openshift_deployment_type")
+        host_groups = self.get_var("group_names")
         # containerized etcd may not have openshift_image_tag, see bz 1466622
-        image_tag = get_var(task_vars, "openshift_image_tag", default="latest")
+        image_tag = self.get_var("openshift_image_tag", default="latest")
         image_info = DEPLOYMENT_IMAGE_INFO[deployment_type]
         if not image_info:
             return required
 
         # template for images that run on top of OpenShift
         image_url = "{}/{}-{}:{}".format(image_info["namespace"], image_info["name"], "${component}", "${version}")
-        image_url = get_var(task_vars, "oreg_url", default="") or image_url
+        image_url = self.get_var("oreg_url", default="") or image_url
         if 'nodes' in host_groups:
             for suffix in NODE_IMAGE_SUFFIXES:
                 required.add(image_url.replace("${component}", suffix).replace("${version}", image_tag))
@@ -112,7 +112,7 @@ def required_images(task_vars):
                 required.add(image_info["registry_console_image"])
 
         # images for containerized components
-        if get_var(task_vars, "openshift", "common", "is_containerized"):
+        if self.get_var("openshift", "common", "is_containerized"):
             components = set()
             if 'nodes' in host_groups:
                 components.update(["node", "openvswitch"])
@@ -125,28 +125,27 @@ def required_images(task_vars):
 
         return required
 
-    def local_images(self, images, task_vars):
+    def local_images(self, images):
         """Filter a list of images and return those available locally."""
         return [
             image for image in images
-            if self.is_image_local(image, task_vars)
+            if self.is_image_local(image)
         ]
 
-    def is_image_local(self, image, task_vars):
+    def is_image_local(self, image):
         """Check if image is already in local docker index."""
-        result = self.execute_module("docker_image_facts", {"name": image}, task_vars=task_vars)
+        result = self.execute_module("docker_image_facts", {"name": image})
         if result.get("failed", False):
             return False
 
         return bool(result.get("images", []))
 
-    @staticmethod
-    def known_docker_registries(task_vars):
+    def known_docker_registries(self):
         """Build a list of docker registries available according to inventory vars."""
-        docker_facts = get_var(task_vars, "openshift", "docker")
+        docker_facts = self.get_var("openshift", "docker")
         regs = set(docker_facts["additional_registries"])
 
-        deployment_type = get_var(task_vars, "openshift_deployment_type")
+        deployment_type = self.get_var("openshift_deployment_type")
         if deployment_type == "origin":
             regs.update(["docker.io"])
         elif "enterprise" in deployment_type:
@@ -154,24 +153,25 @@ def known_docker_registries(task_vars):
 
         return list(regs)
 
-    def available_images(self, images, registries, task_vars):
-        """Inspect existing images using Skopeo and return all images successfully inspected."""
+    def available_images(self, images, default_registries):
+        """Search remotely for images. Returns: list of images found."""
         return [
             image for image in images
-            if self.is_available_skopeo_image(image, registries, task_vars)
+            if self.is_available_skopeo_image(image, default_registries)
         ]
 
-    def is_available_skopeo_image(self, image, registries, task_vars):
+    def is_available_skopeo_image(self, image, default_registries):
         """Use Skopeo to determine if required image exists in known registry(s)."""
+        registries = default_registries
 
-        # if image does already includes a registry, just use that
+        # if image already includes a registry, only use that
         if image.count("/") > 1:
             registry, image = image.split("/", 1)
             registries = [registry]
 
         for registry in registries:
             args = {"_raw_params": "skopeo inspect --tls-verify=false docker://{}/{}".format(registry, image)}
-            result = self.execute_module("command", args, task_vars=task_vars)
+            result = self.execute_module("command", args)
             if result.get("rc", 0) == 0 and not result.get("failed"):
                 return True
diff --git a/roles/openshift_health_checker/openshift_checks/docker_storage.py b/roles/openshift_health_checker/openshift_checks/docker_storage.py
index d2227d244d2..dea15a56e0a 100644
--- a/roles/openshift_health_checker/openshift_checks/docker_storage.py
+++ b/roles/openshift_health_checker/openshift_checks/docker_storage.py
@@ -2,7 +2,7 @@
 import json
 import os.path
 import re
-from openshift_checks import OpenShiftCheck, OpenShiftCheckException, get_var
+from openshift_checks import OpenShiftCheck, OpenShiftCheckException
 from openshift_checks.mixins import DockerHostMixin
 
 
@@ -42,8 +42,8 @@ class DockerStorage(DockerHostMixin, OpenShiftCheck):
         ),
     ]
 
-    def run(self, tmp, task_vars):
-        msg, failed, changed = self.ensure_dependencies(task_vars)
+    def run(self):
+        msg, failed, changed = self.ensure_dependencies()
         if failed:
             return {
                 "failed": True,
@@ -52,7 +52,7 @@ def run(self):
             }
 
         # attempt to get the docker info hash from the API
-        docker_info = self.execute_module("docker_info", {}, task_vars=task_vars)
+        docker_info = self.execute_module("docker_info", {})
         if docker_info.get("failed"):
             return {"failed": True, "changed": changed,
                     "msg": "Failed to query Docker API. Is docker running on this host?"}
@@ -76,15 +76,15 @@ def run(self):
 
         result = {}
         if driver == "devicemapper":
-            result = self.check_devicemapper_support(driver_status, task_vars)
+            result = self.check_devicemapper_support(driver_status)
         if driver in ['overlay', 'overlay2']:
-            result = self.check_overlay_support(docker_info, driver_status, task_vars)
+            result = self.check_overlay_support(docker_info, driver_status)
 
         result['changed'] = result.get('changed', False) or changed
         return result
 
-    def check_devicemapper_support(self, driver_status, task_vars):
+    def check_devicemapper_support(self, driver_status):
         """Check if dm storage driver is supported as configured. Return: result dict."""
         if driver_status.get("Data loop file"):
             msg = (
@@ -94,10 +94,10 @@ def check_devicemapper_support(self, driver_status):
                 "See http://red.ht/2rNperO for further information."
             )
             return {"failed": True, "msg": msg}
-        result = self.check_dm_usage(driver_status, task_vars)
+        result = self.check_dm_usage(driver_status)
         return result
 
-    def check_dm_usage(self, driver_status, task_vars):
+    def check_dm_usage(self, driver_status):
         """Check usage thresholds for Docker dm storage driver. Return: result dict.
         Backing assumptions: We expect devicemapper to be backed by an auto-expanding thin pool
         implemented as an LV in an LVM2 VG. This is how docker-storage-setup currently configures
@@ -109,7 +109,7 @@ def check_dm_usage(self, driver_status, task_vars):
         could run out of space first; so we check both.
         """
         vals = dict(
-            vg_free=self.get_vg_free(driver_status.get("Pool Name"), task_vars),
+            vg_free=self.get_vg_free(driver_status.get("Pool Name")),
             data_used=driver_status.get("Data Space Used"),
             data_total=driver_status.get("Data Space Total"),
             metadata_used=driver_status.get("Metadata Space Used"),
@@ -130,7 +130,7 @@ def check_dm_usage(self, driver_status):
         # determine the threshold percentages which usage should not exceed
         for name, default in [("data", self.max_thinpool_data_usage_percent),
                               ("metadata", self.max_thinpool_meta_usage_percent)]:
-            percent = get_var(task_vars, "max_thinpool_" + name + "_usage_percent", default=default)
+            percent = self.get_var("max_thinpool_" + name + "_usage_percent", default=default)
             try:
                 vals[name + "_threshold"] = float(percent)
             except ValueError:
@@ -157,7 +157,7 @@ def check_dm_usage(self, driver_status):
         vals["msg"] = "\n".join(messages or ["Thinpool usage is within thresholds."])
         return vals
 
-    def get_vg_free(self, pool, task_vars):
+    def get_vg_free(self, pool):
         """Determine which VG to examine according to the pool name. Return: size vgs reports.
         Pool name is the only indicator currently available from the Docker API driver info.
         We assume a name that looks like "vg--name-docker--pool";
@@ -174,7 +174,7 @@ def get_vg_free(self, pool):
         vgs_cmd = "/sbin/vgs --noheadings -o vg_free --units g --select vg_name=" + vg_name
         # should return free space like " 12.00g" if the VG exists; empty if it does not
 
-        ret = self.execute_module("command", {"_raw_params": vgs_cmd}, task_vars=task_vars)
+        ret = self.execute_module("command", {"_raw_params": vgs_cmd})
         if ret.get("failed") or ret.get("rc", 0) != 0:
             raise OpenShiftCheckException(
                 "Is LVM installed? Failed to run /sbin/vgs "
@@ -213,7 +213,7 @@ def convert_to_bytes(string):
 
         return float(number) * multiplier
 
-    def check_overlay_support(self, docker_info, driver_status, task_vars):
+    def check_overlay_support(self, docker_info, driver_status):
         """Check if overlay storage driver is supported for this host. Return: result dict."""
         # check for xfs as backing store
         backing_fs = driver_status.get("Backing Filesystem", "[NONE]")
@@ -239,13 +239,13 @@ def check_overlay_support(self, docker_info, driver_status):
 
         # NOTE: we could check for --selinux-enabled here but docker won't even start with
         # that option until it's supported in the kernel so we don't need to.
 
-        return self.check_overlay_usage(docker_info, task_vars)
+        return self.check_overlay_usage(docker_info)
 
-    def check_overlay_usage(self, docker_info, task_vars):
+    def check_overlay_usage(self, docker_info):
         """Check disk usage on OverlayFS backing store volume. Return: result dict."""
         path = docker_info.get("DockerRootDir", "/var/lib/docker") + "/" + docker_info["Driver"]
 
-        threshold = get_var(task_vars, "max_overlay_usage_percent", default=self.max_overlay_usage_percent)
+        threshold = self.get_var("max_overlay_usage_percent", default=self.max_overlay_usage_percent)
         try:
             threshold = float(threshold)
         except ValueError:
@@ -254,7 +254,7 @@ def check_overlay_usage(self, docker_info):
                 "msg": "Specified 'max_overlay_usage_percent' is not a percentage: {}".format(threshold),
             }
 
-        mount = self.find_ansible_mount(path, get_var(task_vars, "ansible_mounts"))
+        mount = self.find_ansible_mount(path, self.get_var("ansible_mounts"))
         try:
             free_bytes = mount['size_available']
             total_bytes = mount['size_total']
diff --git a/roles/openshift_health_checker/openshift_checks/etcd_imagedata_size.py b/roles/openshift_health_checker/openshift_checks/etcd_imagedata_size.py
index c04a69765e0..28c38504dfd 100644
--- a/roles/openshift_health_checker/openshift_checks/etcd_imagedata_size.py
+++ b/roles/openshift_health_checker/openshift_checks/etcd_imagedata_size.py
@@ -2,7 +2,7 @@
 Ansible module for determining if the size of OpenShift image data exceeds
 a specified limit in an etcd cluster.
 """
-from openshift_checks import OpenShiftCheck, OpenShiftCheckException, get_var
+from openshift_checks import OpenShiftCheck, OpenShiftCheckException
 
 
 class EtcdImageDataSize(OpenShiftCheck):
@@ -11,24 +11,25 @@ class EtcdImageDataSize(OpenShiftCheck):
     name = "etcd_imagedata_size"
     tags = ["etcd"]
 
-    def run(self, tmp, task_vars):
-        etcd_mountpath = self._get_etcd_mountpath(get_var(task_vars, "ansible_mounts"))
+    def run(self):
+        etcd_mountpath = self._get_etcd_mountpath(self.get_var("ansible_mounts"))
         etcd_avail_diskspace = etcd_mountpath["size_available"]
         etcd_total_diskspace = etcd_mountpath["size_total"]
 
-        etcd_imagedata_size_limit = get_var(task_vars,
-                                            "etcd_max_image_data_size_bytes",
-                                            default=int(0.5 * float(etcd_total_diskspace - etcd_avail_diskspace)))
+        etcd_imagedata_size_limit = self.get_var(
+            "etcd_max_image_data_size_bytes",
+            default=int(0.5 * float(etcd_total_diskspace - etcd_avail_diskspace))
+        )
 
-        etcd_is_ssl = get_var(task_vars, "openshift", "master", "etcd_use_ssl", default=False)
-        etcd_port = get_var(task_vars, "openshift", "master", "etcd_port", default=2379)
-        etcd_hosts = get_var(task_vars, "openshift", "master", "etcd_hosts")
+        etcd_is_ssl = self.get_var("openshift", "master", "etcd_use_ssl", default=False)
+        etcd_port = self.get_var("openshift", "master", "etcd_port", default=2379)
+        etcd_hosts = self.get_var("openshift", "master", "etcd_hosts")
 
-        config_base = get_var(task_vars, "openshift", "common", "config_base")
+        config_base = self.get_var("openshift", "common", "config_base")
 
-        cert = task_vars.get("etcd_client_cert", config_base + "/master/master.etcd-client.crt")
-        key = task_vars.get("etcd_client_key", config_base + "/master/master.etcd-client.key")
-        ca_cert = task_vars.get("etcd_client_ca_cert", config_base + "/master/master.etcd-ca.crt")
+        cert = self.get_var("etcd_client_cert", default=config_base + "/master/master.etcd-client.crt")
+        key = self.get_var("etcd_client_key", default=config_base + "/master/master.etcd-client.key")
+        ca_cert = self.get_var("etcd_client_ca_cert", default=config_base + "/master/master.etcd-ca.crt")
 
         for etcd_host in list(etcd_hosts):
             args = {
@@ -46,7 +47,7 @@ def run(self, tmp, task_vars):
                 },
             }
 
-            etcdkeysize = self.module_executor("etcdkeysize", args, task_vars)
+            etcdkeysize = 
self.execute_module("etcdkeysize", args) if etcdkeysize.get("rc", 0) != 0 or etcdkeysize.get("failed"): msg = 'Failed to retrieve stats for etcd host "{host}": {reason}' diff --git a/roles/openshift_health_checker/openshift_checks/etcd_traffic.py b/roles/openshift_health_checker/openshift_checks/etcd_traffic.py new file mode 100644 index 00000000000..cc1b14d8ad5 --- /dev/null +++ b/roles/openshift_health_checker/openshift_checks/etcd_traffic.py @@ -0,0 +1,44 @@ +"""Check that scans journalctl for messages caused as a symptom of increased etcd traffic.""" + +from openshift_checks import OpenShiftCheck + + +class EtcdTraffic(OpenShiftCheck): + """Check if host is being affected by an increase in etcd traffic.""" + + name = "etcd_traffic" + tags = ["health", "etcd"] + + def is_active(self): + """Skip hosts that do not have etcd in their group names.""" + group_names = self.get_var("group_names", default=[]) + valid_group_names = "etcd" in group_names + + version = self.get_var("openshift", "common", "short_version") + valid_version = version in ("3.4", "3.5", "1.4", "1.5") + + return super(EtcdTraffic, self).is_active() and valid_group_names and valid_version + + def run(self): + is_containerized = self.get_var("openshift", "common", "is_containerized") + unit = "etcd_container" if is_containerized else "etcd" + + log_matchers = [{ + "start_regexp": r"Starting Etcd Server", + "regexp": r"etcd: sync duration of [^,]+, expected less than 1s", + "unit": unit + }] + + match = self.execute_module("search_journalctl", {"log_matchers": log_matchers}) + + if match.get("matched"): + msg = ("Higher than normal etcd traffic detected.\n" + "OpenShift 3.4 introduced an increase in etcd traffic.\n" + "Upgrading to OpenShift 3.6 is recommended in order to fix this issue.\n" + "Please refer to https://access.redhat.com/solutions/2916381 for more information.") + return {"failed": True, "msg": msg} + + if match.get("failed"): + return {"failed": True, "msg": "\n".join(match.get("errors"))} + + return {} diff --git a/roles/openshift_health_checker/openshift_checks/etcd_volume.py b/roles/openshift_health_checker/openshift_checks/etcd_volume.py index 7452c9cc125..da7d0364aaa 100644 --- a/roles/openshift_health_checker/openshift_checks/etcd_volume.py +++ b/roles/openshift_health_checker/openshift_checks/etcd_volume.py @@ -1,6 +1,6 @@ """A health check for OpenShift clusters.""" -from openshift_checks import OpenShiftCheck, OpenShiftCheckException, get_var +from openshift_checks import OpenShiftCheck, OpenShiftCheckException class EtcdVolume(OpenShiftCheck): @@ -14,21 +14,18 @@ class EtcdVolume(OpenShiftCheck): # Where to find ectd data, higher priority first. 
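For context, the mount resolution these etcd disk checks rely on (the `supported_mount_paths` list continues just below) is a first-match scan over the host's `ansible_mounts` facts, trying the most specific path first. A minimal standalone sketch under that reading; the function name and sample mount data here are hypothetical, the real values come from Ansible facts:

```
# Sketch of the first-match mount resolution used by the etcd disk checks.
# Deeper paths are tried first so a dedicated /var/lib/etcd mount wins over /.
SUPPORTED_MOUNT_PATHS = ["/var/lib/etcd", "/var/lib", "/var", "/"]

def find_etcd_mount(ansible_mounts):
    mounts = {mnt.get("mount"): mnt for mnt in ansible_mounts}
    for path in SUPPORTED_MOUNT_PATHS:
        if path in mounts:
            return mounts[path]
    raise ValueError("unable to determine mount point for etcd data")

# Hypothetical facts: only "/" and "/var" are mounted, so "/var" is chosen.
sample = [
    {"mount": "/", "size_available": 10 * 2**30, "size_total": 20 * 2**30},
    {"mount": "/var", "size_available": 30 * 2**30, "size_total": 40 * 2**30},
]
print(find_etcd_mount(sample)["mount"])  # -> /var
```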
supported_mount_paths = ["/var/lib/etcd", "/var/lib", "/var", "/"] - @classmethod - def is_active(cls, task_vars): - etcd_hosts = get_var(task_vars, "groups", "etcd", default=[]) or get_var(task_vars, "groups", "masters", - default=[]) or [] - is_etcd_host = get_var(task_vars, "ansible_ssh_host") in etcd_hosts - return super(EtcdVolume, cls).is_active(task_vars) and is_etcd_host + def is_active(self): + etcd_hosts = self.get_var("groups", "etcd", default=[]) or self.get_var("groups", "masters", default=[]) or [] + is_etcd_host = self.get_var("ansible_ssh_host") in etcd_hosts + return super(EtcdVolume, self).is_active() and is_etcd_host - def run(self, tmp, task_vars): - mount_info = self._etcd_mount_info(task_vars) + def run(self): + mount_info = self._etcd_mount_info() available = mount_info["size_available"] total = mount_info["size_total"] used = total - available - threshold = get_var( - task_vars, + threshold = self.get_var( "etcd_device_usage_threshold_percent", default=self.default_threshold_percent ) @@ -45,8 +42,8 @@ def run(self, tmp, task_vars): return {"changed": False} - def _etcd_mount_info(self, task_vars): - ansible_mounts = get_var(task_vars, "ansible_mounts") + def _etcd_mount_info(self): + ansible_mounts = self.get_var("ansible_mounts") mounts = {mnt.get("mount"): mnt for mnt in ansible_mounts} for path in self.supported_mount_paths: diff --git a/roles/openshift_health_checker/openshift_checks/logging/curator.py b/roles/openshift_health_checker/openshift_checks/logging/curator.py index c9fc5989687..32d853d5798 100644 --- a/roles/openshift_health_checker/openshift_checks/logging/curator.py +++ b/roles/openshift_health_checker/openshift_checks/logging/curator.py @@ -1,28 +1,21 @@ -""" -Module for performing checks on an Curator logging deployment -""" +"""Check for an aggregated logging Curator deployment""" -from openshift_checks import get_var from openshift_checks.logging.logging import LoggingCheck class Curator(LoggingCheck): - """Module that checks an integrated logging Curator deployment""" + """Check for an aggregated logging Curator deployment""" name = "curator" tags = ["health", "logging"] - logging_namespace = None - - def run(self, tmp, task_vars): + def run(self): """Check various things and gather errors. Returns: result as hash""" - self.logging_namespace = get_var(task_vars, "openshift_logging_namespace", default="logging") - curator_pods, error = super(Curator, self).get_pods_for_component( - self.module_executor, + self.logging_namespace = self.get_var("openshift_logging_namespace", default="logging") + curator_pods, error = self.get_pods_for_component( self.logging_namespace, "curator", - task_vars ) if error: return {"failed": True, "changed": False, "msg": error} @@ -30,7 +23,6 @@ def run(self, tmp, task_vars): if check_error: msg = ("The following Curator deployment issue was found:" - "\n-------\n" "{}".format(check_error)) return {"failed": True, "changed": False, "msg": msg} @@ -46,7 +38,7 @@ def check_curator(self, pods): "Is Curator correctly deployed?" 
) - not_running = super(Curator, self).not_running_pods(pods) + not_running = self.not_running_pods(pods) if len(not_running) == len(pods): return ( "The Curator pod is not currently in a running state,\n" diff --git a/roles/openshift_health_checker/openshift_checks/logging/elasticsearch.py b/roles/openshift_health_checker/openshift_checks/logging/elasticsearch.py index 01cb35b81ea..8bdda1f3251 100644 --- a/roles/openshift_health_checker/openshift_checks/logging/elasticsearch.py +++ b/roles/openshift_health_checker/openshift_checks/logging/elasticsearch.py @@ -1,39 +1,31 @@ -""" -Module for performing checks on an Elasticsearch logging deployment -""" +"""Check for an aggregated logging Elasticsearch deployment""" import json import re -from openshift_checks import get_var from openshift_checks.logging.logging import LoggingCheck class Elasticsearch(LoggingCheck): - """Module that checks an integrated logging Elasticsearch deployment""" + """Check for an aggregated logging Elasticsearch deployment""" name = "elasticsearch" tags = ["health", "logging"] - logging_namespace = None - - def run(self, tmp, task_vars): + def run(self): """Check various things and gather errors. Returns: result as hash""" - self.logging_namespace = get_var(task_vars, "openshift_logging_namespace", default="logging") - es_pods, error = super(Elasticsearch, self).get_pods_for_component( - self.execute_module, + self.logging_namespace = self.get_var("openshift_logging_namespace", default="logging") + es_pods, error = self.get_pods_for_component( self.logging_namespace, "es", - task_vars, ) if error: return {"failed": True, "changed": False, "msg": error} - check_error = self.check_elasticsearch(es_pods, task_vars) + check_error = self.check_elasticsearch(es_pods) if check_error: msg = ("The following Elasticsearch deployment issue was found:" - "\n-------\n" "{}".format(check_error)) return {"failed": True, "changed": False, "msg": msg} @@ -41,8 +33,8 @@ def run(self, tmp, task_vars): return {"failed": False, "changed": False, "msg": 'No problems found with Elasticsearch deployment.'} def _not_running_elasticsearch_pods(self, es_pods): - """Returns: list of running pods, list of errors about non-running pods""" - not_running = super(Elasticsearch, self).not_running_pods(es_pods) + """Returns: list of pods that are not running, list of errors about non-running pods""" + not_running = self.not_running_pods(es_pods) if not_running: return not_running, [( 'The following Elasticsearch pods are not running:\n' @@ -54,7 +46,7 @@ def _not_running_elasticsearch_pods(self, es_pods): ))] return not_running, [] - def check_elasticsearch(self, es_pods, task_vars): + def check_elasticsearch(self, es_pods): """Various checks for elasticsearch. Returns: error string""" not_running_pods, error_msgs = self._not_running_elasticsearch_pods(es_pods) running_pods = [pod for pod in es_pods if pod not in not_running_pods] @@ -65,10 +57,10 @@ def check_elasticsearch(self, es_pods, task_vars): } if not pods_by_name: return 'No logging Elasticsearch pods were found. Is logging deployed?' 
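The first of the checks wired in below, `_check_elasticsearch_masters`, is essentially a split-brain detector: every Elasticsearch pod is asked who it believes the master is, and more than one distinct answer means the cluster has partitioned. A minimal sketch of that comparison; the function and pod names are illustrative only, not the check's actual API:

```
# Sketch of the split-brain comparison: collect each pod's reported master
# and flag the cluster when the answers disagree.
def detect_split_brain(reported_masters):
    masters = {name for name in reported_masters if name}
    if len(masters) > 1:
        return "split brain: masters reported: {}".format(", ".join(sorted(masters)))
    if not masters:
        return "no master reported by any Elasticsearch pod"
    return None  # healthy: every pod agrees on a single master

print(detect_split_brain(["es-1", "es-1"]))  # None: healthy
print(detect_split_brain(["es-1", "es-2"]))  # split brain reported
```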
- error_msgs += self._check_elasticsearch_masters(pods_by_name, task_vars) - error_msgs += self._check_elasticsearch_node_list(pods_by_name, task_vars) - error_msgs += self._check_es_cluster_health(pods_by_name, task_vars) - error_msgs += self._check_elasticsearch_diskspace(pods_by_name, task_vars) + error_msgs += self._check_elasticsearch_masters(pods_by_name) + error_msgs += self._check_elasticsearch_node_list(pods_by_name) + error_msgs += self._check_es_cluster_health(pods_by_name) + error_msgs += self._check_elasticsearch_diskspace(pods_by_name) return '\n'.join(error_msgs) @staticmethod @@ -76,14 +68,14 @@ def _build_es_curl_cmd(pod_name, url): base = "exec {name} -- curl -s --cert {base}cert --key {base}key --cacert {base}ca -XGET '{url}'" return base.format(base="/etc/elasticsearch/secret/admin-", name=pod_name, url=url) - def _check_elasticsearch_masters(self, pods_by_name, task_vars): + def _check_elasticsearch_masters(self, pods_by_name): """Check that Elasticsearch masters are sane. Returns: list of error strings""" es_master_names = set() error_msgs = [] for pod_name in pods_by_name.keys(): # Compare what each ES node reports as master and compare for split brain get_master_cmd = self._build_es_curl_cmd(pod_name, "https://localhost:9200/_cat/master") - master_name_str = self._exec_oc(get_master_cmd, [], task_vars) + master_name_str = self.exec_oc(self.logging_namespace, get_master_cmd, []) master_names = (master_name_str or '').split(' ') if len(master_names) > 1: es_master_names.add(master_names[1]) @@ -108,7 +100,7 @@ def _check_elasticsearch_masters(self, pods_by_name, task_vars): return error_msgs - def _check_elasticsearch_node_list(self, pods_by_name, task_vars): + def _check_elasticsearch_node_list(self, pods_by_name): """Check that reported ES masters are accounted for by pods. Returns: list of error strings""" if not pods_by_name: @@ -116,7 +108,7 @@ def _check_elasticsearch_node_list(self, pods_by_name, task_vars): # get ES cluster nodes node_cmd = self._build_es_curl_cmd(list(pods_by_name.keys())[0], 'https://localhost:9200/_nodes') - cluster_node_data = self._exec_oc(node_cmd, [], task_vars) + cluster_node_data = self.exec_oc(self.logging_namespace, node_cmd, []) try: cluster_nodes = json.loads(cluster_node_data)['nodes'] except (ValueError, KeyError): @@ -138,12 +130,12 @@ def _check_elasticsearch_node_list(self, pods_by_name, task_vars): return error_msgs - def _check_es_cluster_health(self, pods_by_name, task_vars): + def _check_es_cluster_health(self, pods_by_name): """Exec into the elasticsearch pods and check the cluster health. Returns: list of errors""" error_msgs = [] for pod_name in pods_by_name.keys(): cluster_health_cmd = self._build_es_curl_cmd(pod_name, 'https://localhost:9200/_cluster/health?pretty=true') - cluster_health_data = self._exec_oc(cluster_health_cmd, [], task_vars) + cluster_health_data = self.exec_oc(self.logging_namespace, cluster_health_cmd, []) try: health_res = json.loads(cluster_health_data) if not health_res or not health_res.get('status'): @@ -162,7 +154,7 @@ def _check_es_cluster_health(self, pods_by_name, task_vars): return error_msgs - def _check_elasticsearch_diskspace(self, pods_by_name, task_vars): + def _check_elasticsearch_diskspace(self, pods_by_name): """ Exec into an ES pod and query the diskspace on the persistent volume. 
Returns: list of errors @@ -170,7 +162,7 @@ def _check_elasticsearch_diskspace(self, pods_by_name, task_vars): error_msgs = [] for pod_name in pods_by_name.keys(): df_cmd = 'exec {} -- df --output=ipcent,pcent /elasticsearch/persistent'.format(pod_name) - disk_output = self._exec_oc(df_cmd, [], task_vars) + disk_output = self.exec_oc(self.logging_namespace, df_cmd, []) lines = disk_output.splitlines() # expecting one header looking like 'IUse% Use%' and one body line body_re = r'\s*(\d+)%?\s+(\d+)%?\s*$' @@ -182,7 +174,7 @@ def _check_elasticsearch_diskspace(self, pods_by_name, task_vars): continue inode_pct, disk_pct = re.match(body_re, lines[1]).groups() - inode_pct_thresh = get_var(task_vars, 'openshift_check_efk_es_inode_pct', default='90') + inode_pct_thresh = self.get_var('openshift_check_efk_es_inode_pct', default='90') if int(inode_pct) >= int(inode_pct_thresh): error_msgs.append( 'Inode percent usage on the storage volume for logging ES pod "{pod}"\n' @@ -193,7 +185,7 @@ def _check_elasticsearch_diskspace(self, pods_by_name, task_vars): limit=str(inode_pct_thresh), param='openshift_check_efk_es_inode_pct', )) - disk_pct_thresh = get_var(task_vars, 'openshift_check_efk_es_storage_pct', default='80') + disk_pct_thresh = self.get_var('openshift_check_efk_es_storage_pct', default='80') if int(disk_pct) >= int(disk_pct_thresh): error_msgs.append( 'Disk percent usage on the storage volume for logging ES pod "{pod}"\n' @@ -206,12 +198,3 @@ def _check_elasticsearch_diskspace(self, pods_by_name, task_vars): )) return error_msgs - - def _exec_oc(self, cmd_str, extra_args, task_vars): - return super(Elasticsearch, self).exec_oc( - self.execute_module, - self.logging_namespace, - cmd_str, - extra_args, - task_vars, - ) diff --git a/roles/openshift_health_checker/openshift_checks/logging/fluentd.py b/roles/openshift_health_checker/openshift_checks/logging/fluentd.py index 6275672938d..b3485bf44d1 100644 --- a/roles/openshift_health_checker/openshift_checks/logging/fluentd.py +++ b/roles/openshift_health_checker/openshift_checks/logging/fluentd.py @@ -1,37 +1,30 @@ -""" -Module for performing checks on an Fluentd logging deployment -""" +"""Check for an aggregated logging Fluentd deployment""" import json -from openshift_checks import get_var from openshift_checks.logging.logging import LoggingCheck class Fluentd(LoggingCheck): - """Module that checks an integrated logging Fluentd deployment""" + """Check for an aggregated logging Fluentd deployment""" + name = "fluentd" tags = ["health", "logging"] - logging_namespace = None - - def run(self, tmp, task_vars): + def run(self): """Check various things and gather errors. 
Returns: result as hash""" - self.logging_namespace = get_var(task_vars, "openshift_logging_namespace", default="logging") + self.logging_namespace = self.get_var("openshift_logging_namespace", default="logging") fluentd_pods, error = super(Fluentd, self).get_pods_for_component( - self.execute_module, self.logging_namespace, "fluentd", - task_vars, ) if error: return {"failed": True, "changed": False, "msg": error} - check_error = self.check_fluentd(fluentd_pods, task_vars) + check_error = self.check_fluentd(fluentd_pods) if check_error: msg = ("The following Fluentd deployment issue was found:" - "\n-------\n" "{}".format(check_error)) return {"failed": True, "changed": False, "msg": msg} @@ -53,10 +46,9 @@ def _filter_fluentd_labeled_nodes(nodes_by_name, node_selector): ).format(label=node_selector) return fluentd_nodes, None - @staticmethod - def _check_node_labeling(nodes_by_name, fluentd_nodes, node_selector, task_vars): + def _check_node_labeling(self, nodes_by_name, fluentd_nodes, node_selector): """Note if nodes are not labeled as expected. Returns: error string""" - intended_nodes = get_var(task_vars, 'openshift_logging_fluentd_hosts', default=['--all']) + intended_nodes = self.get_var('openshift_logging_fluentd_hosts', default=['--all']) if not intended_nodes or '--all' in intended_nodes: intended_nodes = nodes_by_name.keys() nodes_missing_labels = set(intended_nodes) - set(fluentd_nodes.keys()) @@ -114,13 +106,15 @@ def _check_fluentd_pods_running(self, pods): )) return None - def check_fluentd(self, pods, task_vars): + def check_fluentd(self, pods): """Verify fluentd is running everywhere. Returns: error string""" - node_selector = get_var(task_vars, 'openshift_logging_fluentd_nodeselector', - default='logging-infra-fluentd=true') + node_selector = self.get_var( + 'openshift_logging_fluentd_nodeselector', + default='logging-infra-fluentd=true' + ) - nodes_by_name, error = self.get_nodes_by_name(task_vars) + nodes_by_name, error = self.get_nodes_by_name() if error: return error @@ -129,7 +123,7 @@ def check_fluentd(self, pods, task_vars): return error error_msgs = [] - error = self._check_node_labeling(nodes_by_name, fluentd_nodes, node_selector, task_vars) + error = self._check_node_labeling(nodes_by_name, fluentd_nodes, node_selector) if error: error_msgs.append(error) error = self._check_nodes_have_fluentd(pods, fluentd_nodes) @@ -148,9 +142,13 @@ def check_fluentd(self, pods, task_vars): return '\n'.join(error_msgs) - def get_nodes_by_name(self, task_vars): + def get_nodes_by_name(self): """Retrieve all the node definitions. 
Returns: dict(name: node), error string""" - nodes_json = self._exec_oc("get nodes -o json", [], task_vars) + nodes_json = self.exec_oc( + self.logging_namespace, + "get nodes -o json", + [] + ) try: nodes = json.loads(nodes_json) except ValueError: # no valid json - should not happen @@ -161,10 +159,3 @@ def get_nodes_by_name(self, task_vars): node['metadata']['name']: node for node in nodes['items'] }, None - - def _exec_oc(self, cmd_str, extra_args, task_vars): - return super(Fluentd, self).exec_oc(self.execute_module, - self.logging_namespace, - cmd_str, - extra_args, - task_vars) diff --git a/roles/openshift_health_checker/openshift_checks/logging/fluentd_config.py b/roles/openshift_health_checker/openshift_checks/logging/fluentd_config.py new file mode 100644 index 00000000000..0970f0a63ba --- /dev/null +++ b/roles/openshift_health_checker/openshift_checks/logging/fluentd_config.py @@ -0,0 +1,138 @@ +""" +Module for performing checks on a Fluentd logging deployment configuration +""" + +from openshift_checks import OpenShiftCheckException +from openshift_checks.logging.logging import LoggingCheck + + +class FluentdConfig(LoggingCheck): + """Module that checks logging configuration of an integrated logging Fluentd deployment""" + name = "fluentd_config" + tags = ["health"] + + def is_active(self): + logging_deployed = self.get_var("openshift_hosted_logging_deploy", default=False) + + try: + version = self.get_major_minor_version(self.get_var("openshift_image_tag")) + except ValueError: + # if failed to parse OpenShift version, perform check anyway (if logging enabled) + return logging_deployed + + return logging_deployed and version < (3, 6) + + def run(self): + """Check that Fluentd has running pods, and that its logging config matches Docker's logging config.""" + self.logging_namespace = self.get_var("openshift_logging_namespace", default=self.logging_namespace) + config_error = self.check_logging_config() + if config_error: + msg = ("The following Fluentd logging configuration problem was found:" + "\n{}".format(config_error)) + return {"failed": True, "msg": msg} + + return {} + + def check_logging_config(self): + """Ensure that the configured Docker logging driver matches fluentd settings. + This means that, at least for now, if the following condition is met: + + openshift_logging_fluentd_use_journal == True + + then the value of the configured Docker logging driver should be "journald". + Otherwise, the value of the Docker logging driver should be "json-file". + Returns an error string if the above condition is not met, or None otherwise.""" + use_journald = self.get_var("openshift_logging_fluentd_use_journal", default=True) + + # if check is running on a master, retrieve all running pods + # and check any pod's container for the env var "USE_JOURNAL" + group_names = self.get_var("group_names") + if "masters" in group_names: + use_journald = self.check_fluentd_env_var() + + docker_info = self.execute_module("docker_info", {}) + try: + logging_driver = docker_info["info"]["LoggingDriver"] + except KeyError: + return "Unable to determine Docker logging driver." + + logging_driver = docker_info["info"]["LoggingDriver"] + recommended_logging_driver = "journald" + error = None + + # If fluentd is set to use journald but Docker is not, recommend setting the `--log-driver` + # option as an inventory file variable, or adding the log driver value as part of the + # Docker configuration in /etc/docker/daemon.json. 
+        # There is no global --log-driver flag that
+        # can be passed to the Docker binary; the only other recommendation that can be made would be
+        # to pass the `--log-driver` flag to the "run" sub-command of the `docker` binary when running
+        # individual containers.
+        if use_journald and logging_driver != "journald":
+            error = ('Your Fluentd configuration is set to aggregate Docker container logs from "journald".\n'
+                     'This differs from your Docker configuration, which has been set to use "{driver}" '
+                     'as the default method of storing logs.\n'
+                     'This discrepancy in configuration will prevent Fluentd from receiving any logs '
+                     'from your Docker containers.').format(driver=logging_driver)
+        elif not use_journald and logging_driver != "json-file":
+            recommended_logging_driver = "json-file"
+            error = ('Your Fluentd configuration is set to aggregate Docker container logs from '
+                     'individual json log files per container.\n'
+                     'This differs from your Docker configuration, which has been set to use '
+                     '"{driver}" as the default method of storing logs.\n'
+                     'This discrepancy in configuration will prevent Fluentd from receiving any logs '
+                     'from your Docker containers.').format(driver=logging_driver)
+
+        if error:
+            error += ('\nTo resolve this issue, add the following variable to your Ansible inventory file:\n\n'
+                      ' openshift_docker_options="--log-driver={driver}"\n\n'
+                      'Alternatively, you can add the following option to your Docker configuration, located in '
+                      '"/etc/docker/daemon.json":\n\n'
+                      '{{ "log-driver": "{driver}" }}\n\n'
+                      'See https://docs.docker.com/engine/admin/logging/json-file '
+                      'for more information.').format(driver=recommended_logging_driver)
+
+        return error
+
+    def check_fluentd_env_var(self):
+        """Read and return the value of the 'USE_JOURNAL' environment variable on a fluentd pod."""
+        running_pods = self.running_fluentd_pods()
+
+        try:
+            pod_containers = running_pods[0]["spec"]["containers"]
+        except KeyError:
+            return "Unable to detect running containers on selected Fluentd pod."
+
+        if not pod_containers:
+            msg = ('There are no running containers on selected Fluentd pod "{}".\n'
+                   'Unable to calculate expected logging driver.').format(running_pods[0]["metadata"].get("name", ""))
+            raise OpenShiftCheckException(msg)
+
+        pod_env = pod_containers[0].get("env")
+        if not pod_env:
+            msg = ('There are no environment variables set on the Fluentd container "{}".\n'
+                   'Unable to calculate expected logging driver.').format(pod_containers[0].get("name"))
+            raise OpenShiftCheckException(msg)
+
+        for env in pod_env:
+            if env["name"] == "USE_JOURNAL":
+                return env.get("value", "false") != "false"
+
+        return False
+
+    def running_fluentd_pods(self):
+        """Return a list of running fluentd pods."""
+        fluentd_pods, error = self.get_pods_for_component(
+            self.logging_namespace,
+            "fluentd",
+        )
+        if error:
+            msg = 'Unable to retrieve any pods for the "fluentd" logging component: {}'.format(error)
+            raise OpenShiftCheckException(msg)
+
+        running_fluentd_pods = [pod for pod in fluentd_pods if pod['status']['phase'] == 'Running']
+        if not running_fluentd_pods:
+            msg = ('No Fluentd pods were found to be in the "Running" state.
' + 'At least one Fluentd pod is required in order to perform this check.') + + raise OpenShiftCheckException(msg) + + return running_fluentd_pods diff --git a/roles/openshift_health_checker/openshift_checks/logging/kibana.py b/roles/openshift_health_checker/openshift_checks/logging/kibana.py index 551e8dfa051..efb14ab423d 100644 --- a/roles/openshift_health_checker/openshift_checks/logging/kibana.py +++ b/roles/openshift_health_checker/openshift_checks/logging/kibana.py @@ -12,7 +12,6 @@ from urllib.error import HTTPError, URLError import urllib.request as urllib2 -from openshift_checks import get_var from openshift_checks.logging.logging import LoggingCheck @@ -22,35 +21,30 @@ class Kibana(LoggingCheck): name = "kibana" tags = ["health", "logging"] - logging_namespace = None - - def run(self, tmp, task_vars): + def run(self): """Check various things and gather errors. Returns: result as hash""" - self.logging_namespace = get_var(task_vars, "openshift_logging_namespace", default="logging") - kibana_pods, error = super(Kibana, self).get_pods_for_component( - self.execute_module, + self.logging_namespace = self.get_var("openshift_logging_namespace", default="logging") + kibana_pods, error = self.get_pods_for_component( self.logging_namespace, "kibana", - task_vars, ) if error: return {"failed": True, "changed": False, "msg": error} check_error = self.check_kibana(kibana_pods) if not check_error: - check_error = self._check_kibana_route(task_vars) + check_error = self._check_kibana_route() if check_error: msg = ("The following Kibana deployment issue was found:" - "\n-------\n" "{}".format(check_error)) return {"failed": True, "changed": False, "msg": msg} # TODO(lmeyer): run it all again for the ops cluster return {"failed": False, "changed": False, "msg": 'No problems found with Kibana deployment.'} - def _verify_url_internal(self, url, task_vars): + def _verify_url_internal(self, url): """ Try to reach a URL from the host. Returns: success (bool), reason (for failure) @@ -62,7 +56,7 @@ def _verify_url_internal(self, url, task_vars): # TODO(lmeyer): give users option to validate certs status_code=302, ) - result = self.execute_module('uri', args, None, task_vars) + result = self.execute_module('uri', args) if result.get('failed'): return result['msg'] return None @@ -114,14 +108,18 @@ def check_kibana(self, pods): return None - def _get_kibana_url(self, task_vars): + def _get_kibana_url(self): """ Get kibana route or report error. Returns: url (or empty), reason for failure """ # Get logging url - get_route = self._exec_oc("get route logging-kibana -o json", [], task_vars) + get_route = self.exec_oc( + self.logging_namespace, + "get route logging-kibana -o json", + [], + ) if not get_route: return None, 'no_route_exists' @@ -139,7 +137,7 @@ def _get_kibana_url(self, task_vars): return 'https://{}/'.format(host), None - def _check_kibana_route(self, task_vars): + def _check_kibana_route(self): """ Check to see if kibana route is up and working. Returns: error string @@ -160,12 +158,12 @@ def _check_kibana_route(self, task_vars): ), ) - kibana_url, error = self._get_kibana_url(task_vars) + kibana_url, error = self._get_kibana_url() if not kibana_url: return known_errors.get(error, error) # first, check that kibana is reachable from the master. 
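The reachability probe defined above, `_verify_url_internal`, delegates the fetch to Ansible's `uri` module so that it runs on the master host; conceptually it is nothing more than "fetch the URL, report any failure as a string". A standalone sketch of that logic using only the standard library (an in-process stand-in, not the check's actual transport):

```
# In-process sketch of the URL probe; the real check executes Ansible's
# "uri" module via execute_module so the request originates on the master.
try:
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError
except ImportError:  # Python 2, roughly mirroring the imports in kibana.py
    from urllib2 import urlopen, HTTPError, URLError

def verify_url(url, timeout=10):
    """Return None when the URL answers, else a short error description."""
    try:
        urlopen(url, timeout=timeout)  # redirects such as kibana's 302 are followed
    except HTTPError as exc:
        return "unexpected status {} from {}".format(exc.code, url)
    except URLError as exc:
        return "could not reach {}: {}".format(url, exc.reason)
    return None
```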
- error = self._verify_url_internal(kibana_url, task_vars) + error = self._verify_url_internal(kibana_url) if error: if 'urlopen error [Errno 111] Connection refused' in error: error = ( @@ -190,7 +188,7 @@ def _check_kibana_route(self, task_vars): # in production we would like the kibana route to work from outside the # cluster too; but that may not be the case, so allow disabling just this part. - if not get_var(task_vars, "openshift_check_efk_kibana_external", default=True): + if not self.get_var("openshift_check_efk_kibana_external", default=True): return None error = self._verify_url_external(kibana_url) if error: @@ -220,10 +218,3 @@ def _check_kibana_route(self, task_vars): ).format(error=error) return error return None - - def _exec_oc(self, cmd_str, extra_args, task_vars): - return super(Kibana, self).exec_oc(self.execute_module, - self.logging_namespace, - cmd_str, - extra_args, - task_vars) diff --git a/roles/openshift_health_checker/openshift_checks/logging/logging.py b/roles/openshift_health_checker/openshift_checks/logging/logging.py index 6e951e82cd3..43ba6c406ff 100644 --- a/roles/openshift_health_checker/openshift_checks/logging/logging.py +++ b/roles/openshift_health_checker/openshift_checks/logging/logging.py @@ -5,39 +5,39 @@ import json import os -from openshift_checks import OpenShiftCheck, OpenShiftCheckException, get_var +from openshift_checks import OpenShiftCheck, OpenShiftCheckException class LoggingCheck(OpenShiftCheck): - """Base class for logging component checks""" + """Base class for OpenShift aggregated logging component checks""" + + # FIXME: this should not be listed as a check, since it is not meant to be + # run by itself. name = "logging" + logging_namespace = "logging" - @classmethod - def is_active(cls, task_vars): - return super(LoggingCheck, cls).is_active(task_vars) and cls.is_first_master(task_vars) + def is_active(self): + logging_deployed = self.get_var("openshift_hosted_logging_deploy", default=False) + return logging_deployed and super(LoggingCheck, self).is_active() and self.is_first_master() - @staticmethod - def is_first_master(task_vars): - """Run only on first master and only when logging is configured. Returns: bool""" - logging_deployed = get_var(task_vars, "openshift_hosted_logging_deploy", default=True) + def is_first_master(self): + """Determine if running on first master. Returns: bool""" # Note: It would be nice to use membership in oo_first_master group, however for now it # seems best to avoid requiring that setup and just check this is the first master. - hostname = get_var(task_vars, "ansible_ssh_host") or [None] - masters = get_var(task_vars, "groups", "masters", default=None) or [None] - return logging_deployed and masters[0] == hostname + hostname = self.get_var("ansible_ssh_host") or [None] + masters = self.get_var("groups", "masters", default=None) or [None] + return masters[0] == hostname - def run(self, tmp, task_vars): - pass + def run(self): + return {} - def get_pods_for_component(self, execute_module, namespace, logging_component, task_vars): + def get_pods_for_component(self, namespace, logging_component): """Get all pods for a given component. 
Returns: list of pods for component, error string""" pod_output = self.exec_oc( - execute_module, namespace, "get pods -l component={} -o json".format(logging_component), [], - task_vars ) try: pods = json.loads(pod_output) @@ -45,7 +45,7 @@ def get_pods_for_component(self, execute_module, namespace, logging_component, t raise ValueError() except ValueError: # successful run but non-parsing data generally means there were no pods in the namespace - return None, 'There are no pods in the {} namespace. Is logging deployed?'.format(namespace) + return None, 'No pods were found for the "{}" logging component.'.format(logging_component) return pods['items'], None @@ -63,14 +63,13 @@ def not_running_pods(pods): ) ] - @staticmethod - def exec_oc(execute_module=None, namespace="logging", cmd_str="", extra_args=None, task_vars=None): + def exec_oc(self, namespace="logging", cmd_str="", extra_args=None): """ Execute an 'oc' command in the remote host. Returns: output of command and namespace, or raises OpenShiftCheckException on error """ - config_base = get_var(task_vars, "openshift", "common", "config_base") + config_base = self.get_var("openshift", "common", "config_base") args = { "namespace": namespace, "config_file": os.path.join(config_base, "master", "admin.kubeconfig"), @@ -78,7 +77,7 @@ def exec_oc(execute_module=None, namespace="logging", cmd_str="", extra_args=Non "extra_args": list(extra_args) if extra_args else [], } - result = execute_module("ocutil", args, None, task_vars) + result = self.execute_module("ocutil", args) if result.get("failed"): msg = ( 'Unexpected error using `oc` to validate the logging stack components.\n' diff --git a/roles/openshift_health_checker/openshift_checks/logging/logging_index_time.py b/roles/openshift_health_checker/openshift_checks/logging/logging_index_time.py new file mode 100644 index 00000000000..b24e88e056c --- /dev/null +++ b/roles/openshift_health_checker/openshift_checks/logging/logging_index_time.py @@ -0,0 +1,130 @@ +""" +Check for ensuring logs from pods can be queried in a reasonable amount of time. +""" + +import json +import time + +from uuid import uuid4 + +from openshift_checks import OpenShiftCheckException +from openshift_checks.logging.logging import LoggingCheck + + +ES_CMD_TIMEOUT_SECONDS = 30 + + +class LoggingIndexTime(LoggingCheck): + """Check that pod logs are aggregated and indexed in ElasticSearch within a reasonable amount of time.""" + name = "logging_index_time" + tags = ["health", "logging"] + + logging_namespace = "logging" + + def run(self): + """Add log entry by making unique request to Kibana. Check for unique entry in the ElasticSearch pod logs.""" + try: + log_index_timeout = int( + self.get_var("openshift_check_logging_index_timeout_seconds", default=ES_CMD_TIMEOUT_SECONDS) + ) + except ValueError: + return { + "failed": True, + "msg": ('Invalid value provided for "openshift_check_logging_index_timeout_seconds". 
'
+                        'Value must be an integer representing an amount in seconds.'),
+            }
+
+        running_component_pods = dict()
+
+        # get all component pods
+        self.logging_namespace = self.get_var("openshift_logging_namespace", default=self.logging_namespace)
+        for component, name in (['kibana', 'Kibana'], ['es', 'Elasticsearch']):
+            pods, error = self.get_pods_for_component(self.logging_namespace, component)
+
+            if error:
+                msg = 'Unable to retrieve pods for the {} logging component: {}'
+                return {"failed": True, "changed": False, "msg": msg.format(name, error)}
+
+            running_pods = self.running_pods(pods)
+
+            if not running_pods:
+                msg = ('No {} pods in the "Running" state were found. '
+                       'At least one pod is required in order to perform this check.')
+                return {"failed": True, "changed": False, "msg": msg.format(name)}
+
+            running_component_pods[component] = running_pods
+
+        uuid = self.curl_kibana_with_uuid(running_component_pods["kibana"][0])
+        self.wait_until_cmd_or_err(running_component_pods["es"][0], uuid, log_index_timeout)
+        return {}
+
+    def wait_until_cmd_or_err(self, es_pod, uuid, timeout_secs):
+        """Retry an Elasticsearch query every second until query success, or a defined
+        length of time has passed."""
+        deadline = time.time() + timeout_secs
+        interval = 1
+        while not self.query_es_from_es(es_pod, uuid):
+            if time.time() + interval > deadline:
+                msg = "expecting match in Elasticsearch for message with uuid {}, but no matches were found after {}s."
+                raise OpenShiftCheckException(msg.format(uuid, timeout_secs))
+            time.sleep(interval)
+
+    def curl_kibana_with_uuid(self, kibana_pod):
+        """curl Kibana with a unique uuid."""
+        uuid = self.generate_uuid()
+        pod_name = kibana_pod["metadata"]["name"]
+        exec_cmd = "exec {pod_name} -c kibana -- curl --max-time 30 -s http://localhost:5601/{uuid}"
+        exec_cmd = exec_cmd.format(pod_name=pod_name, uuid=uuid)
+
+        error_str = self.exec_oc(self.logging_namespace, exec_cmd, [])
+
+        try:
+            error_code = json.loads(error_str)["statusCode"]
+        except KeyError:
+            msg = ('invalid response returned from Kibana request (Missing "statusCode" key):\n'
+                   'Command: {}\nResponse: {}').format(exec_cmd, error_str)
+            raise OpenShiftCheckException(msg)
+        except ValueError:
+            msg = ('invalid response returned from Kibana request (Non-JSON output):\n'
+                   'Command: {}\nResponse: {}').format(exec_cmd, error_str)
+            raise OpenShiftCheckException(msg)
+
+        if error_code != 404:
+            msg = 'invalid error code returned from Kibana request. Expecting error code "404", but got "{}" instead.'
+ raise OpenShiftCheckException(msg.format(error_code)) + + return uuid + + def query_es_from_es(self, es_pod, uuid): + """curl the Elasticsearch pod and look for a unique uuid in its logs.""" + pod_name = es_pod["metadata"]["name"] + exec_cmd = ( + "exec {pod_name} -- curl --max-time 30 -s -f " + "--cacert /etc/elasticsearch/secret/admin-ca " + "--cert /etc/elasticsearch/secret/admin-cert " + "--key /etc/elasticsearch/secret/admin-key " + "https://logging-es:9200/project.{namespace}*/_count?q=message:{uuid}" + ) + exec_cmd = exec_cmd.format(pod_name=pod_name, namespace=self.logging_namespace, uuid=uuid) + result = self.exec_oc(self.logging_namespace, exec_cmd, []) + + try: + count = json.loads(result)["count"] + except KeyError: + msg = 'invalid response from Elasticsearch query:\n"{}"\nMissing "count" key:\n{}' + raise OpenShiftCheckException(msg.format(exec_cmd, result)) + except ValueError: + msg = 'invalid response from Elasticsearch query:\n"{}"\nNon-JSON output:\n{}' + raise OpenShiftCheckException(msg.format(exec_cmd, result)) + + return count + + @staticmethod + def running_pods(pods): + """Filter pods that are running.""" + return [pod for pod in pods if pod['status']['phase'] == 'Running'] + + @staticmethod + def generate_uuid(): + """Wrap uuid generator. Allows for testing with expected values.""" + return str(uuid4()) diff --git a/roles/openshift_health_checker/openshift_checks/memory_availability.py b/roles/openshift_health_checker/openshift_checks/memory_availability.py index f4e31065f19..765ba072d13 100644 --- a/roles/openshift_health_checker/openshift_checks/memory_availability.py +++ b/roles/openshift_health_checker/openshift_checks/memory_availability.py @@ -1,5 +1,5 @@ -# pylint: disable=missing-docstring -from openshift_checks import OpenShiftCheck, get_var +"""Check that recommended memory is available.""" +from openshift_checks import OpenShiftCheck MIB = 2**20 GIB = 2**30 @@ -21,19 +21,18 @@ class MemoryAvailability(OpenShiftCheck): # https://access.redhat.com/solutions/3006511 physical RAM is partly reserved from memtotal memtotal_adjustment = 1 * GIB - @classmethod - def is_active(cls, task_vars): + def is_active(self): """Skip hosts that do not have recommended memory requirements.""" - group_names = get_var(task_vars, "group_names", default=[]) - has_memory_recommendation = bool(set(group_names).intersection(cls.recommended_memory_bytes)) - return super(MemoryAvailability, cls).is_active(task_vars) and has_memory_recommendation + group_names = self.get_var("group_names", default=[]) + has_memory_recommendation = bool(set(group_names).intersection(self.recommended_memory_bytes)) + return super(MemoryAvailability, self).is_active() and has_memory_recommendation - def run(self, tmp, task_vars): - group_names = get_var(task_vars, "group_names") - total_memory_bytes = get_var(task_vars, "ansible_memtotal_mb") * MIB + def run(self): + group_names = self.get_var("group_names") + total_memory_bytes = self.get_var("ansible_memtotal_mb") * MIB recommended_min = max(self.recommended_memory_bytes.get(name, 0) for name in group_names) - configured_min = float(get_var(task_vars, "openshift_check_min_host_memory_gb", default=0)) * GIB + configured_min = float(self.get_var("openshift_check_min_host_memory_gb", default=0)) * GIB min_memory_bytes = configured_min or recommended_min if total_memory_bytes + self.memtotal_adjustment < min_memory_bytes: diff --git a/roles/openshift_health_checker/openshift_checks/mixins.py b/roles/openshift_health_checker/openshift_checks/mixins.py 
index 2cb2e21aada..3b2c64e6a09 100644 --- a/roles/openshift_health_checker/openshift_checks/mixins.py +++ b/roles/openshift_health_checker/openshift_checks/mixins.py @@ -2,19 +2,16 @@ Mixin classes meant to be used with subclasses of OpenShiftCheck. """ -from openshift_checks import get_var - class NotContainerizedMixin(object): """Mixin for checks that are only active when not in containerized mode.""" # permanent # pylint: disable=too-few-public-methods # Reason: The mixin is not intended to stand on its own as a class. - @classmethod - def is_active(cls, task_vars): + def is_active(self): """Only run on non-containerized hosts.""" - is_containerized = get_var(task_vars, "openshift", "common", "is_containerized") - return super(NotContainerizedMixin, cls).is_active(task_vars) and not is_containerized + is_containerized = self.get_var("openshift", "common", "is_containerized") + return super(NotContainerizedMixin, self).is_active() and not is_containerized class DockerHostMixin(object): @@ -22,28 +19,26 @@ class DockerHostMixin(object): dependencies = [] - @classmethod - def is_active(cls, task_vars): + def is_active(self): """Only run on hosts that depend on Docker.""" - is_containerized = get_var(task_vars, "openshift", "common", "is_containerized") - is_node = "nodes" in get_var(task_vars, "group_names", default=[]) - return super(DockerHostMixin, cls).is_active(task_vars) and (is_containerized or is_node) + is_containerized = self.get_var("openshift", "common", "is_containerized") + is_node = "nodes" in self.get_var("group_names", default=[]) + return super(DockerHostMixin, self).is_active() and (is_containerized or is_node) - def ensure_dependencies(self, task_vars): + def ensure_dependencies(self): """ Ensure that docker-related packages exist, but not on atomic hosts (which would not be able to install but should already have them). Returns: msg, failed, changed """ - if get_var(task_vars, "openshift", "common", "is_atomic"): + if self.get_var("openshift", "common", "is_atomic"): return "", False, False # NOTE: we would use the "package" module but it's actually an action plugin # and it's not clear how to invoke one of those. This is about the same anyway: result = self.execute_module( - get_var(task_vars, "ansible_pkg_mgr", default="yum"), + self.get_var("ansible_pkg_mgr", default="yum"), {"name": self.dependencies, "state": "present"}, - task_vars=task_vars, ) msg = result.get("msg", "") if result.get("failed"): diff --git a/roles/openshift_health_checker/openshift_checks/ovs_version.py b/roles/openshift_health_checker/openshift_checks/ovs_version.py index 2dd045f1f78..d5e55bc25bf 100644 --- a/roles/openshift_health_checker/openshift_checks/ovs_version.py +++ b/roles/openshift_health_checker/openshift_checks/ovs_version.py @@ -3,7 +3,7 @@ currently installed version of OpenShift. 
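A note on the version handling reworked in this file and in package_version.py further below: comparing release strings as floats breaks down as soon as a minor version reaches two digits, which is why these checks move to (major, minor) integer tuples. A quick illustration of the failure mode; `parse_version` here is just an illustrative helper, not the checks' actual API:

```
import re

def parse_version(tag):
    """Illustrative helper: reduce an image tag like "v3.10.0" to (3, 10)."""
    comps = [int(c) for c in re.findall(r'\d+', tag)]
    if len(comps) < 2:
        raise ValueError("invalid version: {}".format(tag))
    return tuple(comps[:2])

# Float comparison misorders two-digit minors: "3.10" parses as 3.1.
print(float("3.10") < 3.5)                 # True  -- wrong, 3.10 is newer than 3.5
print(parse_version("v3.10.0") < (3, 5))   # False -- tuple comparison gets it right
```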
""" -from openshift_checks import OpenShiftCheck, OpenShiftCheckException, get_var +from openshift_checks import OpenShiftCheck, OpenShiftCheckException from openshift_checks.mixins import NotContainerizedMixin @@ -21,58 +21,34 @@ class OvsVersion(NotContainerizedMixin, OpenShiftCheck): "3.4": "2.4", } - # map major release versions across releases - # to a common major version - openshift_major_release_version = { - "1": "3", - } - - @classmethod - def is_active(cls, task_vars): + def is_active(self): """Skip hosts that do not have package requirements.""" - group_names = get_var(task_vars, "group_names", default=[]) + group_names = self.get_var("group_names", default=[]) master_or_node = 'masters' in group_names or 'nodes' in group_names - return super(OvsVersion, cls).is_active(task_vars) and master_or_node + return super(OvsVersion, self).is_active() and master_or_node - def run(self, tmp, task_vars): + def run(self): args = { "package_list": [ { "name": "openvswitch", - "version": self.get_required_ovs_version(task_vars), + "version": self.get_required_ovs_version(), }, ], } - return self.execute_module("rpm_version", args, task_vars=task_vars) + return self.execute_module("rpm_version", args) - def get_required_ovs_version(self, task_vars): + def get_required_ovs_version(self): """Return the correct Open vSwitch version for the current OpenShift version""" - openshift_version = self._get_openshift_version(task_vars) + openshift_version_tuple = self.get_major_minor_version(self.get_var("openshift_image_tag")) - if float(openshift_version) < 3.5: + if openshift_version_tuple < (3, 5): return self.openshift_to_ovs_version["3.4"] - ovs_version = self.openshift_to_ovs_version.get(str(openshift_version)) + openshift_version = ".".join(str(x) for x in openshift_version_tuple) + ovs_version = self.openshift_to_ovs_version.get(openshift_version) if ovs_version: - return self.openshift_to_ovs_version[str(openshift_version)] + return self.openshift_to_ovs_version[openshift_version] msg = "There is no recommended version of Open vSwitch for the current version of OpenShift: {}" raise OpenShiftCheckException(msg.format(openshift_version)) - - def _get_openshift_version(self, task_vars): - openshift_version = get_var(task_vars, "openshift_image_tag") - if openshift_version and openshift_version[0] == 'v': - openshift_version = openshift_version[1:] - - return self._parse_version(openshift_version) - - def _parse_version(self, version): - components = version.split(".") - if not components or len(components) < 2: - msg = "An invalid version of OpenShift was found for this host: {}" - raise OpenShiftCheckException(msg.format(version)) - - if components[0] in self.openshift_major_release_version: - components[0] = self.openshift_major_release_version[components[0]] - - return '.'.join(components[:2]) diff --git a/roles/openshift_health_checker/openshift_checks/package_availability.py b/roles/openshift_health_checker/openshift_checks/package_availability.py index 0dd2b12865f..a86180b002f 100644 --- a/roles/openshift_health_checker/openshift_checks/package_availability.py +++ b/roles/openshift_health_checker/openshift_checks/package_availability.py @@ -1,5 +1,6 @@ -# pylint: disable=missing-docstring -from openshift_checks import OpenShiftCheck, get_var +"""Check that required RPM packages are available.""" + +from openshift_checks import OpenShiftCheck from openshift_checks.mixins import NotContainerizedMixin @@ -9,13 +10,13 @@ class PackageAvailability(NotContainerizedMixin, OpenShiftCheck): name = 
"package_availability" tags = ["preflight"] - @classmethod - def is_active(cls, task_vars): - return super(PackageAvailability, cls).is_active(task_vars) and task_vars["ansible_pkg_mgr"] == "yum" + def is_active(self): + """Run only when yum is the package manager as the code is specific to it.""" + return super(PackageAvailability, self).is_active() and self.get_var("ansible_pkg_mgr") == "yum" - def run(self, tmp, task_vars): - rpm_prefix = get_var(task_vars, "openshift", "common", "service_type") - group_names = get_var(task_vars, "group_names", default=[]) + def run(self): + rpm_prefix = self.get_var("openshift", "common", "service_type") + group_names = self.get_var("group_names", default=[]) packages = set() @@ -25,10 +26,11 @@ def run(self, tmp, task_vars): packages.update(self.node_packages(rpm_prefix)) args = {"packages": sorted(set(packages))} - return self.execute_module("check_yum_update", args, tmp=tmp, task_vars=task_vars) + return self.execute_module("check_yum_update", args) @staticmethod def master_packages(rpm_prefix): + """Return a list of RPMs that we expect a master install to have available.""" return [ "{rpm_prefix}".format(rpm_prefix=rpm_prefix), "{rpm_prefix}-clients".format(rpm_prefix=rpm_prefix), @@ -44,6 +46,7 @@ def master_packages(rpm_prefix): @staticmethod def node_packages(rpm_prefix): + """Return a list of RPMs that we expect a node install to have available.""" return [ "{rpm_prefix}".format(rpm_prefix=rpm_prefix), "{rpm_prefix}-node".format(rpm_prefix=rpm_prefix), diff --git a/roles/openshift_health_checker/openshift_checks/package_update.py b/roles/openshift_health_checker/openshift_checks/package_update.py index f432380c610..1e9aecbe03d 100644 --- a/roles/openshift_health_checker/openshift_checks/package_update.py +++ b/roles/openshift_health_checker/openshift_checks/package_update.py @@ -1,14 +1,14 @@ -# pylint: disable=missing-docstring +"""Check that a yum update would not run into conflicts with available packages.""" from openshift_checks import OpenShiftCheck from openshift_checks.mixins import NotContainerizedMixin class PackageUpdate(NotContainerizedMixin, OpenShiftCheck): - """Check that there are no conflicts in RPM packages.""" + """Check that a yum update would not run into conflicts with available packages.""" name = "package_update" tags = ["preflight"] - def run(self, tmp, task_vars): + def run(self): args = {"packages": []} - return self.execute_module("check_yum_update", args, tmp=tmp, task_vars=task_vars) + return self.execute_module("check_yum_update", args) diff --git a/roles/openshift_health_checker/openshift_checks/package_version.py b/roles/openshift_health_checker/openshift_checks/package_version.py index 204752bd008..8b780114fc7 100644 --- a/roles/openshift_health_checker/openshift_checks/package_version.py +++ b/roles/openshift_health_checker/openshift_checks/package_version.py @@ -1,5 +1,8 @@ -# pylint: disable=missing-docstring -from openshift_checks import OpenShiftCheck, OpenShiftCheckException, get_var +"""Check that available RPM packages match the required versions.""" + +import re + +from openshift_checks import OpenShiftCheck, OpenShiftCheckException from openshift_checks.mixins import NotContainerizedMixin @@ -9,48 +12,49 @@ class PackageVersion(NotContainerizedMixin, OpenShiftCheck): name = "package_version" tags = ["preflight"] + # NOTE: versions outside those specified are mapped to least/greatest openshift_to_ovs_version = { - "3.6": ["2.6", "2.7"], - "3.5": ["2.6", "2.7"], - "3.4": "2.4", + (3, 4): "2.4", + (3, 
5): ["2.6", "2.7"], + (3, 6): ["2.6", "2.7"], } openshift_to_docker_version = { - "3.1": "1.8", - "3.2": "1.10", - "3.3": "1.10", - "3.4": "1.12", + (3, 1): "1.8", + (3, 2): "1.10", + (3, 3): "1.10", + (3, 4): "1.12", + (3, 5): "1.12", + (3, 6): "1.12", } - # map major release versions across releases - # to a common major version - openshift_major_release_version = { - "1": "3", + # map major OpenShift release versions across releases to a common major version + map_major_release_version = { + 1: 3, } - @classmethod - def is_active(cls, task_vars): + def is_active(self): """Skip hosts that do not have package requirements.""" - group_names = get_var(task_vars, "group_names", default=[]) + group_names = self.get_var("group_names", default=[]) master_or_node = 'masters' in group_names or 'nodes' in group_names - return super(PackageVersion, cls).is_active(task_vars) and master_or_node + return super(PackageVersion, self).is_active() and master_or_node - def run(self, tmp, task_vars): - rpm_prefix = get_var(task_vars, "openshift", "common", "service_type") - openshift_release = get_var(task_vars, "openshift_release", default='') - deployment_type = get_var(task_vars, "openshift_deployment_type") + def run(self): + rpm_prefix = self.get_var("openshift", "common", "service_type") + openshift_release = self.get_var("openshift_release", default='') + deployment_type = self.get_var("openshift_deployment_type") check_multi_minor_release = deployment_type in ['openshift-enterprise'] args = { "package_list": [ { "name": "openvswitch", - "version": self.get_required_ovs_version(task_vars), + "version": self.get_required_ovs_version(), "check_multi": False, }, { "name": "docker", - "version": self.get_required_docker_version(task_vars), + "version": self.get_required_docker_version(), "check_multi": False, }, { @@ -71,55 +75,52 @@ def run(self, tmp, task_vars): ], } - return self.execute_module("aos_version", args, tmp=tmp, task_vars=task_vars) + return self.execute_module("aos_version", args) - def get_required_ovs_version(self, task_vars): - """Return the correct Open vSwitch version for the current OpenShift version. 
- If the current OpenShift version is >= 3.5, ensure Open vSwitch version 2.6, - Else ensure Open vSwitch version 2.4""" - openshift_version = self.get_openshift_version(task_vars) + def get_required_ovs_version(self): + """Return the correct Open vSwitch version(s) for the current OpenShift version.""" + openshift_version = self.get_openshift_version_tuple() - if float(openshift_version) < 3.5: - return self.openshift_to_ovs_version["3.4"] + earliest = min(self.openshift_to_ovs_version) + latest = max(self.openshift_to_ovs_version) + if openshift_version < earliest: + return self.openshift_to_ovs_version[earliest] + if openshift_version > latest: + return self.openshift_to_ovs_version[latest] - ovs_version = self.openshift_to_ovs_version.get(str(openshift_version)) - if ovs_version: - return ovs_version + ovs_version = self.openshift_to_ovs_version.get(openshift_version) + if not ovs_version: + msg = "There is no recommended version of Open vSwitch for the current version of OpenShift: {}" + raise OpenShiftCheckException(msg.format(".".join(str(comp) for comp in openshift_version))) - msg = "There is no recommended version of Open vSwitch for the current version of OpenShift: {}" - raise OpenShiftCheckException(msg.format(openshift_version)) + return ovs_version - def get_required_docker_version(self, task_vars): - """Return the correct Docker version for the current OpenShift version. - If the OpenShift version is 3.1, ensure Docker version 1.8. - If the OpenShift version is 3.2 or 3.3, ensure Docker version 1.10. - If the current OpenShift version is >= 3.4, ensure Docker version 1.12.""" - openshift_version = self.get_openshift_version(task_vars) + def get_required_docker_version(self): + """Return the correct Docker version(s) for the current OpenShift version.""" + openshift_version = self.get_openshift_version_tuple() - if float(openshift_version) >= 3.4: - return self.openshift_to_docker_version["3.4"] + earliest = min(self.openshift_to_docker_version) + latest = max(self.openshift_to_docker_version) + if openshift_version < earliest: + return self.openshift_to_docker_version[earliest] + if openshift_version > latest: + return self.openshift_to_docker_version[latest] - docker_version = self.openshift_to_docker_version.get(str(openshift_version)) - if docker_version: - return docker_version + docker_version = self.openshift_to_docker_version.get(openshift_version) + if not docker_version: + msg = "There is no recommended version of Docker for the current version of OpenShift: {}" + raise OpenShiftCheckException(msg.format(".".join(str(comp) for comp in openshift_version))) - msg = "There is no recommended version of Docker for the current version of OpenShift: {}" - raise OpenShiftCheckException(msg.format(openshift_version)) + return docker_version - def get_openshift_version(self, task_vars): - openshift_version = get_var(task_vars, "openshift_image_tag") - if openshift_version and openshift_version[0] == 'v': - openshift_version = openshift_version[1:] + def get_openshift_version_tuple(self): + """Return received image tag as a normalized (X, Y) minor version tuple.""" + version = self.get_var("openshift_image_tag") + comps = [int(component) for component in re.findall(r'\d+', version)] - return self.parse_version(openshift_version) - - def parse_version(self, version): - components = version.split(".") - if not components or len(components) < 2: + if len(comps) < 2: msg = "An invalid version of OpenShift was found for this host: {}" raise 
OpenShiftCheckException(msg.format(version)) - if components[0] in self.openshift_major_release_version: - components[0] = self.openshift_major_release_version[components[0]] - - return '.'.join(components[:2]) + comps[0] = self.map_major_release_version.get(comps[0], comps[0]) + return tuple(comps[0:2]) diff --git a/roles/openshift_health_checker/test/action_plugin_test.py b/roles/openshift_health_checker/test/action_plugin_test.py index 9383b233c7e..2d068be3dfc 100644 --- a/roles/openshift_health_checker/test/action_plugin_test.py +++ b/roles/openshift_health_checker/test/action_plugin_test.py @@ -15,14 +15,13 @@ class FakeCheck(object): name = _name tags = _tags or [] - def __init__(self, execute_module=None): + def __init__(self, execute_module=None, task_vars=None, tmp=None): pass - @classmethod - def is_active(cls, task_vars): + def is_active(self): return is_active - def run(self, tmp, task_vars): + def run(self): if run_exception is not None: raise run_exception return run_return @@ -124,7 +123,7 @@ def test_action_plugin_skip_disabled_checks(plugin, task_vars, monkeypatch): def test_action_plugin_run_check_ok(plugin, task_vars, monkeypatch): check_return_value = {'ok': 'test'} check_class = fake_check(run_return=check_return_value) - monkeypatch.setattr(plugin, 'load_known_checks', lambda: {'fake_check': check_class()}) + monkeypatch.setattr(plugin, 'load_known_checks', lambda tmp, task_vars: {'fake_check': check_class()}) monkeypatch.setattr('openshift_health_check.resolve_checks', lambda *args: ['fake_check']) result = plugin.run(tmp=None, task_vars=task_vars) @@ -138,7 +137,7 @@ def test_action_plugin_run_check_ok(plugin, task_vars, monkeypatch): def test_action_plugin_run_check_changed(plugin, task_vars, monkeypatch): check_return_value = {'ok': 'test', 'changed': True} check_class = fake_check(run_return=check_return_value) - monkeypatch.setattr(plugin, 'load_known_checks', lambda: {'fake_check': check_class()}) + monkeypatch.setattr(plugin, 'load_known_checks', lambda tmp, task_vars: {'fake_check': check_class()}) monkeypatch.setattr('openshift_health_check.resolve_checks', lambda *args: ['fake_check']) result = plugin.run(tmp=None, task_vars=task_vars) @@ -152,7 +151,7 @@ def test_action_plugin_run_check_changed(plugin, task_vars, monkeypatch): def test_action_plugin_run_check_fail(plugin, task_vars, monkeypatch): check_return_value = {'failed': True} check_class = fake_check(run_return=check_return_value) - monkeypatch.setattr(plugin, 'load_known_checks', lambda: {'fake_check': check_class()}) + monkeypatch.setattr(plugin, 'load_known_checks', lambda tmp, task_vars: {'fake_check': check_class()}) monkeypatch.setattr('openshift_health_check.resolve_checks', lambda *args: ['fake_check']) result = plugin.run(tmp=None, task_vars=task_vars) @@ -167,7 +166,7 @@ def test_action_plugin_run_check_exception(plugin, task_vars, monkeypatch): exception_msg = 'fake check has an exception' run_exception = OpenShiftCheckException(exception_msg) check_class = fake_check(run_exception=run_exception) - monkeypatch.setattr(plugin, 'load_known_checks', lambda: {'fake_check': check_class()}) + monkeypatch.setattr(plugin, 'load_known_checks', lambda tmp, task_vars: {'fake_check': check_class()}) monkeypatch.setattr('openshift_health_check.resolve_checks', lambda *args: ['fake_check']) result = plugin.run(tmp=None, task_vars=task_vars) @@ -179,7 +178,7 @@ def test_action_plugin_run_check_exception(plugin, task_vars, monkeypatch): def test_action_plugin_resolve_checks_exception(plugin, task_vars, 
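Taken out of context, the new tuple-based lookup is easy to state on its own. A condensed sketch of get_openshift_version_tuple() plus the least/greatest clamping, using the OVS map from the diff (the real methods also raise OpenShiftCheckException for unmapped in-range versions, which this sketch elides):

import re

openshift_to_ovs_version = {
    (3, 4): "2.4",
    (3, 5): ["2.6", "2.7"],
    (3, 6): ["2.6", "2.7"],
}

def version_tuple(image_tag):
    # "v3.6.173" -> (3, 6); "1.5.1" -> (3, 5) after the major-version mapping
    comps = [int(c) for c in re.findall(r'\d+', image_tag)]
    comps[0] = {1: 3}.get(comps[0], comps[0])  # map origin 1.x onto 3.x
    return tuple(comps[:2])

def required_ovs(image_tag):
    version = version_tuple(image_tag)
    # versions outside the mapped range clamp to the least/greatest entry
    earliest, latest = min(openshift_to_ovs_version), max(openshift_to_ovs_version)
    return openshift_to_ovs_version[max(earliest, min(version, latest))]

assert required_ovs("v3.4.0") == "2.4"
assert required_ovs("v3.7.1") == ["2.6", "2.7"]  # above the map: clamped to (3, 6)
assert required_ovs("1.5.1") == ["2.6", "2.7"]   # origin 1.5 is treated as 3.5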
diff --git a/roles/openshift_health_checker/test/action_plugin_test.py b/roles/openshift_health_checker/test/action_plugin_test.py
index 9383b233c7e..2d068be3dfc 100644
--- a/roles/openshift_health_checker/test/action_plugin_test.py
+++ b/roles/openshift_health_checker/test/action_plugin_test.py
@@ -15,14 +15,13 @@ class FakeCheck(object):
         name = _name
         tags = _tags or []
 
-        def __init__(self, execute_module=None):
+        def __init__(self, execute_module=None, task_vars=None, tmp=None):
             pass
 
-        @classmethod
-        def is_active(cls, task_vars):
+        def is_active(self):
             return is_active
 
-        def run(self, tmp, task_vars):
+        def run(self):
             if run_exception is not None:
                 raise run_exception
             return run_return
@@ -124,7 +123,7 @@ def test_action_plugin_skip_disabled_checks(plugin, task_vars, monkeypatch):
 def test_action_plugin_run_check_ok(plugin, task_vars, monkeypatch):
     check_return_value = {'ok': 'test'}
     check_class = fake_check(run_return=check_return_value)
-    monkeypatch.setattr(plugin, 'load_known_checks', lambda: {'fake_check': check_class()})
+    monkeypatch.setattr(plugin, 'load_known_checks', lambda tmp, task_vars: {'fake_check': check_class()})
     monkeypatch.setattr('openshift_health_check.resolve_checks', lambda *args: ['fake_check'])
 
     result = plugin.run(tmp=None, task_vars=task_vars)
@@ -138,7 +137,7 @@ def test_action_plugin_run_check_changed(plugin, task_vars, monkeypatch):
     check_return_value = {'ok': 'test', 'changed': True}
     check_class = fake_check(run_return=check_return_value)
-    monkeypatch.setattr(plugin, 'load_known_checks', lambda: {'fake_check': check_class()})
+    monkeypatch.setattr(plugin, 'load_known_checks', lambda tmp, task_vars: {'fake_check': check_class()})
     monkeypatch.setattr('openshift_health_check.resolve_checks', lambda *args: ['fake_check'])
 
     result = plugin.run(tmp=None, task_vars=task_vars)
@@ -152,7 +151,7 @@ def test_action_plugin_run_check_fail(plugin, task_vars, monkeypatch):
     check_return_value = {'failed': True}
     check_class = fake_check(run_return=check_return_value)
-    monkeypatch.setattr(plugin, 'load_known_checks', lambda: {'fake_check': check_class()})
+    monkeypatch.setattr(plugin, 'load_known_checks', lambda tmp, task_vars: {'fake_check': check_class()})
     monkeypatch.setattr('openshift_health_check.resolve_checks', lambda *args: ['fake_check'])
 
     result = plugin.run(tmp=None, task_vars=task_vars)
@@ -167,7 +166,7 @@ def test_action_plugin_run_check_exception(plugin, task_vars, monkeypatch):
     exception_msg = 'fake check has an exception'
     run_exception = OpenShiftCheckException(exception_msg)
     check_class = fake_check(run_exception=run_exception)
-    monkeypatch.setattr(plugin, 'load_known_checks', lambda: {'fake_check': check_class()})
+    monkeypatch.setattr(plugin, 'load_known_checks', lambda tmp, task_vars: {'fake_check': check_class()})
     monkeypatch.setattr('openshift_health_check.resolve_checks', lambda *args: ['fake_check'])
 
     result = plugin.run(tmp=None, task_vars=task_vars)
@@ -179,7 +178,7 @@ def test_action_plugin_resolve_checks_exception(plugin, task_vars, monkeypatch):
-    monkeypatch.setattr(plugin, 'load_known_checks', lambda: {})
+    monkeypatch.setattr(plugin, 'load_known_checks', lambda tmp, task_vars: {})
 
     result = plugin.run(tmp=None, task_vars=task_vars)
diff --git a/roles/openshift_health_checker/test/disk_availability_test.py b/roles/openshift_health_checker/test/disk_availability_test.py
index 945b9eafc8a..e98d02c583b 100644
--- a/roles/openshift_health_checker/test/disk_availability_test.py
+++ b/roles/openshift_health_checker/test/disk_availability_test.py
@@ -17,7 +17,7 @@ def test_is_active(group_names, is_active):
     task_vars = dict(
         group_names=group_names,
     )
-    assert DiskAvailability.is_active(task_vars=task_vars) == is_active
+    assert DiskAvailability(None, task_vars).is_active() == is_active
 
 
 @pytest.mark.parametrize('ansible_mounts,extra_words', [
@@ -30,10 +30,9 @@ def test_cannot_determine_available_disk(ansible_mounts, extra_words):
         group_names=['masters'],
         ansible_mounts=ansible_mounts,
     )
-    check = DiskAvailability(execute_module=fake_execute_module)
 
     with pytest.raises(OpenShiftCheckException) as excinfo:
-        check.run(tmp=None, task_vars=task_vars)
+        DiskAvailability(fake_execute_module, task_vars).run()
 
     for word in 'determine disk availability'.split() + extra_words:
         assert word in str(excinfo.value)
@@ -93,8 +92,7 @@ def test_succeeds_with_recommended_disk_space(group_names, configured_min, ansible_mounts):
         ansible_mounts=ansible_mounts,
     )
 
-    check = DiskAvailability(execute_module=fake_execute_module)
-    result = check.run(tmp=None, task_vars=task_vars)
+    result = DiskAvailability(fake_execute_module, task_vars).run()
 
     assert not result.get('failed', False)
@@ -168,8 +166,7 @@ def test_fails_with_insufficient_disk_space(group_names, configured_min, ansible_mounts, extra_words):
         ansible_mounts=ansible_mounts,
     )
 
-    check = DiskAvailability(execute_module=fake_execute_module)
-    result = check.run(tmp=None, task_vars=task_vars)
+    result = DiskAvailability(fake_execute_module, task_vars).run()
 
     assert result['failed']
 
     for word in 'below recommended'.split() + extra_words:
diff --git a/roles/openshift_health_checker/test/docker_image_availability_test.py b/roles/openshift_health_checker/test/docker_image_availability_test.py
index 3b9e097fb65..8d0a53df971 100644
--- a/roles/openshift_health_checker/test/docker_image_availability_test.py
+++ b/roles/openshift_health_checker/test/docker_image_availability_test.py
@@ -21,7 +21,7 @@ def test_is_active(deployment_type, is_containerized, group_names, expect_active):
         openshift_deployment_type=deployment_type,
         group_names=group_names,
     )
-    assert DockerImageAvailability.is_active(task_vars=task_vars) == expect_active
+    assert DockerImageAvailability(None, task_vars).is_active() == expect_active
 
 
 @pytest.mark.parametrize("is_containerized,is_atomic", [
@@ -31,7 +31,7 @@ def test_all_images_available_locally(is_containerized, is_atomic):
     (False, True),
 ])
 def test_all_images_available_locally(is_containerized, is_atomic):
-    def execute_module(module_name, module_args, task_vars):
+    def execute_module(module_name, module_args, *_):
         if module_name == "yum":
             return {"changed": True}
 
@@ -42,7 +42,7 @@ def execute_module(module_name, module_args, task_vars):
             'images': [module_args['name']],
         }
 
-    result = DockerImageAvailability(execute_module=execute_module).run(tmp=None, task_vars=dict(
+    result = DockerImageAvailability(execute_module, task_vars=dict(
         openshift=dict(
             common=dict(
                 service_type='origin',
@@ -54,7 +54,7 @@ def execute_module(module_name, module_args, task_vars):
         openshift_deployment_type='origin',
        openshift_image_tag='3.4',
         group_names=['nodes', 'masters'],
-    ))
+    )).run()
 
     assert not result.get('failed', False)
 
@@ -64,12 +64,12 @@ def execute_module(module_name, module_args, task_vars):
     True,
 ])
 def test_all_images_available_remotely(available_locally):
-    def execute_module(module_name, module_args, task_vars):
+    def execute_module(module_name, *_):
         if module_name == 'docker_image_facts':
             return {'images': [], 'failed': available_locally}
         return {'changed': False}
 
-    result = DockerImageAvailability(execute_module=execute_module).run(tmp=None, task_vars=dict(
+    result = DockerImageAvailability(execute_module, task_vars=dict(
         openshift=dict(
             common=dict(
                 service_type='origin',
@@ -81,13 +81,13 @@ def execute_module(module_name, module_args, task_vars):
         openshift_deployment_type='origin',
         openshift_image_tag='v3.4',
         group_names=['nodes', 'masters'],
-    ))
+    )).run()
 
     assert not result.get('failed', False)
 
 
 def test_all_images_unavailable():
-    def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
+    def execute_module(module_name=None, *_):
         if module_name == "command":
             return {
                 'failed': True,
@@ -97,8 +97,7 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
             'changed': False,
         }
 
-    check = DockerImageAvailability(execute_module=execute_module)
-    actual = check.run(tmp=None, task_vars=dict(
+    actual = DockerImageAvailability(execute_module, task_vars=dict(
         openshift=dict(
             common=dict(
                 service_type='origin',
@@ -110,7 +109,7 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
         openshift_deployment_type="openshift-enterprise",
         openshift_image_tag='latest',
         group_names=['nodes', 'masters'],
-    ))
+    )).run()
 
     assert actual['failed']
     assert "required Docker images are not available" in actual['msg']
@@ -127,7 +126,7 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
     ),
 ])
 def test_skopeo_update_failure(message, extra_words):
-    def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
+    def execute_module(module_name=None, *_):
         if module_name == "yum":
             return {
                 "failed": True,
@@ -137,7 +136,7 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
 
         return {'changed': False}
 
-    actual = DockerImageAvailability(execute_module=execute_module).run(tmp=None, task_vars=dict(
+    actual = DockerImageAvailability(execute_module, task_vars=dict(
         openshift=dict(
             common=dict(
                 service_type='origin',
@@ -149,7 +148,7 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
         openshift_deployment_type="openshift-enterprise",
         openshift_image_tag='',
         group_names=['nodes', 'masters'],
-    ))
+    )).run()
 
     assert actual["failed"]
     for word in extra_words:
@@ -162,12 +161,12 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
     ("openshift-enterprise", []),
 ])
 def test_registry_availability(deployment_type, registries):
-    def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
+    def execute_module(module_name=None, *_):
         return {
             'changed': False,
         }
 
-    actual = DockerImageAvailability(execute_module=execute_module).run(tmp=None, task_vars=dict(
+    actual = DockerImageAvailability(execute_module, task_vars=dict(
         openshift=dict(
             common=dict(
                 service_type='origin',
@@ -179,7 +178,7 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
         openshift_deployment_type=deployment_type,
         openshift_image_tag='',
         group_names=['nodes', 'masters'],
-    ))
+    )).run()
 
     assert not actual.get("failed", False)
@@ -258,7 +257,7 @@ def test_required_images(deployment_type, is_containerized, groups, oreg_url, expected):
         openshift_image_tag='vtest',
     )
 
-    assert expected == DockerImageAvailability("DUMMY").required_images(task_vars)
+    assert expected == DockerImageAvailability("DUMMY", task_vars).required_images()
 
 
 def test_containerized_etcd():
@@ -272,4 +271,4 @@ def test_containerized_etcd():
         group_names=['etcd'],
     )
     expected = set(['registry.access.redhat.com/rhel7/etcd'])
-    assert expected == DockerImageAvailability("DUMMY").required_images(task_vars)
+    assert expected == DockerImageAvailability("DUMMY", task_vars).required_images()
diff --git a/roles/openshift_health_checker/test/docker_storage_test.py b/roles/openshift_health_checker/test/docker_storage_test.py
index 99c52905495..e0dccc062a6 100644
--- a/roles/openshift_health_checker/test/docker_storage_test.py
+++ b/roles/openshift_health_checker/test/docker_storage_test.py
@@ -4,12 +4,6 @@
 from openshift_checks.docker_storage import DockerStorage
 
 
-def dummy_check(execute_module=None):
-    def dummy_exec(self, status, task_vars):
-        raise Exception("dummy executor called")
-    return DockerStorage(execute_module=execute_module or dummy_exec)
-
-
 @pytest.mark.parametrize('is_containerized, group_names, is_active', [
     (False, ["masters", "etcd"], False),
     (False, ["masters", "nodes"], True),
@@ -20,7 +14,7 @@ def test_is_active(is_containerized, group_names, is_active):
         openshift=dict(common=dict(is_containerized=is_containerized)),
         group_names=group_names,
     )
-    assert DockerStorage.is_active(task_vars=task_vars) == is_active
+    assert DockerStorage(None, task_vars).is_active() == is_active
 
 
 def non_atomic_task_vars():
@@ -99,17 +93,17 @@ def non_atomic_task_vars():
     ),
 ])
 def test_check_storage_driver(docker_info, failed, expect_msg):
-    def execute_module(module_name, module_args, tmp=None, task_vars=None):
+    def execute_module(module_name, *_):
         if module_name == "yum":
             return {}
         if module_name != "docker_info":
             raise ValueError("not expecting module " + module_name)
         return docker_info
 
-    check = dummy_check(execute_module=execute_module)
-    check.check_dm_usage = lambda status, task_vars: dict()  # stub out for this test
-    check.check_overlay_usage = lambda info, task_vars: dict()  # stub out for this test
-    result = check.run(tmp=None, task_vars=non_atomic_task_vars())
+    check = DockerStorage(execute_module, non_atomic_task_vars())
+    check.check_dm_usage = lambda status: dict()  # stub out for this test
+    check.check_overlay_usage = lambda info: dict()  # stub out for this test
+    result = check.run()
 
     if failed:
         assert result["failed"]
@@ -168,9 +162,9 @@ def execute_module(module_name, module_args, tmp=None, task_vars=None):
     ),
 ])
 def test_dm_usage(task_vars, driver_status, vg_free, success, expect_msg):
-    check = dummy_check()
-    check.get_vg_free = lambda pool, task_vars: vg_free
-    result = check.check_dm_usage(driver_status, task_vars)
+    check = DockerStorage(None, task_vars)
+    check.get_vg_free = lambda pool: vg_free
+    result = check.check_dm_usage(driver_status)
     result_success = not result.get("failed")
 
     assert result_success is success
@@ -210,18 +204,18 @@ def test_dm_usage(task_vars, driver_status, vg_free, success, expect_msg):
     )
 ])
 def test_vg_free(pool, command_returns, raises, returns):
-    def execute_module(module_name, module_args, tmp=None, task_vars=None):
+    def execute_module(module_name, *_):
         if module_name != "command":
             raise ValueError("not expecting module " + module_name)
         return command_returns
 
-    check = dummy_check(execute_module=execute_module)
+    check = DockerStorage(execute_module)
     if raises:
         with pytest.raises(OpenShiftCheckException) as err:
-            check.get_vg_free(pool, {})
+            check.get_vg_free(pool)
         assert raises in str(err.value)
     else:
-        ret = check.get_vg_free(pool, {})
+        ret = check.get_vg_free(pool)
         assert ret == returns
@@ -298,13 +292,13 @@ def test_convert_to_bytes_error(string):
     ),
 ])
 def test_overlay_usage(ansible_mounts, threshold, expect_fail, expect_msg):
-    check = dummy_check()
     task_vars = non_atomic_task_vars()
     task_vars["ansible_mounts"] = ansible_mounts
     if threshold is not None:
         task_vars["max_overlay_usage_percent"] = threshold
+    check = DockerStorage(None, task_vars)
 
     docker_info = dict(DockerRootDir="/var/lib/docker", Driver="overlay")
-    result = check.check_overlay_usage(docker_info, task_vars)
+    result = check.check_overlay_usage(docker_info)
 
     assert expect_fail == bool(result.get("failed"))
     for msg in expect_msg:
diff --git a/roles/openshift_health_checker/test/elasticsearch_test.py b/roles/openshift_health_checker/test/elasticsearch_test.py
index b9d375d8c82..67408609a93 100644
--- a/roles/openshift_health_checker/test/elasticsearch_test.py
+++ b/roles/openshift_health_checker/test/elasticsearch_test.py
@@ -6,14 +6,6 @@
 task_vars_config_base = dict(openshift=dict(common=dict(config_base='/etc/origin')))
 
 
-def canned_elasticsearch(exec_oc=None):
-    """Create an Elasticsearch check object with canned exec_oc method"""
-    check = Elasticsearch("dummy")  # fails if a module is actually invoked
-    if exec_oc:
-        check._exec_oc = exec_oc
-    return check
-
-
 def assert_error(error, expect_error):
     if expect_error:
         assert error
@@ -50,10 +42,10 @@ def assert_error(error, expect_error):
 
 
 def test_check_elasticsearch():
-    assert 'No logging Elasticsearch pods' in canned_elasticsearch().check_elasticsearch([], {})
+    assert 'No logging Elasticsearch pods' in Elasticsearch().check_elasticsearch([])
 
     # canned oc responses to match so all the checks pass
-    def _exec_oc(cmd, args, task_vars):
+    def _exec_oc(ns, cmd, args):
         if '_cat/master' in cmd:
             return 'name logging-es'
         elif '/_nodes' in cmd:
@@ -65,7 +57,9 @@ def _exec_oc(cmd, args, task_vars):
         else:
             raise Exception(cmd)
 
-    assert not canned_elasticsearch(_exec_oc).check_elasticsearch([plain_es_pod], {})
+    check = Elasticsearch(None, {})
+    check.exec_oc = _exec_oc
+    assert not check.check_elasticsearch([plain_es_pod])
 
 
 def pods_by_name(pods):
@@ -88,9 +82,9 @@ def pods_by_name(pods):
 ])
 def test_check_elasticsearch_masters(pods, expect_error):
     test_pods = list(pods)
-    check = canned_elasticsearch(lambda cmd, args, task_vars: test_pods.pop(0)['_test_master_name_str'])
-
-    errors = check._check_elasticsearch_masters(pods_by_name(pods), task_vars_config_base)
+    check = Elasticsearch(None, task_vars_config_base)
+    check.execute_module = lambda cmd, args: {'result': test_pods.pop(0)['_test_master_name_str']}
+    errors = check._check_elasticsearch_masters(pods_by_name(pods))
     assert_error(''.join(errors), expect_error)
@@ -124,9 +118,10 @@ def test_check_elasticsearch_masters(pods, expect_error):
     ),
 ])
 def test_check_elasticsearch_node_list(pods, node_list, expect_error):
-    check = canned_elasticsearch(lambda cmd, args, task_vars: json.dumps(node_list))
+    check = Elasticsearch(None, task_vars_config_base)
+    check.execute_module = lambda cmd, args: {'result': json.dumps(node_list)}
 
-    errors = check._check_elasticsearch_node_list(pods_by_name(pods), task_vars_config_base)
+    errors = check._check_elasticsearch_node_list(pods_by_name(pods))
     assert_error(''.join(errors), expect_error)
@@ -149,9 +144,10 @@ def test_check_elasticsearch_node_list(pods, node_list, expect_error):
 ])
 def test_check_elasticsearch_cluster_health(pods, health_data, expect_error):
     test_health_data = list(health_data)
-    check = canned_elasticsearch(lambda cmd, args, task_vars: json.dumps(test_health_data.pop(0)))
+    check = Elasticsearch(None, task_vars_config_base)
+    check.execute_module = lambda cmd, args: {'result': json.dumps(test_health_data.pop(0))}
 
-    errors = check._check_es_cluster_health(pods_by_name(pods), task_vars_config_base)
+    errors = check._check_es_cluster_health(pods_by_name(pods))
     assert_error(''.join(errors), expect_error)
@@ -174,7 +170,8 @@ def test_check_elasticsearch_cluster_health(pods, health_data, expect_error):
     ),
 ])
 def test_check_elasticsearch_diskspace(disk_data, expect_error):
-    check = canned_elasticsearch(lambda cmd, args, task_vars: disk_data)
+    check = Elasticsearch(None, task_vars_config_base)
+    check.execute_module = lambda cmd, args: {'result': disk_data}
 
-    errors = check._check_elasticsearch_diskspace(pods_by_name([plain_es_pod]), task_vars_config_base)
+    errors = check._check_elasticsearch_diskspace(pods_by_name([plain_es_pod]))
     assert_error(''.join(errors), expect_error)
diff --git a/roles/openshift_health_checker/test/etcd_imagedata_size_test.py b/roles/openshift_health_checker/test/etcd_imagedata_size_test.py
index df9d52d416e..e3d6706fa91 100644
--- a/roles/openshift_health_checker/test/etcd_imagedata_size_test.py
+++ b/roles/openshift_health_checker/test/etcd_imagedata_size_test.py
@@ -51,10 +51,10 @@ def test_cannot_determine_available_mountpath(ansible_mounts, extra_words):
     task_vars = dict(
         ansible_mounts=ansible_mounts,
     )
-    check = EtcdImageDataSize(execute_module=fake_execute_module)
+    check = EtcdImageDataSize(fake_execute_module, task_vars)
 
     with pytest.raises(OpenShiftCheckException) as excinfo:
-        check.run(tmp=None, task_vars=task_vars)
+        check.run()
 
     for word in 'determine valid etcd mountpath'.split() + extra_words:
         assert word in str(excinfo.value)
@@ -111,14 +111,14 @@ def test_cannot_determine_available_mountpath(ansible_mounts, extra_words):
     )
 ])
 def test_check_etcd_key_size_calculates_correct_limit(ansible_mounts, tree, size_limit, should_fail, extra_words):
-    def execute_module(module_name, args, tmp=None, task_vars=None):
+    def execute_module(module_name, module_args, *_):
         if module_name != "etcdkeysize":
             return {
                 "changed": False,
             }
 
         client = fake_etcd_client(tree)
-        s, limit_exceeded = check_etcd_key_size(client, tree["key"], args["size_limit_bytes"])
+        s, limit_exceeded = check_etcd_key_size(client, tree["key"], module_args["size_limit_bytes"])
 
         return {"size_limit_exceeded": limit_exceeded}
 
@@ -133,7 +133,7 @@ def execute_module(module_name, args, tmp=None, task_vars=None):
     if size_limit is None:
         task_vars.pop("etcd_max_image_data_size_bytes")
 
-    check = EtcdImageDataSize(execute_module=execute_module).run(tmp=None, task_vars=task_vars)
+    check = EtcdImageDataSize(execute_module, task_vars).run()
 
     if should_fail:
         assert check["failed"]
@@ -267,14 +267,14 @@ def execute_module(module_name, args, tmp=None, task_vars=None):
     ),
 ])
 def test_etcd_key_size_check_calculates_correct_size(ansible_mounts, tree, root_path, expected_size, extra_words):
-    def execute_module(module_name, args, tmp=None, task_vars=None):
+    def execute_module(module_name, module_args, *_):
         if module_name != "etcdkeysize":
             return {
                 "changed": False,
             }
 
         client = fake_etcd_client(tree)
-        size, limit_exceeded = check_etcd_key_size(client, root_path, args["size_limit_bytes"])
+        size, limit_exceeded = check_etcd_key_size(client, root_path, module_args["size_limit_bytes"])
 
         assert size == expected_size
         return {
@@ -289,12 +289,12 @@ def execute_module(module_name, args, tmp=None, task_vars=None):
         )
     )
 
-    check = EtcdImageDataSize(execute_module=execute_module).run(tmp=None, task_vars=task_vars)
+    check = EtcdImageDataSize(execute_module, task_vars).run()
 
     assert not check.get("failed", False)
 
 
 def test_etcdkeysize_module_failure():
-    def execute_module(module_name, tmp=None, task_vars=None):
+    def execute_module(module_name, *_):
         if module_name != "etcdkeysize":
             return {
                 "changed": False,
@@ -317,7 +317,7 @@ def execute_module(module_name, tmp=None, task_vars=None):
         )
     )
 
-    check = EtcdImageDataSize(execute_module=execute_module).run(tmp=None, task_vars=task_vars)
+    check = EtcdImageDataSize(execute_module, task_vars).run()
 
     assert check["failed"]
     for word in "Failed to retrieve stats":
diff --git a/roles/openshift_health_checker/test/etcd_traffic_test.py b/roles/openshift_health_checker/test/etcd_traffic_test.py
new file mode 100644
index 00000000000..f4316c4234b
--- /dev/null
+++ b/roles/openshift_health_checker/test/etcd_traffic_test.py
@@ -0,0 +1,74 @@
+import pytest
+
+from openshift_checks.etcd_traffic import EtcdTraffic
+
+
+@pytest.mark.parametrize('group_names,version,is_active', [
+    (['masters'], "3.5", False),
+    (['masters'], "3.6", False),
+    (['nodes'], "3.4", False),
+    (['etcd'], "3.4", True),
+    (['etcd'], "3.5", True),
+    (['etcd'], "3.1", False),
+    (['masters', 'nodes'], "3.5", False),
+    (['masters', 'etcd'], "3.5", True),
+    ([], "3.4", False),
+])
+def test_is_active(group_names, version, is_active):
+    task_vars = dict(
+        group_names=group_names,
+        openshift=dict(
+            common=dict(short_version=version),
+        ),
+    )
+    assert EtcdTraffic(task_vars=task_vars).is_active() == is_active
+
+
+@pytest.mark.parametrize('group_names,matched,failed,extra_words', [
+    (["masters"], True, True, ["Higher than normal", "traffic"]),
+    (["masters", "etcd"], False, False, []),
+    (["etcd"], False, False, []),
+])
+def test_log_matches_high_traffic_msg(group_names, matched, failed, extra_words):
+    def execute_module(module_name, *_):
+        return {
+            "matched": matched,
+            "failed": failed,
+        }
+
+    task_vars = dict(
+        group_names=group_names,
+        openshift=dict(
+            common=dict(service_type="origin", is_containerized=False),
+        )
+    )
+
+    result = EtcdTraffic(execute_module, task_vars).run()
+
+    for word in extra_words:
+        assert word in result.get("msg", "")
+
+    assert result.get("failed", False) == failed
+
+
+@pytest.mark.parametrize('is_containerized,expected_unit_value', [
+    (False, "etcd"),
+    (True, "etcd_container"),
+])
+def test_systemd_unit_matches_deployment_type(is_containerized, expected_unit_value):
+    task_vars = dict(
+        openshift=dict(
+            common=dict(is_containerized=is_containerized),
+        )
+    )
+
+    def execute_module(module_name, args, *_):
+        assert module_name == "search_journalctl"
+        matchers = args["log_matchers"]
+
+        for matcher in matchers:
+            assert matcher["unit"] == expected_unit_value
+
+        return {"failed": False}
+
+    EtcdTraffic(execute_module, task_vars).run()
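Read together, these new tests pin down the contract between the EtcdTraffic check and the search_journalctl module. A hedged reconstruction of what run() must roughly do (the journal regexps here are placeholders; only the "log_matchers"/"unit" shape and the message wording asserted above come from the tests):

def run(self):
    is_containerized = self.get_var("openshift", "common", "is_containerized")
    unit = "etcd_container" if is_containerized else "etcd"
    matchers = [{
        "start_regexp": r"Starting Etcd Server",  # assumed boundary pattern
        "regexp": r"etcd: sync duration",         # assumed high-traffic pattern
        "unit": unit,
    }]
    result = self.execute_module("search_journalctl", {"log_matchers": matchers})
    if result.get("matched"):
        result["msg"] = "Higher than normal etcd traffic detected."  # paraphrased
    return result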
diff --git a/roles/openshift_health_checker/test/etcd_volume_test.py b/roles/openshift_health_checker/test/etcd_volume_test.py
index 91704552666..0b255136ee1 100644
--- a/roles/openshift_health_checker/test/etcd_volume_test.py
+++ b/roles/openshift_health_checker/test/etcd_volume_test.py
@@ -11,10 +11,9 @@ def test_cannot_determine_available_disk(ansible_mounts, extra_words):
     task_vars = dict(
         ansible_mounts=ansible_mounts,
     )
-    check = EtcdVolume(execute_module=fake_execute_module)
 
     with pytest.raises(OpenShiftCheckException) as excinfo:
-        check.run(tmp=None, task_vars=task_vars)
+        EtcdVolume(fake_execute_module, task_vars).run()
 
     for word in 'Unable to find etcd storage mount point'.split() + extra_words:
         assert word in str(excinfo.value)
@@ -76,8 +75,7 @@ def test_succeeds_with_recommended_disk_space(size_limit, ansible_mounts):
     if task_vars["etcd_device_usage_threshold_percent"] is None:
         task_vars.pop("etcd_device_usage_threshold_percent")
 
-    check = EtcdVolume(execute_module=fake_execute_module)
-    result = check.run(tmp=None, task_vars=task_vars)
+    result = EtcdVolume(fake_execute_module, task_vars).run()
 
     assert not result.get('failed', False)
@@ -137,8 +135,7 @@ def test_fails_with_insufficient_disk_space(size_limit_percent, ansible_mounts, extra_words):
     if task_vars["etcd_device_usage_threshold_percent"] is None:
         task_vars.pop("etcd_device_usage_threshold_percent")
 
-    check = EtcdVolume(execute_module=fake_execute_module)
-    result = check.run(tmp=None, task_vars=task_vars)
+    result = EtcdVolume(fake_execute_module, task_vars).run()
 
     assert result['failed']
     for word in extra_words:
diff --git a/roles/openshift_health_checker/test/fluentd_config_test.py b/roles/openshift_health_checker/test/fluentd_config_test.py
new file mode 100644
index 00000000000..8a2d8b72bae
--- /dev/null
+++ b/roles/openshift_health_checker/test/fluentd_config_test.py
@@ -0,0 +1,357 @@
+import pytest
+
+from openshift_checks.logging.fluentd_config import FluentdConfig, OpenShiftCheckException
+
+
+def canned_fluentd_pod(containers):
+    return {
+        "metadata": {
+            "labels": {"component": "fluentd", "deploymentconfig": "logging-fluentd"},
+            "name": "logging-fluentd-1",
+        },
+        "spec": {
+            "host": "node1",
+            "nodeName": "node1",
+            "containers": containers,
+        },
+        "status": {
+            "phase": "Running",
+            "containerStatuses": [{"ready": True}],
+            "conditions": [{"status": "True", "type": "Ready"}],
+        }
+    }
+
+
+fluentd_pod = {
+    "metadata": {
+        "labels": {"component": "fluentd", "deploymentconfig": "logging-fluentd"},
+        "name": "logging-fluentd-1",
+    },
+    "spec": {
+        "host": "node1",
+        "nodeName": "node1",
+        "containers": [
+            {
+                "name": "container1",
+                "env": [
+                    {
+                        "name": "USE_JOURNAL",
+                        "value": "true",
+                    }
+                ],
+            }
+        ],
+    },
+    "status": {
+        "phase": "Running",
+        "containerStatuses": [{"ready": True}],
+        "conditions": [{"status": "True", "type": "Ready"}],
+    }
+}
+
+not_running_fluentd_pod = {
+    "metadata": {
+        "labels": {"component": "fluentd", "deploymentconfig": "logging-fluentd"},
+        "name": "logging-fluentd-2",
+    },
+    "status": {
+        "phase": "Unknown",
+        "containerStatuses": [{"ready": True}, {"ready": False}],
+        "conditions": [{"status": "True", "type": "Ready"}],
+    }
+}
+
+
+@pytest.mark.parametrize('name, use_journald, logging_driver, extra_words', [
+    (
+        'test success with use_journald=false, and docker config set to use "json-file"',
+        False,
+        "json-file",
+        [],
+    ),
+], ids=lambda argvals: argvals[0])
+def test_check_logging_config_non_master(name, use_journald, logging_driver, extra_words):
+    def execute_module(module_name, args):
+        if module_name == "docker_info":
+            return {
+                "info": {
+                    "LoggingDriver": logging_driver,
+                }
+            }
+
+        return {}
+
+    task_vars = dict(
+        group_names=["nodes", "etcd"],
+        openshift_logging_fluentd_use_journal=use_journald,
+        openshift=dict(
+            common=dict(config_base=""),
+        ),
+    )
+
+    check = FluentdConfig(execute_module, task_vars)
+    check.execute_module = execute_module
+    error = check.check_logging_config()
+
+    assert error is None
+
+
+@pytest.mark.parametrize('name, use_journald, logging_driver, words', [
+    (
+        'test failure with use_journald=false, but docker config set to use "journald"',
+        False,
+        "journald",
+        ['json log files', 'has been set to use "journald"'],
+    ),
+    (
+        'test failure with use_journald=false, but docker config set to use an "unsupported" driver',
+        False,
+        "unsupported",
+        ["json log files", 'has been set to use "unsupported"'],
+    ),
+    (
+        'test failure with use_journald=true, but docker config set to use "json-file"',
+        True,
+        "json-file",
+        ['logs from "journald"', 'has been set to use "json-file"'],
+    ),
+], ids=lambda argvals: argvals[0])
+def test_check_logging_config_non_master_failed(name, use_journald, logging_driver, words):
+    def execute_module(module_name, args):
+        if module_name == "docker_info":
+            return {
+                "info": {
+                    "LoggingDriver": logging_driver,
+                }
+            }
+
+        return {}
+
+    task_vars = dict(
+        group_names=["nodes", "etcd"],
+        openshift_logging_fluentd_use_journal=use_journald,
+        openshift=dict(
+            common=dict(config_base=""),
+        ),
+    )
+
+    check = FluentdConfig(execute_module, task_vars)
+    check.execute_module = execute_module
+    error = check.check_logging_config()
+
+    assert error is not None
+    for word in words:
+        assert word in error
+
+
+@pytest.mark.parametrize('name, pods, logging_driver, extra_words', [
+    # use_journald returns false (not using journald), but check succeeds
+    # since docker is set to use json-file
+    (
+        'test success with use_journald=false, and docker config set to use default driver "json-file"',
+        [canned_fluentd_pod(
+            [
+                {
+                    "name": "container1",
+                    "env": [{
+                        "name": "USE_JOURNAL",
+                        "value": "false",
+                    }],
+                },
+            ]
+        )],
+        "json-file",
+        [],
+    ),
+    (
+        'test success with USE_JOURNAL env var missing and docker config set to use default driver "json-file"',
+        [canned_fluentd_pod(
+            [
+                {
+                    "name": "container1",
+                    "env": [{
+                        "name": "RANDOM",
+                        "value": "value",
+                    }],
+                },
+            ]
+        )],
+        "json-file",
+        [],
+    ),
+], ids=lambda argvals: argvals[0])
+def test_check_logging_config_master(name, pods, logging_driver, extra_words):
+    def execute_module(module_name, args):
+        if module_name == "docker_info":
+            return {
+                "info": {
+                    "LoggingDriver": logging_driver,
+                }
+            }
+
+        return {}
+
+    task_vars = dict(
+        group_names=["masters"],
+        openshift=dict(
+            common=dict(config_base=""),
+        ),
+    )
+
+    def get_pods(namespace, logging_component):
+        return pods, None
+
+    check = FluentdConfig(execute_module, task_vars)
+    check.execute_module = execute_module
+    check.get_pods_for_component = get_pods
+    error = check.check_logging_config()
+
+    assert error is None
+
+
+@pytest.mark.parametrize('name, pods, logging_driver, words', [
+    (
+        'test failure with use_journald=false, but docker config set to use "journald"',
+        [canned_fluentd_pod(
+            [
+                {
+                    "name": "container1",
+                    "env": [{
+                        "name": "USE_JOURNAL",
+                        "value": "false",
+                    }],
+                },
+            ]
+        )],
+        "journald",
+        ['json log files', 'has been set to use "journald"'],
+    ),
+    (
+        'test failure with use_journald=true, but docker config set to use "json-file"',
+        [fluentd_pod],
+        "json-file",
+        ['logs from "journald"', 'has been set to use "json-file"'],
+    ),
+    (
+        'test failure with use_journald=false, but docker set to use an "unsupported" driver',
+        [canned_fluentd_pod(
+            [
+                {
+                    "name": "container1",
+                    "env": [{
+                        "name": "USE_JOURNAL",
+                        "value": "false",
+                    }],
+                },
+            ]
+        )],
+        "unsupported",
+        ["json log files", 'has been set to use "unsupported"'],
+    ),
+    (
+        'test failure with USE_JOURNAL env var missing and docker config set to use "journald"',
+        [canned_fluentd_pod(
+            [
+                {
+                    "name": "container1",
+                    "env": [{
+                        "name": "RANDOM",
+                        "value": "value",
+                    }],
+                },
+            ]
+        )],
+        "journald",
+        ["configuration is set to", "json log files"],
+    ),
+], ids=lambda argvals: argvals[0])
+def test_check_logging_config_master_failed(name, pods, logging_driver, words):
+    def execute_module(module_name, args):
+        if module_name == "docker_info":
+            return {
+                "info": {
+                    "LoggingDriver": logging_driver,
+                }
+            }
+
+        return {}
+
+    task_vars = dict(
+        group_names=["masters"],
+        openshift=dict(
+            common=dict(config_base=""),
+        ),
+    )
+
+    def get_pods(namespace, logging_component):
+        return pods, None
+
+    check = FluentdConfig(execute_module, task_vars)
+    check.execute_module = execute_module
+    check.get_pods_for_component = get_pods
+    error = check.check_logging_config()
+
+    assert error is not None
+    for word in words:
+        assert word in error
+
+
+@pytest.mark.parametrize('name, pods, response, logging_driver, extra_words', [
+    (
+        'test OpenShiftCheckException with no running containers',
+        [canned_fluentd_pod([])],
+        {
+            "failed": True,
+            "result": "unexpected",
+        },
+        "json-file",
+        ['no running containers'],
+    ),
+    (
+        'test OpenShiftCheckException one container and no env vars set',
+        [canned_fluentd_pod(
+            [
+                {
+                    "name": "container1",
+                    "env": [],
+                },
+            ]
+        )],
+        {
+            "failed": True,
+            "result": "unexpected",
+        },
+        "json-file",
+        ['no environment variables'],
+    ),
+], ids=lambda argvals: argvals[0])
+def test_check_logging_config_master_fails_on_unscheduled_deployment(name, pods, response, logging_driver, extra_words):
+    def execute_module(module_name, args):
+        if module_name == "docker_info":
+            return {
+                "info": {
+                    "LoggingDriver": logging_driver,
+                }
+            }
+
+        return {}
+
+    task_vars = dict(
+        group_names=["masters"],
+        openshift=dict(
+            common=dict(config_base=""),
+        ),
+    )
+
+    def get_pods(namespace, logging_component):
+        return pods, None
+
+    check = FluentdConfig(execute_module, task_vars)
+    check.get_pods_for_component = get_pods
+
+    with pytest.raises(OpenShiftCheckException) as error:
+        check.check_logging_config()
+
+    assert error is not None
+    for word in extra_words:
+        assert word in str(error)
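The compatibility rule these fluentd_config tests encode is small enough to state directly. A sketch of the decision, with the error text paraphrased from the substrings the tests assert on (the real check builds fuller messages):

def logging_driver_error(use_journald, logging_driver):
    """Return an error string when Fluentd's input and Docker's log driver disagree."""
    if use_journald and logging_driver != "journald":
        return ('Fluentd expects logs from "journald", but the Docker log driver '
                'has been set to use "{}".'.format(logging_driver))
    if not use_journald and logging_driver != "json-file":
        return ('Fluentd expects json log files, but the Docker log driver '
                'has been set to use "{}".'.format(logging_driver))
    return None  # configurations agree

assert logging_driver_error(False, "json-file") is None
assert logging_driver_error(True, "json-file") is not None
assert logging_driver_error(False, "unsupported") is not None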
diff --git a/roles/openshift_health_checker/test/fluentd_test.py b/roles/openshift_health_checker/test/fluentd_test.py
index d151c0b19ff..a84d89cef17 100644
--- a/roles/openshift_health_checker/test/fluentd_test.py
+++ b/roles/openshift_health_checker/test/fluentd_test.py
@@ -4,14 +4,6 @@
 from openshift_checks.logging.fluentd import Fluentd
 
 
-def canned_fluentd(exec_oc=None):
-    """Create a Fluentd check object with canned exec_oc method"""
-    check = Fluentd("dummy")  # fails if a module is actually invoked
-    if exec_oc:
-        check._exec_oc = exec_oc
-    return check
-
-
 def assert_error(error, expect_error):
     if expect_error:
         assert error
@@ -103,7 +95,7 @@ def assert_error(error, expect_error):
     ),
 ])
 def test_get_fluentd_pods(pods, nodes, expect_error):
-    check = canned_fluentd(lambda cmd, args, task_vars: json.dumps(dict(items=nodes)))
-
-    error = check.check_fluentd(pods, {})
+    check = Fluentd()
+    check.exec_oc = lambda ns, cmd, args: json.dumps(dict(items=nodes))
+    error = check.check_fluentd(pods)
     assert_error(error, expect_error)
diff --git a/roles/openshift_health_checker/test/kibana_test.py b/roles/openshift_health_checker/test/kibana_test.py
index 40a5d19d870..0bf49251168 100644
--- a/roles/openshift_health_checker/test/kibana_test.py
+++ b/roles/openshift_health_checker/test/kibana_test.py
@@ -11,14 +11,6 @@
 from openshift_checks.logging.kibana import Kibana
 
 
-def canned_kibana(exec_oc=None):
-    """Create a Kibana check object with canned exec_oc method"""
-    check = Kibana("dummy")  # fails if a module is actually invoked
-    if exec_oc:
-        check._exec_oc = exec_oc
-    return check
-
-
 def assert_error(error, expect_error):
     if expect_error:
         assert error
@@ -68,7 +60,7 @@ def assert_error(error, expect_error):
     ),
 ])
 def test_check_kibana(pods, expect_error):
-    check = canned_kibana()
+    check = Kibana()
     error = check.check_kibana(pods)
     assert_error(error, expect_error)
@@ -137,9 +129,10 @@ def test_check_kibana(pods, expect_error):
     ),
 ])
 def test_get_kibana_url(route, expect_url, expect_error):
-    check = canned_kibana(lambda cmd, args, task_vars: json.dumps(route) if route else "")
+    check = Kibana()
+    check.exec_oc = lambda ns, cmd, args: json.dumps(route) if route else ""
 
-    url, error = check._get_kibana_url({})
+    url, error = check._get_kibana_url()
     if expect_url:
         assert url == expect_url
     else:
@@ -169,10 +162,10 @@ def test_get_kibana_url(route, expect_url, expect_error):
     ),
 ])
 def test_verify_url_internal_failure(exec_result, expect):
-    check = Kibana(execute_module=lambda module_name, args, tmp, task_vars: dict(failed=True, msg=exec_result))
-    check._get_kibana_url = lambda task_vars: ('url', None)
+    check = Kibana(execute_module=lambda *_: dict(failed=True, msg=exec_result))
+    check._get_kibana_url = lambda: ('url', None)
 
-    error = check._check_kibana_route({})
+    error = check._check_kibana_route()
     assert_error(error, expect)
@@ -210,9 +203,9 @@ def urlopen(url, context):
         raise lib_result
     monkeypatch.setattr(urllib2, 'urlopen', urlopen)
 
-    check = canned_kibana()
-    check._get_kibana_url = lambda task_vars: ('url', None)
-    check._verify_url_internal = lambda url, task_vars: None
+    check = Kibana()
+    check._get_kibana_url = lambda: ('url', None)
+    check._verify_url_internal = lambda url: None
 
-    error = check._check_kibana_route({})
+    error = check._check_kibana_route()
     assert_error(error, expect)
diff --git a/roles/openshift_health_checker/test/logging_check_test.py b/roles/openshift_health_checker/test/logging_check_test.py
index 128b76b12e2..6f1697ee659 100644
--- a/roles/openshift_health_checker/test/logging_check_test.py
+++ b/roles/openshift_health_checker/test/logging_check_test.py
@@ -11,7 +11,7 @@
 def canned_loggingcheck(exec_oc=None):
     """Create a LoggingCheck object with canned exec_oc method"""
-    check = LoggingCheck("dummy")  # fails if a module is actually invoked
+    check = LoggingCheck()  # fails if a module is actually invoked
     check.logging_namespace = 'logging'
     if exec_oc:
         check.exec_oc = exec_oc
@@ -90,15 +90,15 @@ def assert_error(error, expect_error):
     ("Permission denied", "Unexpected error using `oc`"),
 ])
 def test_oc_failure(problem, expect):
-    def execute_module(module_name, args, tmp, task_vars):
+    def execute_module(module_name, *_):
         if module_name == "ocutil":
             return dict(failed=True, result=problem)
         return dict(changed=False)
 
-    check = LoggingCheck({})
+    check = LoggingCheck(execute_module, task_vars_config_base)
 
     with pytest.raises(OpenShiftCheckException) as excinfo:
-        check.exec_oc(execute_module, logging_namespace, 'get foo', [], task_vars=task_vars_config_base)
+        check.exec_oc(logging_namespace, 'get foo', [])
     assert expect in str(excinfo)
@@ -121,14 +121,14 @@ def test_is_active(groups, logging_deployed, is_active):
         openshift_hosted_logging_deploy=logging_deployed,
     )
 
-    assert LoggingCheck.is_active(task_vars=task_vars) == is_active
+    assert LoggingCheck(None, task_vars).is_active() == is_active
 
 
 @pytest.mark.parametrize('pod_output, expect_pods, expect_error', [
     (
         'No resources found.',
         None,
-        'There are no pods in the logging namespace',
+        'No pods were found for the "es"',
     ),
     (
         json.dumps({'items': [plain_kibana_pod, plain_es_pod, plain_curator_pod, fluentd_pod_node1]}),
@@ -137,12 +137,10 @@ def test_is_active(groups, logging_deployed, is_active):
     ),
 ])
 def test_get_pods_for_component(pod_output, expect_pods, expect_error):
-    check = canned_loggingcheck(lambda exec_module, namespace, cmd, args, task_vars: pod_output)
+    check = canned_loggingcheck(lambda namespace, cmd, args: pod_output)
 
     pods, error = check.get_pods_for_component(
-        lambda name, args, task_vars: {},
         logging_namespace,
         "es",
-        {}
     )
     assert_error(error, expect_error)
diff --git a/roles/openshift_health_checker/test/logging_index_time_test.py b/roles/openshift_health_checker/test/logging_index_time_test.py
new file mode 100644
index 00000000000..178d7cd84eb
--- /dev/null
+++ b/roles/openshift_health_checker/test/logging_index_time_test.py
@@ -0,0 +1,170 @@
+import json
+
+import pytest
+
+from openshift_checks.logging.logging_index_time import LoggingIndexTime, OpenShiftCheckException
+
+
+SAMPLE_UUID = "unique-test-uuid"
+
+
+def canned_loggingindextime(exec_oc=None):
+    """Create a check object with a canned exec_oc method"""
+    check = LoggingIndexTime()  # fails if a module is actually invoked
+    if exec_oc:
+        check.exec_oc = exec_oc
+    return check
+
+
+plain_running_elasticsearch_pod = {
+    "metadata": {
+        "labels": {"component": "es", "deploymentconfig": "logging-es-data-master"},
+        "name": "logging-es-data-master-1",
+    },
+    "status": {
+        "containerStatuses": [{"ready": True}, {"ready": True}],
+        "phase": "Running",
+    }
+}
+plain_running_kibana_pod = {
+    "metadata": {
+        "labels": {"component": "kibana", "deploymentconfig": "logging-kibana"},
+        "name": "logging-kibana-1",
+    },
+    "status": {
+        "containerStatuses": [{"ready": True}, {"ready": True}],
+        "phase": "Running",
+    }
+}
+not_running_kibana_pod = {
+    "metadata": {
+        "labels": {"component": "kibana", "deploymentconfig": "logging-kibana"},
+        "name": "logging-kibana-2",
+    },
+    "status": {
+        "containerStatuses": [{"ready": True}, {"ready": False}],
+        "conditions": [{"status": "True", "type": "Ready"}],
+        "phase": "pending",
+    }
+}
+
+
+@pytest.mark.parametrize('pods, expect_pods', [
+    (
+        [not_running_kibana_pod],
+        [],
+    ),
+    (
+        [plain_running_kibana_pod],
+        [plain_running_kibana_pod],
+    ),
+    (
+        [],
+        [],
+    )
+])
+def test_check_running_pods(pods, expect_pods):
+    check = canned_loggingindextime()
+    pods = check.running_pods(pods)
+    assert pods == expect_pods
+
+
+@pytest.mark.parametrize('name, json_response, uuid, timeout, extra_words', [
+    (
+        'valid count in response',
+        {
+            "count": 1,
+        },
+        SAMPLE_UUID,
+        0.001,
+        [],
+    ),
+], ids=lambda argval: argval[0])
+def test_wait_until_cmd_or_err_succeeds(name, json_response, uuid, timeout, extra_words):
+    check = canned_loggingindextime(lambda *_: json.dumps(json_response))
+    check.wait_until_cmd_or_err(plain_running_elasticsearch_pod, uuid, timeout)
+
+
+@pytest.mark.parametrize('name, json_response, uuid, timeout, extra_words', [
+    (
+        'invalid json response',
+        {
+            "invalid_field": 1,
+        },
+        SAMPLE_UUID,
+        0.001,
+        ["invalid response", "Elasticsearch"],
+    ),
+    (
+        'empty response',
+        {},
+        SAMPLE_UUID,
+        0.001,
+        ["invalid response", "Elasticsearch"],
+    ),
+    (
+        'valid response but invalid match count',
+        {
+            "count": 0,
+        },
+        SAMPLE_UUID,
+        0.005,
+        ["expecting match", SAMPLE_UUID, "0.005s"],
+    )
+], ids=lambda argval: argval[0])
+def test_wait_until_cmd_or_err(name, json_response, uuid, timeout, extra_words):
+    check = canned_loggingindextime(lambda *_: json.dumps(json_response))
+    with pytest.raises(OpenShiftCheckException) as error:
+        check.wait_until_cmd_or_err(plain_running_elasticsearch_pod, uuid, timeout)
+
+    for word in extra_words:
+        assert word in str(error)
+
+
+@pytest.mark.parametrize('name, json_response, uuid, extra_words', [
+    (
+        'correct response code, found unique id is returned',
+        {
+            "statusCode": 404,
+        },
+        "sample unique id",
+        ["sample unique id"],
+    ),
+], ids=lambda argval: argval[0])
+def test_curl_kibana_with_uuid(name, json_response, uuid, extra_words):
+    check = canned_loggingindextime(lambda *_: json.dumps(json_response))
+    check.generate_uuid = lambda: uuid
+
+    result = check.curl_kibana_with_uuid(plain_running_kibana_pod)
+
+    for word in extra_words:
+        assert word in result
+
+
+@pytest.mark.parametrize('name, json_response, uuid, extra_words', [
+    (
+        'invalid json response',
+        {
+            "invalid_field": "invalid",
+        },
+        SAMPLE_UUID,
+        ["invalid response returned", 'Missing "statusCode" key'],
+    ),
+    (
+        'wrong error code in response',
+        {
+            "statusCode": 500,
+        },
+        SAMPLE_UUID,
+        ["Expecting error code", "500"],
+    ),
+], ids=lambda argval: argval[0])
+def test_failed_curl_kibana_with_uuid(name, json_response, uuid, extra_words):
+    check = canned_loggingindextime(lambda *_: json.dumps(json_response))
+    check.generate_uuid = lambda: uuid
+
+    with pytest.raises(OpenShiftCheckException) as error:
+        check.curl_kibana_with_uuid(plain_running_kibana_pod)
+
+    for word in extra_words:
+        assert word in str(error)
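The index-time tests above all hinge on one polling helper. A standalone restatement of the loop they exercise, with the Elasticsearch query injected as a callable (the poll interval and message wording are assumptions; the tests only fix the timeout semantics and the error substrings):

import json
import time

class OpenShiftCheckException(Exception):
    pass

def wait_until_count_matches(query_count, uuid, timeout_secs, interval=1.0):
    """Poll query_count(uuid) until Elasticsearch reports a match or the timeout expires."""
    deadline = time.time() + timeout_secs
    while True:
        response = json.loads(query_count(uuid))
        if "count" not in response:
            raise OpenShiftCheckException("invalid response from Elasticsearch query")
        if response["count"] > 0:
            return
        if time.time() + interval > deadline:
            raise OpenShiftCheckException(
                "expecting match for uuid {}, but found none after {}s".format(uuid, timeout_secs))
        time.sleep(interval)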
["invalid response", "Elasticsearch"], + ), + ( + 'valid response but invalid match count', + { + "count": 0, + }, + SAMPLE_UUID, + 0.005, + ["expecting match", SAMPLE_UUID, "0.005s"], + ) +], ids=lambda argval: argval[0]) +def test_wait_until_cmd_or_err(name, json_response, uuid, timeout, extra_words): + check = canned_loggingindextime(lambda *_: json.dumps(json_response)) + with pytest.raises(OpenShiftCheckException) as error: + check.wait_until_cmd_or_err(plain_running_elasticsearch_pod, uuid, timeout) + + for word in extra_words: + assert word in str(error) + + +@pytest.mark.parametrize('name, json_response, uuid, extra_words', [ + ( + 'correct response code, found unique id is returned', + { + "statusCode": 404, + }, + "sample unique id", + ["sample unique id"], + ), +], ids=lambda argval: argval[0]) +def test_curl_kibana_with_uuid(name, json_response, uuid, extra_words): + check = canned_loggingindextime(lambda *_: json.dumps(json_response)) + check.generate_uuid = lambda: uuid + + result = check.curl_kibana_with_uuid(plain_running_kibana_pod) + + for word in extra_words: + assert word in result + + +@pytest.mark.parametrize('name, json_response, uuid, extra_words', [ + ( + 'invalid json response', + { + "invalid_field": "invalid", + }, + SAMPLE_UUID, + ["invalid response returned", 'Missing "statusCode" key'], + ), + ( + 'wrong error code in response', + { + "statusCode": 500, + }, + SAMPLE_UUID, + ["Expecting error code", "500"], + ), +], ids=lambda argval: argval[0]) +def test_failed_curl_kibana_with_uuid(name, json_response, uuid, extra_words): + check = canned_loggingindextime(lambda *_: json.dumps(json_response)) + check.generate_uuid = lambda: uuid + + with pytest.raises(OpenShiftCheckException) as error: + check.curl_kibana_with_uuid(plain_running_kibana_pod) + + for word in extra_words: + assert word in str(error) diff --git a/roles/openshift_health_checker/test/memory_availability_test.py b/roles/openshift_health_checker/test/memory_availability_test.py index 4fbaea0a97e..aee2f04160b 100644 --- a/roles/openshift_health_checker/test/memory_availability_test.py +++ b/roles/openshift_health_checker/test/memory_availability_test.py @@ -17,7 +17,7 @@ def test_is_active(group_names, is_active): task_vars = dict( group_names=group_names, ) - assert MemoryAvailability.is_active(task_vars=task_vars) == is_active + assert MemoryAvailability(None, task_vars).is_active() == is_active @pytest.mark.parametrize('group_names,configured_min,ansible_memtotal_mb', [ @@ -59,8 +59,7 @@ def test_succeeds_with_recommended_memory(group_names, configured_min, ansible_m ansible_memtotal_mb=ansible_memtotal_mb, ) - check = MemoryAvailability(execute_module=fake_execute_module) - result = check.run(tmp=None, task_vars=task_vars) + result = MemoryAvailability(fake_execute_module, task_vars).run() assert not result.get('failed', False) @@ -117,8 +116,7 @@ def test_fails_with_insufficient_memory(group_names, configured_min, ansible_mem ansible_memtotal_mb=ansible_memtotal_mb, ) - check = MemoryAvailability(execute_module=fake_execute_module) - result = check.run(tmp=None, task_vars=task_vars) + result = MemoryAvailability(fake_execute_module, task_vars).run() assert result.get('failed', False) for word in 'below recommended'.split() + extra_words: diff --git a/roles/openshift_health_checker/test/mixins_test.py b/roles/openshift_health_checker/test/mixins_test.py index 2d83e207d3b..b1a41ca3cfe 100644 --- a/roles/openshift_health_checker/test/mixins_test.py +++ 
b/roles/openshift_health_checker/test/mixins_test.py @@ -14,10 +14,10 @@ class NotContainerizedCheck(NotContainerizedMixin, OpenShiftCheck): (dict(openshift=dict(common=dict(is_containerized=True))), False), ]) def test_is_active(task_vars, expected): - assert NotContainerizedCheck.is_active(task_vars) == expected + assert NotContainerizedCheck(None, task_vars).is_active() == expected def test_is_active_missing_task_vars(): with pytest.raises(OpenShiftCheckException) as excinfo: - NotContainerizedCheck.is_active(task_vars={}) + NotContainerizedCheck().is_active() assert 'is_containerized' in str(excinfo.value) diff --git a/roles/openshift_health_checker/test/openshift_check_test.py b/roles/openshift_health_checker/test/openshift_check_test.py index e3153979cf6..43aa875f428 100644 --- a/roles/openshift_health_checker/test/openshift_check_test.py +++ b/roles/openshift_health_checker/test/openshift_check_test.py @@ -1,7 +1,7 @@ import pytest from openshift_checks import OpenShiftCheck, OpenShiftCheckException -from openshift_checks import load_checks, get_var +from openshift_checks import load_checks # Fixtures @@ -28,34 +28,23 @@ class TestCheck(OpenShiftCheck): name = "test_check" run = NotImplemented - # initialization requires at least one argument (apart from self) - with pytest.raises(TypeError) as excinfo: - TestCheck() + # execute_module required at init if it will be used + with pytest.raises(RuntimeError) as excinfo: + TestCheck().execute_module("foo") assert 'execute_module' in str(excinfo.value) - assert 'module_executor' in str(excinfo.value) execute_module = object() # initialize with positional argument check = TestCheck(execute_module) - # new recommended name - assert check.execute_module == execute_module - # deprecated attribute name - assert check.module_executor == execute_module + assert check._execute_module == execute_module - # initialize with keyword argument, recommended name + # initialize with keyword argument check = TestCheck(execute_module=execute_module) - # new recommended name - assert check.execute_module == execute_module - # deprecated attribute name - assert check.module_executor == execute_module + assert check._execute_module == execute_module - # initialize with keyword argument, deprecated name - check = TestCheck(module_executor=execute_module) - # new recommended name - assert check.execute_module == execute_module - # deprecated attribute name - assert check.module_executor == execute_module + assert check.task_vars == {} + assert check.tmp is None def test_subclasses(): @@ -81,19 +70,27 @@ def test_load_checks(): assert modules +def dummy_check(task_vars): + class TestCheck(OpenShiftCheck): + name = "dummy" + run = NotImplemented + + return TestCheck(task_vars=task_vars) + + @pytest.mark.parametrize("keys,expected", [ (("foo",), 42), (("bar", "baz"), "openshift"), ]) def test_get_var_ok(task_vars, keys, expected): - assert get_var(task_vars, *keys) == expected + assert dummy_check(task_vars).get_var(*keys) == expected def test_get_var_error(task_vars, missing_keys): with pytest.raises(OpenShiftCheckException): - get_var(task_vars, *missing_keys) + dummy_check(task_vars).get_var(*missing_keys) def test_get_var_default(task_vars, missing_keys): default = object() - assert get_var(task_vars, *missing_keys, default=default) == default + assert dummy_check(task_vars).get_var(*missing_keys, default=default) == default diff --git a/roles/openshift_health_checker/test/ovs_version_test.py b/roles/openshift_health_checker/test/ovs_version_test.py index 
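openshift_check_test.py is effectively a specification of the new base-class surface. A hedged sketch of what these assertions imply for OpenShiftCheck itself (the real class in openshift_checks/__init__.py carries more, and the error wording here is paraphrased):

class OpenShiftCheck(object):
    """Sketch: the constructor stores everything run() used to receive per call."""

    def __init__(self, execute_module=None, task_vars=None, tmp=None):
        self._execute_module = execute_module
        self.task_vars = task_vars or {}
        self.tmp = tmp

    def execute_module(self, module_name, module_args=None):
        if self._execute_module is None:
            raise RuntimeError(
                "%s invoked execute_module without providing the method at initialization"
                % type(self).__name__)
        return self._execute_module(module_name, module_args)

    def get_var(self, *keys, **kwargs):
        """Walk task_vars by nested keys; raise unless a default was supplied."""
        value = self.task_vars
        try:
            for key in keys:
                value = value[key]
        except (KeyError, TypeError):
            if "default" in kwargs:
                return kwargs["default"]
            raise OpenShiftCheckException("'{}' is undefined".format(".".join(map(str, keys))))
        return value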
6494e1c0627..b6acef5a6ce 100644 --- a/roles/openshift_health_checker/test/ovs_version_test.py +++ b/roles/openshift_health_checker/test/ovs_version_test.py @@ -4,7 +4,7 @@ def test_openshift_version_not_supported(): - def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None): + def execute_module(*_): return {} openshift_release = '111.7.0' @@ -16,15 +16,14 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None) openshift_deployment_type='origin', ) - check = OvsVersion(execute_module=execute_module) with pytest.raises(OpenShiftCheckException) as excinfo: - check.run(tmp=None, task_vars=task_vars) + OvsVersion(execute_module, task_vars).run() assert "no recommended version of Open vSwitch" in str(excinfo.value) def test_invalid_openshift_release_format(): - def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None): + def execute_module(*_): return {} task_vars = dict( @@ -33,9 +32,8 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None) openshift_deployment_type='origin', ) - check = OvsVersion(execute_module=execute_module) with pytest.raises(OpenShiftCheckException) as excinfo: - check.run(tmp=None, task_vars=task_vars) + OvsVersion(execute_module, task_vars).run() assert "invalid version" in str(excinfo.value) @@ -54,7 +52,7 @@ def test_ovs_package_version(openshift_release, expected_ovs_version): ) return_value = object() - def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None): + def execute_module(module_name=None, module_args=None, *_): assert module_name == 'rpm_version' assert "package_list" in module_args @@ -64,8 +62,7 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None) return return_value - check = OvsVersion(execute_module=execute_module) - result = check.run(tmp=None, task_vars=task_vars) + result = OvsVersion(execute_module, task_vars).run() assert result is return_value @@ -86,4 +83,4 @@ def test_ovs_version_skip_when_not_master_nor_node(group_names, is_containerized group_names=group_names, openshift=dict(common=dict(is_containerized=is_containerized)), ) - assert OvsVersion.is_active(task_vars=task_vars) == is_active + assert OvsVersion(None, task_vars).is_active() == is_active diff --git a/roles/openshift_health_checker/test/package_availability_test.py b/roles/openshift_health_checker/test/package_availability_test.py index f7e916a4690..1fe648b7547 100644 --- a/roles/openshift_health_checker/test/package_availability_test.py +++ b/roles/openshift_health_checker/test/package_availability_test.py @@ -14,7 +14,7 @@ def test_is_active(pkg_mgr, is_containerized, is_active): ansible_pkg_mgr=pkg_mgr, openshift=dict(common=dict(is_containerized=is_containerized)), ) - assert PackageAvailability.is_active(task_vars=task_vars) == is_active + assert PackageAvailability(None, task_vars).is_active() == is_active @pytest.mark.parametrize('task_vars,must_have_packages,must_not_have_packages', [ @@ -51,13 +51,12 @@ def test_is_active(pkg_mgr, is_containerized, is_active): def test_package_availability(task_vars, must_have_packages, must_not_have_packages): return_value = object() - def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None): + def execute_module(module_name=None, module_args=None, *_): assert module_name == 'check_yum_update' assert 'packages' in module_args assert set(module_args['packages']).issuperset(must_have_packages) assert not 
set(module_args['packages']).intersection(must_not_have_packages) return return_value - check = PackageAvailability(execute_module=execute_module) - result = check.run(tmp=None, task_vars=task_vars) + result = PackageAvailability(execute_module, task_vars).run() assert result is return_value diff --git a/roles/openshift_health_checker/test/package_update_test.py b/roles/openshift_health_checker/test/package_update_test.py index 5e000cff57f..06489b0d7a5 100644 --- a/roles/openshift_health_checker/test/package_update_test.py +++ b/roles/openshift_health_checker/test/package_update_test.py @@ -4,13 +4,12 @@ def test_package_update(): return_value = object() - def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None): + def execute_module(module_name=None, module_args=None, *_): assert module_name == 'check_yum_update' assert 'packages' in module_args # empty list of packages means "generic check if 'yum update' will work" assert module_args['packages'] == [] return return_value - check = PackageUpdate(execute_module=execute_module) - result = check.run(tmp=None, task_vars=None) + result = PackageUpdate(execute_module).run() assert result is return_value diff --git a/roles/openshift_health_checker/test/package_version_test.py b/roles/openshift_health_checker/test/package_version_test.py index 1bb6371ae73..6054d3f3eec 100644 --- a/roles/openshift_health_checker/test/package_version_test.py +++ b/roles/openshift_health_checker/test/package_version_test.py @@ -3,61 +3,56 @@ from openshift_checks.package_version import PackageVersion, OpenShiftCheckException -@pytest.mark.parametrize('openshift_release, extra_words', [ - ('111.7.0', ["no recommended version of Open vSwitch"]), - ('0.0.0', ["no recommended version of Docker"]), -]) -def test_openshift_version_not_supported(openshift_release, extra_words): - def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None): - return {} - - task_vars = dict( - openshift=dict(common=dict(service_type='origin')), +def task_vars_for(openshift_release, deployment_type): + return dict( + openshift=dict(common=dict(service_type=deployment_type)), openshift_release=openshift_release, openshift_image_tag='v' + openshift_release, - openshift_deployment_type='origin', + openshift_deployment_type=deployment_type, ) - check = PackageVersion(execute_module=execute_module) + +def test_openshift_version_not_supported(): + check = PackageVersion(None, task_vars_for("1.2.3", 'origin')) + check.get_openshift_version_tuple = lambda: (3, 4, 1) # won't be in the dict + with pytest.raises(OpenShiftCheckException) as excinfo: - check.run(tmp=None, task_vars=task_vars) + check.get_required_ovs_version() + assert "no recommended version of Open vSwitch" in str(excinfo.value) - for word in extra_words: - assert word in str(excinfo.value) + with pytest.raises(OpenShiftCheckException) as excinfo: + check.get_required_docker_version() + assert "no recommended version of Docker" in str(excinfo.value) def test_invalid_openshift_release_format(): - def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None): - return {} - task_vars = dict( openshift=dict(common=dict(service_type='origin')), openshift_image_tag='v0', openshift_deployment_type='origin', ) - check = PackageVersion(execute_module=execute_module) + check = PackageVersion(lambda *_: {}, task_vars) with pytest.raises(OpenShiftCheckException) as excinfo: - check.run(tmp=None, task_vars=task_vars) + check.run() assert "invalid version" in str(excinfo.value) 
 
 
 @pytest.mark.parametrize('openshift_release', [
-    "3.5",
+    "111.7.0",
+    "3.7",
     "3.6",
+    "3.5.1.2.3",
+    "3.5",
     "3.4",
     "3.3",
+    "2.1.0",
 ])
 def test_package_version(openshift_release):
-    task_vars = dict(
-        openshift=dict(common=dict(service_type='origin')),
-        openshift_release=openshift_release,
-        openshift_image_tag='v' + openshift_release,
-        openshift_deployment_type='origin',
-    )
+
     return_value = object()
 
-    def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
+    def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None, *_):
         assert module_name == 'aos_version'
         assert "package_list" in module_args
 
@@ -67,29 +62,24 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None
 
         return return_value
 
-    check = PackageVersion(execute_module=execute_module)
-    result = check.run(tmp=None, task_vars=task_vars)
+    check = PackageVersion(execute_module, task_vars_for(openshift_release, 'origin'))
+    result = check.run()
 
     assert result is return_value
 
 
 @pytest.mark.parametrize('deployment_type,openshift_release,expected_docker_version', [
     ("origin", "3.5", "1.12"),
+    ("origin", "1.3", "1.10"),
+    ("origin", "1.1", "1.8"),
     ("openshift-enterprise", "3.4", "1.12"),
-    ("origin", "3.3", "1.10"),
     ("openshift-enterprise", "3.2", "1.10"),
-    ("origin", "3.1", "1.8"),
     ("openshift-enterprise", "3.1", "1.8"),
 ])
 def test_docker_package_version(deployment_type, openshift_release, expected_docker_version):
-    task_vars = dict(
-        openshift=dict(common=dict(service_type='origin')),
-        openshift_release=openshift_release,
-        openshift_image_tag='v' + openshift_release,
-        openshift_deployment_type=deployment_type,
-    )
+
    return_value = object()
 
-    def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None):
+    def execute_module(module_name=None, module_args=None, *_):
         assert module_name == 'aos_version'
         assert "package_list" in module_args
 
@@ -99,8 +89,8 @@ def execute_module(module_name=None, module_args=None, tmp=None, task_vars=None
 
         return return_value
 
-    check = PackageVersion(execute_module=execute_module)
-    result = check.run(tmp=None, task_vars=task_vars)
+    check = PackageVersion(execute_module, task_vars_for(openshift_release, deployment_type))
+    result = check.run()
 
     assert result is return_value
 
@@ -121,4 +111,4 @@ def test_package_version_skip_when_not_master_nor_node(group_names, is_container
         group_names=group_names,
         openshift=dict(common=dict(is_containerized=is_containerized)),
     )
-    assert PackageVersion.is_active(task_vars=task_vars) == is_active
+    assert PackageVersion(None, task_vars).is_active() == is_active
diff --git a/roles/openshift_health_checker/test/search_journalctl_test.py b/roles/openshift_health_checker/test/search_journalctl_test.py
new file mode 100644
index 00000000000..724928aa122
--- /dev/null
+++ b/roles/openshift_health_checker/test/search_journalctl_test.py
@@ -0,0 +1,157 @@
+import pytest
+import search_journalctl
+
+
+def canned_search_journalctl(get_log_output=None):
+    """Create a search_journalctl object with canned get_log_output method"""
+    module = search_journalctl
+    if get_log_output:
+        module.get_log_output = get_log_output
+    return module
+
+
+DEFAULT_TIMESTAMP = 1496341364
+
+
+def get_timestamp(modifier=0):
+    return DEFAULT_TIMESTAMP + modifier
+
+
+def get_timestamp_microseconds(modifier=0):
+    return get_timestamp(modifier) * 1000000
+
+
+def create_test_log_object(stamp, msg):
+    return '{{"__REALTIME_TIMESTAMP": "{}", "MESSAGE": "{}"}}'.format(stamp, msg)
+
+
+@pytest.mark.parametrize('name,matchers,log_input,expected_matches,expected_errors', [
+    (
+        'test with valid params',
+        [
+            {
+                "start_regexp": r"Sample Logs Beginning",
+                "regexp": r"test log message",
+                "unit": "test",
+            },
+        ],
+        [
+            create_test_log_object(get_timestamp_microseconds(), "test log message"),
+            create_test_log_object(get_timestamp_microseconds(), "Sample Logs Beginning"),
+        ],
+        ["test log message"],
+        [],
+    ),
+    (
+        'test with invalid json in log input',
+        [
+            {
+                "start_regexp": r"Sample Logs Beginning",
+                "regexp": r"test log message",
+                "unit": "test-unit",
+            },
+        ],
+        [
+            '{__REALTIME_TIMESTAMP: ' + str(get_timestamp_microseconds()) + ', "MESSAGE": "test log message"}',
+        ],
+        [],
+        [
+            ["invalid json", "test-unit", "test log message"],
+        ],
+    ),
+    (
+        'test with invalid regexp',
+        [
+            {
+                "start_regexp": r"Sample Logs Beginning",
+                "regexp": r"test [ log message",
+                "unit": "test",
+            },
+        ],
+        [
+            create_test_log_object(get_timestamp_microseconds(), "test log message"),
+            create_test_log_object(get_timestamp_microseconds(), "sample log message"),
+            create_test_log_object(get_timestamp_microseconds(), "fake log message"),
+            create_test_log_object(get_timestamp_microseconds(), "dummy log message"),
+            create_test_log_object(get_timestamp_microseconds(), "Sample Logs Beginning"),
+        ],
+        [],
+        [
+            ["invalid regular expression"],
+        ],
+    ),
+], ids=lambda argval: argval[0])
+def test_get_log_matches(name, matchers, log_input, expected_matches, expected_errors):
+    def get_log_output(matcher):
+        return log_input
+
+    module = canned_search_journalctl(get_log_output)
+    matched_regexp, errors = module.get_log_matches(matchers, 500, 60 * 60)
+
+    assert set(matched_regexp) == set(expected_matches)
+    assert len(expected_errors) == len(errors)
+
+    for idx, partial_err_set in enumerate(expected_errors):
+        for partial_err_msg in partial_err_set:
+            assert partial_err_msg in errors[idx]
+
+
+@pytest.mark.parametrize('name,matcher,log_count_lim,stamp_lim_seconds,log_input,expected_match', [
+    (
+        'test with matching log message, but out of bounds of log_count_lim',
+        {
+            "start_regexp": r"Sample Logs Beginning",
+            "regexp": r"dummy log message",
+            "unit": "test",
+        },
+        3,
+        get_timestamp(-100 * 60 * 60),
+        [
+            create_test_log_object(get_timestamp_microseconds(), "test log message"),
+            create_test_log_object(get_timestamp_microseconds(), "sample log message"),
+            create_test_log_object(get_timestamp_microseconds(), "fake log message"),
+            create_test_log_object(get_timestamp_microseconds(), "dummy log message"),
+            create_test_log_object(get_timestamp_microseconds(), "Sample Logs Beginning"),
+        ],
+        None,
+    ),
+    (
+        'test with matching log message, but with timestamp too old',
+        {
+            "start_regexp": r"Sample Logs Beginning",
+            "regexp": r"dummy log message",
+            "unit": "test",
+        },
+        100,
+        get_timestamp(-10),
+        [
+            create_test_log_object(get_timestamp_microseconds(), "test log message"),
+            create_test_log_object(get_timestamp_microseconds(), "sample log message"),
+            create_test_log_object(get_timestamp_microseconds(), "fake log message"),
+            create_test_log_object(get_timestamp_microseconds(-1000), "dummy log message"),
+            create_test_log_object(get_timestamp_microseconds(-1000), "Sample Logs Beginning"),
+        ],
+        None,
+    ),
+    (
+        'test with matching log message, and timestamp within time limit',
+        {
+            "start_regexp": r"Sample Logs Beginning",
+            "regexp": r"dummy log message",
+            "unit": "test",
+        },
+        100,
+        get_timestamp(-1010),
+        [
+            create_test_log_object(get_timestamp_microseconds(), "test log message"),
+            create_test_log_object(get_timestamp_microseconds(), "sample log message"),
+            create_test_log_object(get_timestamp_microseconds(), "fake log message"),
+            create_test_log_object(get_timestamp_microseconds(-1000), "dummy log message"),
+            create_test_log_object(get_timestamp_microseconds(-1000), "Sample Logs Beginning"),
+        ],
+        create_test_log_object(get_timestamp_microseconds(-1000), "dummy log message"),
+    ),
+], ids=lambda argval: argval[0])
+def test_find_matches_skips_logs(name, matcher, log_count_lim, stamp_lim_seconds, log_input, expected_match):
+    match = search_journalctl.find_matches(log_input, matcher, log_count_lim, stamp_lim_seconds)
+    assert match == expected_match