openshift · adellape · Jun 5, 2017 · May 19, 2017 · mburke5678 · May 31, 2017
diff --git a/admin_guide/diagnostics_tool.adoc b/admin_guide/diagnostics_tool.adoc
@@ -115,3 +115,78 @@ A client with *cluster-admin* access available (for any user, but only the
 current master) should be able to diagnose the status of infrastructure such as
 nodes, registry, and router. In each case, running `oc adm diagnostics` looks
 for the client configuration in its standard location and uses it if available.
+
+[[additional-cluster-health-checks]]
+== Additional Diagnostic Checks via Ansible
+
+// TODO: add link to OCP image once it is available
+
+Some additional diagnostic checks are available through the *openshift-ansible*
+container image. See the image's link:https://github.com/openshift/openshift-ansible/blob/master/README_CONTAINER_IMAGE.md[source repository] for usage information.
+
+The following health checks belong to a diagnostic task meant to be run against
+the Ansible inventory file for a deployed {product-title} cluster. They can
+report common problems for the current {product-title} installation.
+
+[[admin-guide-diagnostics-tool-ansible-checks]]
+.Diagnostic Checks
+[options="header"]
+|===
+
+|Check Name |Purpose
+
+|`ovs_version`
+|This check ensures that a host has the correct version of Open vSwitch installed
+for the currently deployed version of {product-title}.
+
+|`kibana`, `curator`, `elasticsearch`, `fluentd`
+|This set of checks verifies that Elasticsearch, Fluentd, and Curator pods have
+been deployed and are in a `running` state, and that a connection can be
+established between the control host and the exposed Kibana URL. These checks
+run only if the `openshift_hosted_logging_deploy` inventory variable is set to
+`true`, so that they execute only in deployments where a logging stack has
+been deployed.
+
+|`etcd_imagedata_size`
+|This check measures the total size of {product-title} image data in an etcd
+cluster. The check fails if the calculated size exceeds a user-defined limit. If
+no limit is specified, this check will fail if the size of image data amounts to
+50% or more of the currently used space in the etcd cluster.
+
+A failure from this check indicates that a significant amount of space in etcd
+is being taken up by {product-title} image data, which can eventually result in
+your etcd cluster crashing.
+
+A user-defined limit may be set by passing the variable
+`etcd_max_image_data_size_bytes=400000000` to the `openshift_health_checker`
+role.
+
+|`etcd_volume`
+|This check ensures that the volume usage for an etcd cluster is below a maximum
+user-specified threshold. If no maximum threshold value is specified, it
+defaults to `90%` of the total volume size.
+
+A user-defined limit may be set by passing the variable
+`etcd_device_usage_threshold_percent=90` to the `openshift_health_checker` role.
+
+|`docker_storage`
+|Only runs on hosts that depend on the *docker* daemon (nodes and containerized
+installations). Checks that *docker*'s total usage does not exceed a
+user-defined limit. If no user-defined limit is set, *docker*'s maximum usage
+threshold defaults to 90% of the total size available. The threshold
+limit for total percent usage can be set with a variable in your inventory file:
+`max_thinpool_data_usage_percent=90`.
+|===
+
+To disable specific checks, include the variable `openshift_disable_check` with
+a comma-delimited list of check names in your inventory file. For example:
+
+----
+openshift_disable_check=ovs_version,etcd_volume
+----
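+
+The tunable limits described in the table can be set in a similar way. The
+following fragment is a sketch only. It assumes that variables defined under
+the `[OSEv3:vars]` group of the inventory file are passed through to the
+*openshift_health_checker* role. The values are the examples and defaults
+quoted in the table, not recommendations:
+
+----
+[OSEv3:vars]
+# Assumption: inventory variables are visible to the openshift_health_checker role.
+
+# Lets the kibana, curator, elasticsearch, and fluentd checks run.
+openshift_hosted_logging_deploy=true
+
+# etcd_imagedata_size: fail when image data in etcd exceeds this many bytes.
+etcd_max_image_data_size_bytes=400000000
+
+# etcd_volume: fail when etcd volume usage exceeds this percentage.
+etcd_device_usage_threshold_percent=90
+
+# docker_storage: fail when docker storage usage exceeds this percentage.
+max_thinpool_data_usage_percent=90
+----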
+
+A similar set of checks meant to run as part of the installation process can be
+found in
+xref:../install_config/install/advanced_install.adoc#configuring-cluster-pre-install-checks[Configuring Cluster Pre-install Checks]. A separate set of checks for certificate
+expiration can be found in
+xref:../install_config/redeploying_certificates.adoc#install-config-redeploying-certificates[Redeploying Certificates].
diff --git a/install_config/install/advanced_install.adoc b/install_config/install/advanced_install.adoc
@@ -419,6 +419,96 @@ meaning that it is available for placement of new pods. See
 xref:marking-masters-as-unschedulable-nodes[Configuring Schedulability on Masters].
 |===
 
+[[configuring-cluster-pre-install-checks]]
+=== Configuring Cluster Pre-install Checks
+
+Pre-install checks are a set of diagnostic tasks that run as part of the
+*openshift_health_checker* Ansible role. They run prior to an Ansible
+installation of {product-title}, ensure that required inventory values are set,
+and identify potential issues on a host that can prevent or interfere with a
+successful installation.
+
+The following table describes available pre-install checks that will run before
+every Ansible installation of {product-title}:
+
+[[configuring-cluster-pre-install-checks-pre-install-checks]]
+.Pre-install Checks
+[options="header"]
+|===
+
+|Check Name |Purpose
+
+|`memory_availability`
+|This check ensures that a host has the recommended amount of memory for the
+specific deployment of {product-title}. Default values have been derived from
+the
+xref:../../install_config/install/prerequisites.adoc#system-requirements[latest
+installation documentation]. A user-defined value for minimum memory
+requirements can be set with the `openshift_check_min_host_memory_gb`
+cluster variable in your inventory file.
+
+|`disk_availability`
+|This check only runs on etcd, master, and node hosts. It ensures that the mount
+path for an {product-title} installation has sufficient disk space remaining.
+Recommended disk values are taken from the
+xref:../../install_config/install/prerequisites.adoc#system-requirements[latest
+installation documentation]. A user-defined value for minimum disk space
+requirements can be set with the `openshift_check_min_host_disk_gb` cluster
+variable in your inventory file.
+
+|`docker_storage`
+|Only runs on hosts that depend on the *docker* daemon (nodes and containerized
+installations). Checks that *docker*'s total usage does not exceed a
+user-defined limit. If no user-defined limit is set, *docker*'s maximum usage
+threshold defaults to 90% of the total size available. The threshold limit for
+total percent usage can be set with the `max_thinpool_data_usage_percent`
+cluster variable in your inventory file, for example
+`max_thinpool_data_usage_percent=90`.
+
+|`docker_storage_driver`
+|Ensures that the *docker* daemon is using a storage driver supported by
+{product-title}. If the
+https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver[`devicemapper`]
+storage driver is being used, the check additionally ensures that a loopback
+device is not being used.
+
+|`docker_image_availability`
+|Attempts to ensure that images required by an {product-title} installation are
+available either locally or in at least one of the configured container image
+registries on the host machine.
+
+|`package_version`
+|Runs on `yum`-based systems to determine whether multiple releases of a
+required {product-title} package are available. Having multiple releases of a package
+available during an `enterprise` installation of OpenShift suggests that there
+are multiple `yum` repositories enabled for different releases, which may lead
+to installation problems. This check is skipped if the `openshift_release`
+variable is not defined in the inventory file.
+
+|`package_availability`
+|Runs prior to non-containerized installations of {product-title}. Ensures that
+RPM packages required for the current installation are available.
+
+|`package_update`
+|Checks whether a `yum` update or package installation will succeed, without
+actually performing it or running `yum` on the host.
+|===
+
+To disable specific pre-install checks, include the variable
+`openshift_disable_check` with a comma-delimited list of check names in your
+inventory file. For example:
+
+----
+openshift_disable_check=memory_availability,disk_availability
+----
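+
+The user-defined minimums described in the table can be set in the same
+inventory file. The following fragment is a sketch only. It assumes the
+variables are placed under the `[OSEv3:vars]` group, and the values shown are
+hypothetical placeholders rather than recommended settings:
+
+----
+[OSEv3:vars]
+# memory_availability: minimum host memory, in GB (placeholder value).
+openshift_check_min_host_memory_gb=16
+
+# disk_availability: minimum free disk space, in GB (placeholder value).
+openshift_check_min_host_disk_gb=40
+
+# package_version is skipped unless a release is defined (illustrative value).
+openshift_release=v3.6
+----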
+
+A similar set of checks meant to run as diagnostics on existing clusters can be
+found in
+xref:../../admin_guide/diagnostics_tool.adoc#additional-cluster-health-checks[Additional Diagnostic Checks via Ansible]. A separate set of checks for certificate
+expiration can be found in
+xref:../../install_config/redeploying_certificates.adoc#install-config-redeploying-certificates[Redeploying Certificates].
+
 [[advanced-install-configuring-registry-location]]
 === Configuring a Registry Location