
add mention of openshift-ansible image in Scaling and Performance Guide #4579

Closed
76 changes: 73 additions & 3 deletions scaling_performance/optimizing_compute_resources.adoc
@@ -94,10 +94,10 @@ Registry credentials.
[[scaling-performance-debugging]]
== Debugging {product-title} Using the RHEL Tools Container

Red Hat distributes a *rhel-tools* container, which contains tools that aid in debugging scaling or performance problems. This container:

* Allows users to deploy minimal-footprint container hosts by moving packages out of the base distribution and into this support container.
* Provides debugging capabilities for Red Hat Enterprise Linux 7 Atomic Host, which has an immutable package tree. *rhel-tools* includes utilities such as tcpdump, sosreport, git, gdb, perf, and many other common system administration utilities.

Use the *rhel-tools* container with the following:

@@ -107,5 +107,75 @@ Use the *rhel-tools* container with the following:

See the link:https://access.redhat.com/documentation/en/red-hat-enterprise-linux-atomic-host/7/getting-started-with-containers/chapter-11-using-the-atomic-tools-container-image[RHEL Tools Container documentation] for more information.
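
As an illustrative sketch only (not necessarily the exact command shown in the guide), a common way to start the tools container on a Red Hat Enterprise Linux 7 Atomic Host is with the `atomic` command, assuming the `rhel7/rhel-tools` image is available to the host:

----
# Illustrative example only: assumes the rhel7/rhel-tools image is available to the host
atomic run rhel7/rhel-tools
----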

[[scaling-performance-debugging-using-oa-image]]
== Debugging {product-title} Using the OpenShift-Ansible Image

Red Hat distributes an link:https://github.com/openshift/openshift-ansible/blob/master/README_CONTAINER_IMAGE.md[openshift-ansible image] with checks that focus on detecting common deployment issues.
Use the following checks to help detect potential issues:

[[diagnostic-checks]]
.Diagnostic Checks
[options="header"]
|===

|Check Name |Purpose

|`*etcd_imagedata_size*`
|This check measures the total size of OpenShift image data in an etcd cluster.
Fails if the calculated size exceeds a user-defined limit. If no limit is specified, the check fails if the size of OpenShift image data exceeds a certain proportion of the space currently used in the etcd cluster.

A failure from this check indicates that a significant amount of space in etcd is being taken up by OpenShift image data, which can destabilize an etcd cluster.

A user-defined limit can be set by passing the variable `etcd_max_image_data_size_bytes=40000000000` to the `openshift_health_checker` role.
This example limit causes the check to fail if the total size of OpenShift image data stored in etcd exceeds `40GB`.

A user-defined value can be set for this variable by passing it as an option to the role:

`# ansible-playbook -i /etc/ansible/hosts playbooks/common/openshift-checks/check.yml -e etcd_max_image_data_size_bytes=40000000000`

It can also be passed as part of the `OPTS` variable when running the playbook through the Docker image:

`# docker run ... -e OPTS="-v -e etcd_max_image_data_size_bytes=40000000000"`

See below for a complete example of running checks with the Docker image. Check variables can also be set in the Ansible inventory file, as shown in the example after this table.

|`*etcd_traffic*`
|This check detects higher-than-normal traffic on an etcd host. Fails if a `journalctl` log entry with an etcd sync duration warning is found.
Review comments on this check:

*Contributor:* @eparis is this happening to us?

*Member:* You mean do we have message like:

 Jun 22 18:11:28 ip-172-31-54-162.ec2.internal etcd[100560]: sync duration of 2.675498017s, expected less than 1s

(which I just got off an active cluster)

*Contributor:* this is the error message that generally precedes very bad things (tm)

For further information on improving etcd performance, see the link:host_practices.adoc[Host Practices documentation].

|`*logging_index_time*`
|This check detects higher-than-normal time delays between log creation and log aggregation by Elasticsearch in a logging stack deployment.
Fails if a user-defined timeout is reached before logs can be queried through Elasticsearch.

A user-defined timeout can be set by passing the variable `openshift_check_logging_index_timeout_seconds=30` to the `openshift_health_checker` role.
This example timeout causes the check to fail if a newly created Kibana log cannot be queried through Elasticsearch after `30 seconds`.

A user-defined value can be set for this variable by passing it as an option to the role (or in the Ansible inventory file, as shown in the example after this table):

`# ansible-playbook -i /etc/ansible/hosts playbooks/common/openshift-checks/health.yml -e openshift_check_logging_index_timeout_seconds=30`

For further information on additional logging-stack checks, see the link:../admin_guide/diagnostics_tool.adoc#additional-cluster-health-checks[Diagnostics Tool documentation].
|===
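
The check variables above can also be set in the Ansible inventory file that the playbooks read, rather than on the command line. The following excerpt is a minimal sketch, assuming a standard `/etc/ansible/hosts` inventory with an `[OSEv3:vars]` section; the values shown are only the examples used above:

----
# /etc/ansible/hosts (excerpt); group name assumes a standard OSEv3 inventory
[OSEv3:vars]
# Fail the etcd_imagedata_size check if OpenShift image data in etcd exceeds 40 GB
etcd_max_image_data_size_bytes=40000000000
# Fail the logging_index_time check if a new log entry cannot be queried within 30 seconds
openshift_check_logging_index_timeout_seconds=30
----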


Use the *openshift-ansible* diagnostic checks with the following:

----
# docker run -u `id -u` \
-v $HOME/.ssh/id_rsa:/opt/app-root/src/.ssh/id_rsa:Z,ro \
-v /etc/ansible/hosts:/tmp/inventory:ro \
-e INVENTORY_FILE=/tmp/inventory \
-e OPTS="-v" \
-e PLAYBOOK_FILE=playbooks/common/openshift-checks/health.yml \
ifdef::openshift-enterprise[]
openshift3/ose-ansible
endif::[]
ifdef::openshift-origin[]
openshift/origin-ansible
endif::[]
----
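
Check variables can be combined with the containerized invocation through the `OPTS` environment variable, as described in the table above. The following is a sketch only, reusing the command above with the example `etcd_imagedata_size` limit added to `OPTS`:

----
# docker run -u `id -u` \
-v $HOME/.ssh/id_rsa:/opt/app-root/src/.ssh/id_rsa:Z,ro \
-v /etc/ansible/hosts:/tmp/inventory:ro \
-e INVENTORY_FILE=/tmp/inventory \
-e OPTS="-v -e etcd_max_image_data_size_bytes=40000000000" \
-e PLAYBOOK_FILE=playbooks/common/openshift-checks/health.yml \
ifdef::openshift-enterprise[]
openshift3/ose-ansible
endif::[]
ifdef::openshift-origin[]
openshift/origin-ansible
endif::[]
----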

See the link:../admin_guide/diagnostics_tool.adoc#additional-cluster-health-checks[Diagnostics Tool documentation] for more information on additional checks provided by the *openshift-ansible* image.