
add mention of openshift-ansible image in Scaling and Performance Guide #4579

Closed
76 changes: 73 additions & 3 deletions scaling_performance/optimizing_compute_resources.adoc
@@ -94,10 +94,10 @@ Registry credentials.
[[scaling-performance-debugging]]
== Debugging {product-title} Using the RHEL Tools Container

Red Hat distributes a *rhel-tools* container, which contains tools that aid in debugging scaling or performance problems. This container:

* Allows users to deploy minimal-footprint container hosts by moving packages out of the base distribution and into this support container.
* Provides debugging capabilities for Red Hat Enterprise Linux 7 Atomic Host, which has an immutable package tree. *rhel-tools* includes utilities such as tcpdump, sosreport, git, gdb, perf, and many other common system administration utilities.

Use the *rhel-tools* container with the following:

@@ -107,5 +107,75 @@ Use the *rhel-tools* container with the following:

See the link:https://access.redhat.com/documentation/en/red-hat-enterprise-linux-atomic-host/7/getting-started-with-containers/chapter-11-using-the-atomic-tools-container-image[RHEL Tools Container documentation] for more information.
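
As an illustrative sketch only (not necessarily the exact command shown in the guide), a common way to start the tools container on a Red Hat Enterprise Linux 7 Atomic Host is with the `atomic` command, assuming the `rhel7/rhel-tools` image is available to the host:

----
# Illustrative example only: assumes the rhel7/rhel-tools image is available to the host
atomic run rhel7/rhel-tools
----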

[[scaling-performance-debugging-using-oa-image]]
== Debugging {product-title} Using the OpenShift-Ansible Image

Red Hat distributes an link:https://github.com/openshift/openshift-ansible/blob/master/README_CONTAINER_IMAGE.md[openshift-ansible image] with checks that focus on detecting common deployment issues.
Use the following checks to help detect potential issues:

[[diagnostic-checks]]
.Diagnostic Checks
[options="header"]
|===

|Check Name |Purpose

|`*etcd_imagedata_size*`
|This check measures the total size of OpenShift image data in an etcd cluster.
Fails if the calculated size exceeds a user-defined limit. If no limit is specified, the check fails if the size of OpenShift image data exceeds a certain proportion of the space currently used in the etcd cluster.

A failure from this check indicates that a significant amount of space in etcd is being taken up by OpenShift image data, which can destabilize an etcd cluster.

A user-defined limit can be set by passing the variable `etcd_max_image_data_size_bytes=40000000000` to the `openshift_health_checker` role.
This example limit causes the check to fail if the total size of OpenShift image data stored in etcd exceeds `40GB`.

A user-defined value can be set for this variable by passing it as an option to the role:

`# ansible-playbook -i /etc/ansible/hosts playbooks/common/openshift-checks/check.yml -e etcd_max_image_data_size_bytes=40000000000`

It can also be passed as part of the `OPTS` variable when running the playbook through the Docker image:

`# docker run ... -e OPTS="-v -e etcd_max_image_data_size_bytes=40000000000"`

See below for a complete example of running checks with the Docker image. Check variables can also be set in the Ansible inventory file, as shown in the example after this table.

|`*etcd_traffic*`
|This check detects higher-than-normal traffic on an etcd host. Fails if a `journalctl` log entry with an etcd sync duration warning is found.
Review comments on this check:

*Contributor:* @eparis is this happening to us?

*Member:* You mean do we have message like:

 Jun 22 18:11:28 ip-172-31-54-162.ec2.internal etcd[100560]: sync duration of 2.675498017s, expected less than 1s

(which I just got off an active cluster)

*Contributor:* this is the error message that generally precedes very bad things (tm)

For further information on improving etcd performance, see the link:host_practices.adoc[Host Practices documentation].

|`*logging_index_time*`
|This check detects higher-than-normal time delays between log creation and log aggregation by Elasticsearch in a logging stack deployment.
Fails if a user-defined timeout is reached before logs can be queried through Elasticsearch.

A user-defined timeout can be set by passing the variable `openshift_check_logging_index_timeout_seconds=30` to the `openshift_health_checker` role.
This example timeout causes the check to fail if a newly created Kibana log cannot be queried through Elasticsearch after `30 seconds`.

A user-defined value can be set for this variable by passing it as an option to the role (or in the Ansible inventory file, as shown in the example after this table):

`# ansible-playbook -i /etc/ansible/hosts playbooks/common/openshift-checks/health.yml -e openshift_check_logging_index_timeout_seconds=30`

For further information on additional logging-stack checks, see the link:../admin_guide/diagnostics_tool.adoc#additional-cluster-health-checks[Diagnostics Tool documentation].
|===
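
The check variables above can also be set in the Ansible inventory file that the playbooks read, rather than on the command line. The following excerpt is a minimal sketch, assuming a standard `/etc/ansible/hosts` inventory with an `[OSEv3:vars]` section; the values shown are only the examples used above:

----
# /etc/ansible/hosts (excerpt); group name assumes a standard OSEv3 inventory
[OSEv3:vars]
# Fail the etcd_imagedata_size check if OpenShift image data in etcd exceeds 40 GB
etcd_max_image_data_size_bytes=40000000000
# Fail the logging_index_time check if a new log entry cannot be queried within 30 seconds
openshift_check_logging_index_timeout_seconds=30
----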


Use the *openshift-ansible* diagnostic checks with the following:

----
# docker run -u `id -u` \
-v $HOME/.ssh/id_rsa:/opt/app-root/src/.ssh/id_rsa:Z,ro \
-v /etc/ansible/hosts:/tmp/inventory:ro \
-e INVENTORY_FILE=/tmp/inventory \
-e OPTS="-v" \
-e PLAYBOOK_FILE=playbooks/common/openshift-checks/health.yml \
ifdef::openshift-enterprise[]
openshift3/ose-ansible
endif::[]
ifdef::openshift-origin[]
openshift/origin-ansible
endif::[]
----
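
Check variables can be combined with the containerized invocation through the `OPTS` environment variable, as described in the table above. The following is a sketch only, reusing the command above with the example `etcd_imagedata_size` limit added to `OPTS`:

----
# docker run -u `id -u` \
-v $HOME/.ssh/id_rsa:/opt/app-root/src/.ssh/id_rsa:Z,ro \
-v /etc/ansible/hosts:/tmp/inventory:ro \
-e INVENTORY_FILE=/tmp/inventory \
-e OPTS="-v -e etcd_max_image_data_size_bytes=40000000000" \
-e PLAYBOOK_FILE=playbooks/common/openshift-checks/health.yml \
ifdef::openshift-enterprise[]
openshift3/ose-ansible
endif::[]
ifdef::openshift-origin[]
openshift/origin-ansible
endif::[]
----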

See the link:../admin_guide/diagnostics_tool.adoc#additional-cluster-health-checks[Diagnostics Tool documentation] for more information on additional checks provided by the *openshift-ansible* image.