Fix welcome-to-elastic links (#484)
**Problem:** In elastic/docs#2752, we updated the URL prefix (`welcome-to-elastic`) and name for the "Welcome to Elastic Docs" docs. However, we still have some stray links that use the old `/welcome-to-elastic` URL prefix.

**Solution:** Updates several outdated links to use an attribute.
jrodewig authored Sep 13, 2023
1 parent e7c5989 commit 01bbc42
Showing 1 changed file with 23 additions and 23 deletions.
@@ -9,12 +9,12 @@ For more information on how to deploy {agent} on {k8s}, please review these pages
[discrete]
== Observability at scale

This document summarizes some key factors and best practices for using {estc-welcome-current}/getting-started-kubernetes.html[Elastic {observability}] to monitor {k8s} infrastructure at scale. Users need to consider different parameters and adjust the {stack} accordingly. These elements are affected as the size of the {k8s} cluster increases:

- The amount of metrics collected from several {k8s} endpoints
- The {agent} resources needed to cope with the high CPU and memory demands of internal processing
- The {es} resources needed due to the higher rate of metric ingestion
- The response times of dashboard visualizations as more data is requested for a given time window

The document is divided into two main sections:

@@ -41,7 +41,7 @@ The {k8s} {observability} is based on https://docs.elastic.co/en/integrations/ku

Controller manager and Scheduler data streams are enabled only on the specific node where those components actually run, based on autodiscovery rules.
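
As an illustration, such a condition-based data stream in a standalone {agent} manifest can look roughly like the sketch below (the input name, label, port, and dataset are illustrative assumptions, not values copied from the reference manifest):

[source,yaml]
------------------------------------------------
inputs:
  - name: kubernetes-scheduler-metrics
    type: kubernetes/metrics
    # Activated only on the node whose Pod carries the kube-scheduler label.
    condition: ${kubernetes.labels.component} == 'kube-scheduler'
    streams:
      - data_stream:
          dataset: kubernetes.scheduler
        metricsets:
          - scheduler
        hosts:
          - 'https://${kubernetes.pod.ip}:10259'
        period: 10s
------------------------------------------------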

The default manifest provided deploys {agent} as a DaemonSet, which results in an {agent} being deployed on every node of the {k8s} cluster.

Additionally, by default one agent is elected as **leader** (for more information visit <<kubernetes_leaderelection-provider>>). The {agent} Pod which holds the leadership lock is responsible for collecting the cluster-wide metrics in addition to its node's metrics.
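
For reference, a minimal sketch of how this provider can appear in a standalone {agent} policy (the option names and the lease name below are assumptions and may differ per {agent} version; check the linked section for the authoritative reference):

[source,yaml]
------------------------------------------------
providers:
  kubernetes_leaderelection:
    # Set to false to stop this agent from competing for the leader lock.
    enabled: true
    # Name of the Kubernetes Lease object backing the leadership lock
    # (assumed default; verify against your manifest).
    leader_lease: elastic-agent-cluster-leader
------------------------------------------------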

@@ -58,7 +58,7 @@ The DaemonSet deployment approach with leader election simplifies the installation
[discrete]
=== Specifying resources and limits in Agent manifests

The resourcing of your Pods and their scheduling priority (check section <<agent-scheduling,Scheduling priority>>) are two topics that might be affected as the {k8s} cluster size increases.
The increasing demand for resources might result in under-resourced Elastic Agents in your cluster.

Based on our tests, we advise configuring only the `limits` section of the `resources` section in the manifest. In this way, the `requests` settings of the `resources` fall back to the `limits` specified. The `limits` value is the upper bound for your microservice process, meaning that it can operate with fewer resources, while {k8s} protects the cluster from excessive usage and possible resource exhaustion.
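
For example, a minimal sketch of the relevant fragment of the {agent} container spec in the DaemonSet manifest (the values are the leader-agent figures from the table below for a cluster of around 1000 Pods):

[source,yaml]
------------------------------------------------
containers:
  - name: elastic-agent
    resources:
      limits:
        cpu: "1500m"
        memory: "800Mi"
      # No `requests` block: when omitted, Kubernetes defaults the
      # requests to the limits defined above.
------------------------------------------------
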
@@ -76,11 +76,11 @@ Based on our https://github.com/elastic/elastic-agent/blob/main/docs/elastic-agent-scaling-tests.md[{agent} scaling tests]

Sample Elastic Agent Configurations:
|===
| No of Pods in K8s Cluster | Leader Agent Resources | Rest of Agents
| 1000 | cpu: "1500m", memory: "800Mi" | cpu: "300m", memory: "600Mi"
| 3000 | cpu: "2000m", memory: "1500Mi" | cpu: "400m", memory: "800Mi"
| 5000 | cpu: "3000m", memory: "2500Mi" | cpu: "500m", memory: "900Mi"
| 10000 | cpu: "3000m", memory: "3600Mi" | cpu: "700m", memory: "1000Mi"
|===

> The above tests were performed with {agent} version 8.7 and a scraping period of `10sec` (the `period` setting for the {k8s} integration). These numbers are only indicators and should be validated for each different {k8s} environment and amount of workloads.
@@ -94,19 +94,19 @@ Although daemonset installation is simple, it cannot accommodate the varying agent

- A dedicated {agent} deployment of a single Agent for collecting cluster-wide metrics from the apiserver

- Node-level {agent}s (no leader Agent) in a DaemonSet

- kube-state-metrics shards and {agent}s in the StatefulSet defined in the kube-state-metrics autosharding manifest

Each of these groups of {agent}s has its own policy specific to its function and can be resourced independently in the appropriate manifest to accommodate its resource requirements.

Resource assignment considerations led us to alternative installation methods.

IMPORTANT: The main suggestion for large-scale clusters *is to install {agent} as a side container along with the `kube-state-metrics` Shard*. The installation is explained in detail in https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes#kube-state-metrics-ksm-in-autosharding-configuration[{agent} with Kustomize in Autosharding].
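
A rough, abbreviated sketch of that idea is shown below (this is not the official manifest; the image tags, labels, and resource values are illustrative, with the limits taken from the ~3000-Pod row of the table further down):

[source,yaml]
------------------------------------------------
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  serviceName: kube-state-metrics
  replicas: 4  # one kube-state-metrics shard per replica
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
    spec:
      hostNetwork: false
      containers:
        # kube-state-metrics shard in autosharding mode: the shard index
        # is derived from the StatefulSet Pod ordinal.
        - name: kube-state-metrics
          image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
          args:
            - --pod=$(POD_NAME)
            - --pod-namespace=$(POD_NAMESPACE)
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
        # Elastic Agent side container that scrapes only this shard.
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:8.8.0
          resources:
            limits:
              cpu: "1500m"
              memory: "1400Mi"
------------------------------------------------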

The following **alternative configuration methods** have been verified:

1. With `hostNetwork:false`
- {agent} as Side Container within the KSM Shard pod
- For non-leader {agent} deployments that collect per KSM shard
2. With `taint/tolerations` to isolate the {agent} daemonset pods from the rest of the deployments (see the sketch after this list)
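
As a mechanical sketch of method 2, the {agent} DaemonSet Pod template can carry a toleration for a taint (for example, applied with `kubectl taint nodes <node> dedicated=elastic-agent:NoSchedule`) that the rest of the deployments do not tolerate; the taint key, value, and effect below are illustrative assumptions:

[source,yaml]
------------------------------------------------
# Illustrative fragment of the elastic-agent DaemonSet spec
spec:
  template:
    spec:
      tolerations:
        # Only Pods carrying this toleration can be scheduled onto
        # nodes tainted with dedicated=elastic-agent:NoSchedule.
        - key: "dedicated"
          operator: "Equal"
          value: "elastic-agent"
          effect: "NoSchedule"
------------------------------------------------
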
@@ -116,10 +116,10 @@ You can find more information in the document called https://github.com/elastic/
Based on our https://github.com/elastic/elastic-agent/blob/main/docs/elastic-agent-scaling-tests.md[{agent} scaling tests], the following table aims to assist users in configuring their KSM sharding as the {k8s} cluster scales:
|===
| No of Pods in K8s Cluster | No of KSM Shards | Agent Resources
| 1000 | No Sharding can be handled with default KSM config | limits: memory: 700Mi , cpu:500m
| 3000 | 4 Shards | limits: memory: 1400Mi , cpu:1500m
| 5000 | 6 Shards | limits: memory: 1400Mi , cpu:1500m
| 10000 | 8 Shards | limits: memory: 1400Mi , cpu:1500m
|===

> The tests above were performed with {agent} version 8.8 + TSDB enabled and a scraping period of `10sec` (for the {k8s} integration). These numbers are only indicators and should be validated per {k8s} policy configuration and per the applications that the {k8s} cluster might include.
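
Translated into the autosharding StatefulSet, a row of the table maps roughly to the replica count and the {agent} side-container limits. For example, for a cluster of around 5000 Pods (an illustrative fragment, not the full manifest):

[source,yaml]
------------------------------------------------
spec:
  replicas: 6  # 6 kube-state-metrics shards
  template:
    spec:
      containers:
        - name: elastic-agent
          resources:
            limits:
              cpu: "1500m"
              memory: "1400Mi"
------------------------------------------------
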
@@ -164,7 +164,7 @@ Setting this to `false` is recommended for large scale setups where the `host.ip`
[discrete]
=== Elastic Stack Configuration

The configuration of the Elastic Stack needs to be taken into consideration in large-scale deployments. In the case of Elastic Cloud deployments, the choice of the https://www.elastic.co/guide/en/cloud/current/ec-getting-started-profiles.html[{ecloud} hardware profile] is important.

For heavy processing and high ingestion rate needs, the `CPU-optimised` profile is proposed.

@@ -173,7 +173,7 @@
== Validation and Troubleshooting practices

[discrete]
=== Determine whether Agents are collecting as expected

After {agent} deployment, we need to verify that the Agent services are healthy and not restarting (stability), and that the collection of metrics continues at the expected rate (latency).

@@ -229,7 +229,7 @@

------------------------------------------------
[...]
Components:
Healthy: communicating with pid '42462'
------------------------------------------------

A common problem as the {k8s} cluster size grows is that the agent process restarts due to a lack of CPU/memory resources. In the agent logs you can check for related errors:

[source,json]
------------------------------------------------
kubectl logs -n kube-system elastic-agent-qw6f4 | grep "kubernetes/metrics"
[output truncated ...]
------------------------------------------------

You can verify the instantaneous resource consumption by running the `kubectl top pod` command and identify whether agents are close to the limits you have specified in your manifest.

[source,bash]
------------------------------------------------
[output truncated ...]
------------------------------------------------

@@ -273,7 +273,7 @@ Identify how many events have been sent to {es}:

[source,bash]
------------------------------------------------
kubectl logs -n kube-system elastic-agent-h24hh -f | grep -i state_pod
[output truncated ...]
"state_pod":{"events":2936,"success":2936}
------------------------------------------------

@@ -294,5 +294,5 @@ Corresponding dashboards for `CPU Usage`, `Index Response Times` and `Memory Pressure` [...]

== Relevant links

- {estc-welcome-current}/getting-started-kubernetes.html[Monitor {k8s} Infrastructure]
- https://www.elastic.co/blog/kubernetes-cluster-metrics-logs-monitoring[Blog: Managing your {k8s} cluster with Elastic {observability}]
