Commit
chore: update resources for prometheus, document resource overrides (#…
…713)

## Description

Document added for resource/HA overrides across core packages. Also ~doubles Prometheus' limits, but does not adjust the requests. This should ensure that Prometheus still schedules without requiring significant resources, but also allows it to consume more memory without hitting OOM errors.

## Related Issue

Related to #551

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md) followed
Showing 3 changed files with 116 additions and 13 deletions.
@@ -0,0 +1,114 @@
---
title: Resource Configuration and High Availability
type: docs
weight: 3.5
---

Depending on your environment and the scale of your cluster, you might need to adjust UDS Core components for high availability or to optimize resources. Below are common areas where resource overrides can be useful when deploying UDS Core.

When modifying resources and replica counts, it can be useful to observe pod resource metrics in Grafana to make an informed choice about what is necessary for your environment. Where available, HPAs ([Horizontal Pod Autoscalers](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)) are beneficial for dynamically scaling up or down based on usage.

## Monitoring

### Prometheus Stack

Prometheus is a common place to customize when scaling to larger cluster sizes (more nodes and/or workloads). To scale Prometheus beyond a single replica, its TSDB must be externalized using one of the [supported options](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage). UDS Core has not yet done extensive testing on this setup. It is also helpful to modify resources for Prometheus using a Helm override for the `prometheus.prometheusSpec.resources` value:

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      kube-prometheus-stack:
        kube-prometheus-stack:
          values:
            - path: prometheus.prometheusSpec.resources
              value:
                # Example values only
                requests:
                  cpu: 200m
                  memory: 1Gi
                limits:
                  cpu: 500m
                  memory: 4Gi
```

### Grafana

To scale Grafana for high availability, its database must be externalized (see [Grafana's database configuration docs](https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#database)). UDS Core has not yet done extensive testing on this setup. You can also override the `resources` Helm value to customize the resource limits and requests of the Grafana pods (using the component and chart name of `grafana`).
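
A minimal sketch of such a Grafana override, assuming the same bundle layout as the Prometheus example above. The CPU and memory figures are illustrative placeholders, not recommendations:

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      grafana:
        grafana:
          values:
            - path: resources
              value:
                # Example values only; size these from observed usage
                requests:
                  cpu: 100m
                  memory: 256Mi
                limits:
                  cpu: 500m
                  memory: 512Mi
```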

## Logging

### Promtail

By default, Promtail runs as a DaemonSet, automatically scaling across all nodes to ensure logs are captured from each host. Typically Promtail does not need any other modifications, but you can customize its resource configuration by overriding the `resources` Helm value (using the component and chart name of `promtail`).
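
For illustration, only the `overrides` portion of the core package entry is sketched here. It would sit alongside `name`, `repository`, and `ref` in your bundle, and the numbers are assumptions rather than tested values:

```yaml
overrides:
  promtail:
    promtail:
      values:
        - path: resources
          value:
            # Example values only; every node runs one Promtail pod,
            # so requests are multiplied by the node count
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
```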

### Loki

By default, Loki deploys in a multi-replica setup. See the example below for modifying the replica counts of the read/write/backend pods:

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      loki:
        loki:
          # Entries with name/path/default are bundle variables,
          # so they are listed under `variables` rather than `values`
          variables:
            - name: LOKI_WRITE_REPLICAS
              path: write.replicas
              default: "3"
            - name: LOKI_READ_REPLICAS
              path: read.replicas
              default: "3"
            - name: LOKI_BACKEND_REPLICAS
              path: backend.replicas
              default: "3"
```

You will also want to connect Loki to an [external storage provider](https://grafana.com/docs/loki/latest/configure/storage/#chunk-storage) such as AWS S3, which you can do by overriding the `loki.storage` values.
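
A rough sketch of what a `loki.storage` override might look like for S3. The key names under `loki.storage` are assumptions based on the upstream Loki Helm chart and should be checked against the chart version you deploy. The bucket names and region are hypothetical, and credentials are deliberately left out:

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      loki:
        loki:
          values:
            # Assumed chart keys; verify against your Loki chart version
            - path: loki.storage.type
              value: s3
            # Hypothetical bucket names
            - path: loki.storage.bucketNames.chunks
              value: my-loki-chunks
            - path: loki.storage.bucketNames.ruler
              value: my-loki-ruler
            - path: loki.storage.bucketNames.admin
              value: my-loki-admin
            # Hypothetical region
            - path: loki.storage.s3.region
              value: us-east-1
```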

## Identity & Authorization

### Keycloak

Keycloak can be configured in an HA setup if an external database (PostgreSQL) is provided. See the example values below for configuring HA Keycloak:

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      keycloak:
        keycloak:
          values:
            - path: devMode
              value: false
            # Enable HPA to autoscale Keycloak
            - path: autoscaling.enabled
              value: true
          variables:
            - name: KEYCLOAK_DB_HOST
              path: postgresql.host
            - name: KEYCLOAK_DB_USERNAME
              path: postgresql.username
            - name: KEYCLOAK_DB_DATABASE
              path: postgresql.database
            - name: KEYCLOAK_DB_PASSWORD
              path: postgresql.password
```
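
The variables declared above carry no defaults, so they need values at deploy time. One way to supply them, sketched here on the assumption that they are set in `uds-config.yaml` the same way as `AUTHSERVICE_REDIS_URI` in the AuthService section below, is shown next. Every connection detail is a hypothetical placeholder:

```yaml
variables:
  core:
    # Hypothetical connection details; replace with your own
    KEYCLOAK_DB_HOST: postgres.example.internal
    KEYCLOAK_DB_USERNAME: keycloak
    KEYCLOAK_DB_DATABASE: keycloak
    # Placeholder only; avoid committing a real password to a config file
    KEYCLOAK_DB_PASSWORD: change-me
```

If variables follow the `UDS_`-prefixed environment convention that this page uses for `UDS_AUTHSERVICE_REDIS_URI`, the password could presumably be passed through the environment instead of being written to disk.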

### AuthService

AuthService can be configured in an HA setup if an [external session store](https://docs.tetrate.io/istio-authservice/configuration/oidc#session-store-configuration) is provided (a key-value store such as Redis or Valkey). To configure an external session store, set the `UDS_AUTHSERVICE_REDIS_URI` environment variable when deploying, or set the variable in your `uds-config.yaml`:

```yaml
variables:
  core:
    AUTHSERVICE_REDIS_URI: redis://redis.redis.svc.cluster.local:6379
```

To scale up replicas or modify resource requests/limits, you can use UDS bundle overrides for the Helm values `replicaCount` and `resources` (using the component and chart name of `authservice`).
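
A minimal sketch of such an AuthService override, assuming the same bundle layout used elsewhere on this page. The replica count and resource figures are illustrative only, and running more than one replica presumes the external session store described above is already configured:

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      authservice:
        authservice:
          values:
            # Example values only
            - path: replicaCount
              value: 2
            - path: resources
              value:
                requests:
                  cpu: 100m
                  memory: 128Mi
                limits:
                  cpu: 250m
                  memory: 256Mi
```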