chore: update resources for prometheus, document resource overrides (#…
…713)

## Description

Document added for resource/HA overrides across core packages.

Also roughly doubles Prometheus' limits (CPU from 300m to 500m, memory from
2Gi to 4Gi) but does not adjust the requests. Prometheus should still
schedule without requiring significant resources, while being able to
consume more memory before hitting OOM errors.

## Related Issue

Related to #551

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md) followed
mjnagel authored Aug 29, 2024
1 parent 53f1bfd commit e80c1a4
Showing 3 changed files with 116 additions and 13 deletions.
114 changes: 114 additions & 0 deletions docs/configuration/resource-configuration-and-ha.md
@@ -0,0 +1,114 @@
---
title: Resource Configuration and High Availability
type: docs
weight: 3.5
---

Depending on your environment and the scale of your cluster, you might need to adjust UDS Core components for high availability or to optimize resources. Below are common areas where resource overrides can be useful when deploying UDS Core.

When modifying resources and replica counts, observe pod resource metrics in Grafana so that you can make an informed choice about what your environment needs. Where available, [Horizontal Pod Autoscalers](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) (HPAs) are a useful way to scale up and down dynamically based on usage.

## Monitoring

### Prometheus Stack

Prometheus is commonly customized when scaling to larger clusters (more nodes and/or workloads). To scale Prometheus beyond a single replica, its TSDB must be externalized using one of the [supported options](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage). UDS Core has not yet done extensive testing on this setup. It is also helpful to modify resources for Prometheus using a Helm override for the `prometheus.prometheusSpec.resources` value:

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      kube-prometheus-stack:
        kube-prometheus-stack:
          values:
            - path: prometheus.prometheusSpec.resources
              value:
                # Example values only
                requests:
                  cpu: 200m
                  memory: 1Gi
                limits:
                  cpu: 500m
                  memory: 4Gi
```
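
As one illustration of externalizing storage, the upstream kube-prometheus-stack chart exposes a `prometheus.prometheusSpec.remoteWrite` list. The sketch below assumes a hypothetical remote-write endpoint URL. It has not been validated with UDS Core, and whether remote write alone is sufficient for running multiple replicas depends on the backend you choose.

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      kube-prometheus-stack:
        kube-prometheus-stack:
          values:
            - path: prometheus.prometheusSpec.remoteWrite
              value:
                # Hypothetical endpoint; replace with your remote storage URL
                - url: https://metrics.example.internal/api/v1/write
```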

### Grafana

To scale Grafana for high availability, its database must be externalized (see [Grafana's database configuration docs](https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#database)). UDS Core has not yet done extensive testing on this setup. You can also override the `resources` Helm value to customize the resource limits and requests of the Grafana pods (using the component and chart name of `grafana`).
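
A minimal sketch of such a `resources` override, following the same bundle layout as the Prometheus example. The request and limit numbers are illustrative assumptions, not recommendations; size them from the usage you observe.

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      grafana:
        grafana:
          values:
            - path: resources
              value:
                # Illustrative values only
                requests:
                  cpu: 100m
                  memory: 256Mi
                limits:
                  cpu: 500m
                  memory: 512Mi
```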

## Logging

### Promtail

By default, Promtail runs as a DaemonSet, so it automatically scales across all nodes and captures logs from each host. Promtail typically does not need any other modifications, but you can customize its resource configuration by overriding the `resources` Helm value (using the component and chart name of `promtail`).
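
Because Promtail runs one pod per node, any resource values you set apply on every node. A sketch of the override described above, with purely illustrative numbers:

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      promtail:
        promtail:
          values:
            - path: resources
              value:
                # Illustrative values only; applied to the Promtail pod on each node
                requests:
                  cpu: 50m
                  memory: 128Mi
                limits:
                  cpu: 200m
                  memory: 256Mi
```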

### Loki

By default, Loki deploys in a multi-replica setup. The example below shows how to modify the replica counts of the read, write, and backend pods:

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      loki:
        loki:
          # Entries with name/path/default are bundle variables, not static values
          variables:
            - name: LOKI_WRITE_REPLICAS
              path: write.replicas
              default: "3"
            - name: LOKI_READ_REPLICAS
              path: read.replicas
              default: "3"
            - name: LOKI_BACKEND_REPLICAS
              path: backend.replicas
              default: "3"
```

You will also want to connect Loki to an [external storage provider](https://grafana.com/docs/loki/latest/configure/storage/#chunk-storage) such as AWS S3, which can be done by overriding the `loki.storage` values.
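
A hedged sketch of what such a storage override might look like for S3. The key names under `loki.storage` are assumed from the upstream Loki Helm chart and should be checked against the chart version you deploy; the bucket names and region are hypothetical, and credentials are intentionally left out.

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      loki:
        loki:
          values:
            - path: loki.storage.type
              value: s3
            # Hypothetical bucket names
            - path: loki.storage.bucketNames.chunks
              value: my-loki-chunks
            - path: loki.storage.bucketNames.ruler
              value: my-loki-ruler
            - path: loki.storage.bucketNames.admin
              value: my-loki-admin
            # Hypothetical region
            - path: loki.storage.s3.region
              value: us-east-1
```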

## Identity & Authorization

### Keycloak

Keycloak can be configured in an HA setup if an external PostgreSQL database is provided. The example values below configure HA Keycloak:

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      keycloak:
        keycloak:
          values:
            - path: devMode
              value: false
            # Enable HPA to autoscale Keycloak
            - path: autoscaling.enabled
              value: true
          variables:
            - name: KEYCLOAK_DB_HOST
              path: postgresql.host
            - name: KEYCLOAK_DB_USERNAME
              path: postgresql.username
            - name: KEYCLOAK_DB_DATABASE
              path: postgresql.database
            - name: KEYCLOAK_DB_PASSWORD
              path: postgresql.password
```
```
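
The variables exposed by this override can then be supplied at deploy time. A minimal sketch of a `uds-config.yaml`, assuming the same `variables.core` layout this document uses for `AUTHSERVICE_REDIS_URI`; the host, username, and database name are hypothetical placeholders.

```yaml
variables:
  core:
    # Hypothetical connection details for an external PostgreSQL instance
    KEYCLOAK_DB_HOST: postgres.example.internal
    KEYCLOAK_DB_USERNAME: keycloak
    KEYCLOAK_DB_DATABASE: keycloak
    # Placeholder only; avoid committing real credentials to source control
    KEYCLOAK_DB_PASSWORD: replace-me
```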

### AuthService

AuthService can be configured in an HA setup if an [external session store](https://docs.tetrate.io/istio-authservice/configuration/oidc#session-store-configuration) is provided (a key-value store such as Redis or Valkey). To configure an external session store, set the `UDS_AUTHSERVICE_REDIS_URI` environment variable when deploying, or set `AUTHSERVICE_REDIS_URI` in your `uds-config.yaml`:

```yaml
variables:
  core:
    AUTHSERVICE_REDIS_URI: redis://redis.redis.svc.cluster.local:6379
```

To scale up replicas or modify resource requests/limits you can use UDS bundle overrides for the helm values of `replicaCount` and `resources` (using the component and chart name of `authservice`).
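
A sketch of such an override, using the `authservice` component and chart name mentioned above. The replica count and resource numbers are illustrative assumptions only.

```yaml
packages:
  - name: core
    repository: oci://ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      authservice:
        authservice:
          values:
            # Running more than one replica assumes an external session store is configured
            - path: replicaCount
              value: 2
            - path: resources
              value:
                # Illustrative values only
                requests:
                  cpu: 100m
                  memory: 128Mi
                limits:
                  cpu: 500m
                  memory: 256Mi
```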
11 changes: 0 additions & 11 deletions docs/configuration/uds-operator.md
@@ -152,17 +152,6 @@ The UDS Operator uses the first `redirectUris` to populate the `match.prefix` ho

For a complete example, see [app-authservice-tenant.yaml](https://github.com/defenseunicorns/uds-core/blob/main/src/test/app-authservice-tenant.yaml)

#### External Session Store
If you wish to scale Authservice horizontally, Authservice supports using an [external redis session store](https://docs.tetrate.io/istio-authservice/configuration/oidc#session-store-configuration) which can be configured by setting [UDS_AUTHSERVICE_REDIS_URI](https://github.com/defenseunicorns/uds-core/blob/main/src/pepr/zarf.yaml#L20-L22).

You can also specify the `AUTHSERVICE_REDIS_URI` variable in your `uds-config.yaml`:

```yaml
variables:
  core:
    AUTHSERVICE_REDIS_URI: redis://redis.redis.svc.cluster.local:6379
```

#### Trusted Certificate Authority

Authservice can be configured with additional trusted certificate bundle in cases where UDS Core ingress gateways are deployed with private PKI.
4 changes: 2 additions & 2 deletions src/prometheus-stack/values/values.yaml
@@ -47,8 +47,8 @@ prometheus:
     probeSelectorNilUsesHelmValues: false
     resources:
       limits:
-        cpu: 300m
-        memory: 2Gi
+        cpu: 500m
+        memory: 4Gi
       requests:
         cpu: 100m
         memory: 512Mi