Implement alloy-service resource to configure Alloy as a monitoring agent #66

Closed · wants to merge 59 commits

Conversation

@TheoBrigitte (Member) commented Aug 9, 2024

Towards: giantswarm/roadmap#3522

This PR implements the logic to configure Alloy as a monitoring agent instead of the Prometheus agent.

It does the following:

  • add a --monitoring-agent flag to choose between Prometheus agent and Alloy
  • add logic to select the monitoring agent and check for Alloy support in the observability bundle version (a hedged sketch follows this list)
  • add the Alloy service resource, which creates a configmap for the Alloy app and a secret mounted directly by the pods to inject values via environment variables (also sketched below)
  • add the Alloy config files as templates
    • Alloy app Helm values: network policy, clustering enabled, autoscaling enabled, and affinity settings similar to the Prometheus agent
    • Alloy config: podmonitor and servicemonitor discovery, remote-write settings
  • add common labels used by both the configmap and the secret
  • disable the golangci-lint lll linter (which keeps complaining about long lines)
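
For illustration, a minimal sketch of what the agent selection and bundle-version check could look like. This is not the code from this PR; the constant names, flag wiring, and the 1.6.0 version threshold are assumptions for the example only.

```go
package monitoring

import (
	"flag"
	"fmt"

	"github.com/Masterminds/semver/v3"
)

const (
	MonitoringAgentPrometheus = "prometheus-agent"
	MonitoringAgentAlloy      = "alloy"
)

// Minimum observability bundle version assumed (for this sketch) to ship Alloy as a metrics agent.
var minAlloyBundleVersion = semver.MustParse("1.6.0")

// Hypothetical flag mirroring the --monitoring-agent flag described above.
var monitoringAgentFlag = flag.String("monitoring-agent", MonitoringAgentPrometheus,
	"monitoring agent to use: prometheus-agent or alloy")

// SelectMonitoringAgent picks the agent to configure for a cluster and errors out
// when Alloy is requested but the observability bundle is too old to support it.
func SelectMonitoringAgent(bundleVersion *semver.Version) (string, error) {
	switch *monitoringAgentFlag {
	case MonitoringAgentPrometheus:
		return MonitoringAgentPrometheus, nil
	case MonitoringAgentAlloy:
		if bundleVersion.LessThan(minAlloyBundleVersion) {
			return "", fmt.Errorf("observability bundle %s does not support Alloy as monitoring agent", bundleVersion)
		}
		return MonitoringAgentAlloy, nil
	default:
		return "", fmt.Errorf("unsupported monitoring agent %q", *monitoringAgentFlag)
	}
}
```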

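And a rough sketch of the Alloy service resource rendering the configmap and the secret with shared labels. All object names, label keys and values, and the REMOTE_WRITE_URL key are illustrative assumptions, not the PR's actual identifiers.

```go
package alloy

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// commonLabels are shared by the configmap and the secret, as described above.
func commonLabels(cluster string) map[string]string {
	return map[string]string{
		"app.kubernetes.io/name":       "alloy",
		"app.kubernetes.io/managed-by": "observability-operator",
		"giantswarm.io/cluster":        cluster,
	}
}

// desiredConfigMap holds the rendered Helm values for the Alloy app.
func desiredConfigMap(cluster, values string) *corev1.ConfigMap {
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "alloy-metrics-values",
			Namespace: cluster,
			Labels:    commonLabels(cluster),
		},
		Data: map[string]string{"values": values},
	}
}

// desiredSecret holds values that the Alloy pods consume directly and expose as
// environment variables (for example the remote-write URL).
func desiredSecret(cluster, remoteWriteURL string) *corev1.Secret {
	return &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "alloy-metrics-env",
			Namespace: cluster,
			Labels:    commonLabels(cluster),
		},
		StringData: map[string]string{"REMOTE_WRITE_URL": remoteWriteURL},
	}
}
```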
@TheoBrigitte TheoBrigitte self-assigned this Aug 9, 2024
Base automatically changed from monitoring-common to main August 13, 2024 08:11
@TheoBrigitte (Member, Author)

Clustering works ✔️

[screenshot]

@TheoBrigitte (Member, Author) commented Aug 19, 2024

I experimented with the Alloy horizontal pod autoscaler built into the Helm chart. The autoscaler is based on memory usage and, by default, scales up whenever 80% memory utilization is reached.
Autoscaling based on memory does make sense, as there is a relation between the number of time series and memory usage: https://grafana.com/docs/alloy/latest/introduction/estimate-resource-usage/
On our side, on the biggest installations, we see the Prometheus agent using up to 10 shards and ~10GiB of memory per shard. When using autoscaling, a memory request must be set, but a 10GiB request would not work on installations with smaller nodes.
An idea would be to use a 3GiB memory request with a 300% HPA memory utilization target, which would make the HPA scale Alloy up whenever it reaches 9GiB of memory usage, but this still leaves installations with smaller nodes unresolved, as pods would never be able to reach that much memory usage.
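
As a quick sanity check on that arithmetic (the 3GiB request and 300% target are the hypothetical values above, not settings from this PR):

```go
package main

import "fmt"

// scaleUpThresholdGiB returns the absolute memory usage at which an HPA with
// the given memory request and utilization target would start scaling up.
func scaleUpThresholdGiB(requestGiB, targetUtilizationPercent float64) float64 {
	return requestGiB * targetUtilizationPercent / 100
}

func main() {
	// A 3GiB request with a 300% utilization target means scale-up around 9GiB of usage.
	fmt.Printf("scale-up threshold: %.0f GiB\n", scaleUpThresholdGiB(3, 300))
}
```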

Therefore I went with our current custom implementation of autoscaling based on the number of metrics.

@TheoBrigitte (Member, Author)

Tested and working on both management and workload clusters.

[screenshot]
